如何有效地对伯努利随机变量之和进行建模?

发布于 2024-10-07 03:53:34 字数 516 浏览 9 评论 0原文

我正在使用 Perl 来建模随机变量 (Y),它是大约 15-40k 独立伯努利随机变量 (X_i) 的总和,每个变量都有不同的成功概率(p_i)。形式上,Y=Sum{X_i},其中 Pr(X_i=1)=p_iPr(X_i=0)=1-p_i

我有兴趣快速回答诸如 Pr(Y<=k)(其中给出 k)之类的查询。

目前,我使用随机模拟来回答此类问题。我根据 p_i 随机绘制每个 X_i,然后将所有 X_i 值相加得到 Y'。我重复这个过程几千次并返回时间分数 Pr(Y'<=k)

显然,这并不完全准确,尽管随着我使用的模拟数量的增加,准确性会大大提高。

你能想出一个合理的方法来获得准确的概率吗?

I am using Perl to model a random variable (Y) which is the sum of some ~15-40k independent Bernoulli random variables (X_i), each with a different success probability (p_i). Formally, Y=Sum{X_i} where Pr(X_i=1)=p_i and Pr(X_i=0)=1-p_i.

I am interested in quickly answering queries such as Pr(Y<=k) (where k is given).

Currently, I use random simulations to answer such queries. I randomly draw each X_i according to its p_i, then sum all X_i values to get Y'. I repeat this process a few thousand times and return the fraction of times Pr(Y'<=k).

Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.

Can you think of a reasonable way to get the exact probability?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

如梦 2024-10-14 03:53:34

首先,我会避免使用内置的 rand 来实现此目的,因为它太依赖于底层 C 库实现而不可靠(例如,请参阅我的 博客文章 指出 的范围Windows 上的 rand 基数为 32,768)。

要使用蒙特卡罗方法,我将从一个已知的良好随机生成器开始,例如 Rand ::MersenneTwister 或仅使用 Random.org 的服务之一并预先计算Y 的 CDF 假设 Y 非常稳定。如果每个Y只使用一次,那么预先计算CDF显然是没有意义的。

引用维基百科

在概率论和统计学中,泊松二项分布是独立伯努利试验总和的离散概率分布。

换句话说,它是一系列 n 个独立的是/否实验中成功次数的概率分布,成功概率为 p1, …, pn。 (强调我的)

泊松二项式概率密度函数的闭式表达式 可能会令人感兴趣。该文章位于付费墙后面:

我们讨论了它在计算速度和实现以及简化分析方面的一些优势,并举例说明了后者,包括矩的计算以及二项式系数和二项式累积分布函数 (cdf) 的新三角恒等式的开发.

First, I would avoid using the rand built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range of rand on Windows has cardinality 32,768).

To use the Monte-Carlo approach, I would start with a known good random generator, such as Rand::MersenneTwister or just use one of Random.org's services and pre-compute a CDF for Y assuming Y is pretty stable. If each Y is only used once, pre-computing the CDF is obviously pointless.

To quote Wikipedia:

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials.

In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p1, …, pn. (emphasis mine)

Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:

and we discuss several of its advantages regarding computing speed and implementation and in simplifying analysis, with examples of the latter including the computation of moments and the development of new trigonometric identities for the binomial coefficient and the binomial cumulative distribution function (cdf).

迷乱花海 2024-10-14 03:53:34

据我记得,这不应该以正态分布渐近结束吗?另请参阅此新闻组主题:http://newsgroups .derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html

如果是这样,您可以使用 Statistics::Distrib::Normal

As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html

If so, you can use Statistics::Distrib::Normal.

温柔一刀 2024-10-14 03:53:34

要获得精确的解决方案,您可以利用以下事实:两个或多个独立随机变量之和的概率分布为其各自分布的卷积。 卷积有点昂贵,但必须仅当 p_i 发生变化时才计算。

一旦获得了概率分布,您就可以通过计算概率的累积和轻松获得 CDF。

To obtain the exact solution you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive but must be calculated only if the p_i change.

Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文