如何有效地对伯努利随机变量之和进行建模?
我正在使用 Perl 来建模随机变量 (Y
),它是大约 15-40k 独立伯努利随机变量 (X_i
) 的总和,每个变量都有不同的成功概率(p_i
)。形式上,Y=Sum{X_i}
,其中 Pr(X_i=1)=p_i
和 Pr(X_i=0)=1-p_i
。
我有兴趣快速回答诸如 Pr(Y<=k)
(其中给出 k
)之类的查询。
目前,我使用随机模拟来回答此类问题。我根据 p_i
随机绘制每个 X_i
,然后将所有 X_i
值相加得到 Y'
。我重复这个过程几千次并返回时间分数 Pr(Y'<=k)
。
显然,这并不完全准确,尽管随着我使用的模拟数量的增加,准确性会大大提高。
你能想出一个合理的方法来获得准确的概率吗?
I am using Perl to model a random variable (Y
) which is the sum of some ~15-40k independent Bernoulli random variables (X_i
), each with a different success probability (p_i
). Formally, Y=Sum{X_i}
where Pr(X_i=1)=p_i
and Pr(X_i=0)=1-p_i
.
I am interested in quickly answering queries such as Pr(Y<=k)
(where k
is given).
Currently, I use random simulations to answer such queries. I randomly draw each X_i
according to its p_i
, then sum all X_i
values to get Y'
. I repeat this process a few thousand times and return the fraction of times Pr(Y'<=k)
.
Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.
Can you think of a reasonable way to get the exact probability?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
首先,我会避免使用内置的
rand
来实现此目的,因为它太依赖于底层 C 库实现而不可靠(例如,请参阅我的 博客文章 指出的范围Windows 上的 rand
基数为 32,768)。要使用蒙特卡罗方法,我将从一个已知的良好随机生成器开始,例如 Rand ::MersenneTwister 或仅使用 Random.org 的服务之一并预先计算
Y
的 CDF 假设Y
非常稳定。如果每个Y
只使用一次,那么预先计算CDF显然是没有意义的。引用维基百科:
泊松二项式概率密度函数的闭式表达式 可能会令人感兴趣。该文章位于付费墙后面:
First, I would avoid using the
rand
built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range ofrand
on Windows has cardinality 32,768).To use the Monte-Carlo approach, I would start with a known good random generator, such as Rand::MersenneTwister or just use one of Random.org's services and pre-compute a CDF for
Y
assumingY
is pretty stable. If eachY
is only used once, pre-computing the CDF is obviously pointless.To quote Wikipedia:
Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:
据我记得,这不应该以正态分布渐近结束吗?另请参阅此新闻组主题:http://newsgroups .derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html
如果是这样,您可以使用 Statistics::Distrib::Normal。
As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html
If so, you can use Statistics::Distrib::Normal.
要获得精确的解决方案,您可以利用以下事实:两个或多个独立随机变量之和的概率分布为其各自分布的卷积。 卷积有点昂贵,但必须仅当
p_i
发生变化时才计算。一旦获得了概率分布,您就可以通过计算概率的累积和轻松获得 CDF。
To obtain the exact solution you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive but must be calculated only if the
p_i
change.Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.