Generating Poisson variables in C++
I implemented this function to generate a Poisson random variable:

```cpp
#include <cmath>

typedef long unsigned int luint;

// mrand: the Mersenne Twister generator object (declared elsewhere);
// mrand.rand() is assumed to return a uniform double in [0, 1).
luint poisson(luint lambda) {
    double L = std::exp(-double(lambda));
    luint k = 0;
    double p = 1;
    do {
        k++;
        p *= mrand.rand();
    } while (p > L);
    return (k - 1);
}
```

where mrand is the MersenneTwister random number generator. I find that as I increase lambda, the distribution goes wrong: its mean saturates at around 750. Is this due to numerical approximation, or did I make a mistake?
If you go the "existing library" route, your compiler may already support the C++11 <random> library, which provides std::poisson_distribution. Here is how you use it:

I've used it two ways above:

1. I tried to imitate your existing interface.
2. If you create a std::poisson_distribution with a mean, it is more efficient to use that distribution over and over for the same mean (as done in main()).
Here is sample output for me:
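A minimal sketch of both usages, assuming a C++11 (or later) compiler. The global `engine` (a std::mt19937 with an arbitrary seed) is a hypothetical stand-in for your `mrand`, and `poisson_many` is a hypothetical helper name; neither is taken from the original answer's code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical engine standing in for the question's `mrand`; the seed is arbitrary.
std::mt19937 engine(12345);

// Way 1: mimic the question's interface. A fresh distribution is built on every call.
unsigned long poisson(unsigned long lambda) {
    std::poisson_distribution<unsigned long> dist(static_cast<double>(lambda));
    return dist(engine);
}

// Way 2: build the distribution once for a given mean and reuse it for many draws.
std::vector<unsigned long> poisson_many(double mean, std::size_t count) {
    std::poisson_distribution<unsigned long> dist(mean);
    std::vector<unsigned long> draws;
    draws.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        draws.push_back(dist(engine));
    }
    return draws;
}
```

For example, `poisson_many(750.0, 20000)` should give draws whose average is close to 750, with no saturation. Reusing one distribution object lets the library keep whatever setup it may compute from the mean, which is why the second form tends to be cheaper per draw.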
exp(-750) (about 2e-326) is smaller than the smallest positive double (about 4.9e-324), so it underflows to 0. Your test then effectively becomes p > 0, and the loop stops only when p itself underflows, which takes about 745 factors on average. So your issue is numerical. In any case, the complexity of your algorithm is linear in lambda, so it isn't very efficient for large lambda. Unless you have a good reason to code this yourself, using an existing library implementation probably makes sense, as these numerical algorithms tend to be sensitive to exactly the kind of precision issues you're encountering.
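To see where the ceiling comes from, here is a small illustrative sketch (the function names are hypothetical). With IEEE-754 doubles, `exp(-750.0)` rounds to exactly 0, so the loop can only end when `p` underflows. Replacing the random factors with a fixed factor of exp(-1), which matches the average of log(u) for a uniform u, shows that this takes roughly 745 multiplications.

```cpp
#include <cmath>
#include <limits>

// Smallest positive (subnormal) double, about 4.9e-324.
const double kTinyDouble = std::numeric_limits<double>::denorm_min();

// True if exp(-lambda) is still a nonzero double.
bool exp_neg_is_representable(double lambda) {
    return std::exp(-lambda) > 0.0;
}

// Mimics the question's loop with a fixed factor u instead of random draws:
// counts how many multiplications it takes for the running product to reach 0.
int multiplications_until_underflow(double u) {
    double p = 1.0;
    int k = 0;
    while (p > 0.0) {
        p *= u;
        ++k;
    }
    return k;
}
```

Calling `multiplications_until_underflow(std::exp(-1.0))` should return a value in the mid 740s, which lines up with the mean of about 750 observed in the question. This assumes default floating-point settings; flush-to-zero modes would make the product hit 0 somewhat earlier.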
Since you only use L in the expression (p > L), you're essentially testing for (log(p) > -lambda). On its own, that's not a very helpful transformation: sure, you don't need exp(-750) anymore, but p will just underflow instead.

Now, p is just Π(mrand.rand()), so log(p) = log(Π(mrand.rand())) = Σ(log(mrand.rand())). That gives you the necessary transformation: accumulate the sum of log(mrand.rand()) instead of the product, and compare that sum with -lambda.

A double has only 11 bits of exponent but a 52-bit mantissa. The product p has to reach magnitudes around exp(-lambda), which quickly runs past the exponent range, whereas the sum of logs is an ordinary number of magnitude about lambda. Therefore this is a massive increase in numerical stability. The price paid is that you need a log on every iteration, instead of a single exp up front.
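A sketch of the log-space version, assuming `mrand.rand()` returns a uniform double in [0, 1); here a std::mt19937 with std::uniform_real_distribution stands in for it, and `poisson_log` is a hypothetical name. It is the same algorithm as the question's, with the product replaced by a sum of logs.

```cpp
#include <cmath>
#include <random>

typedef long unsigned int luint;

// Hypothetical stand-in for the question's `mrand`: mt19937 plus a uniform on [0, 1).
std::mt19937 log_gen(42);
std::uniform_real_distribution<double> log_unif(0.0, 1.0);

// Tracks log(p) = sum of log(u_i) instead of the product p,
// and compares it with -lambda instead of exp(-lambda).
luint poisson_log(luint lambda) {
    const double limit = -static_cast<double>(lambda);
    double log_p = 0.0;
    luint k = 0;
    do {
        ++k;
        log_p += std::log(log_unif(log_gen));  // log(0) = -inf simply ends the loop
    } while (log_p > limit);
    return k - 1;
}
```

Drawing `poisson_log(1500)` repeatedly should give a sample mean near 1500 instead of saturating; the cost is still linear in lambda.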
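From another question I asked earlier, it seems you could also compute poisson(750) as poisson(375) + poisson(375), using two independent draws. Because a sum of independent Poisson variables is again Poisson with the summed mean, this is exact, not just an approximation.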
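A sketch of that splitting idea under stated assumptions: the product-based loop from the question is reused for each chunk, a std::mt19937 replaces `mrand`, and the chunk size of 500 is an arbitrary choice that keeps exp(-500) (about 7e-218) a normal double. `poisson_small` and `poisson_chunked` are hypothetical names.

```cpp
#include <cmath>
#include <random>

typedef long unsigned int luint;

// Hypothetical stand-in for the question's `mrand`.
std::mt19937 chunk_gen(7);
std::uniform_real_distribution<double> chunk_unif(0.0, 1.0);

// The question's product-based algorithm; safe while exp(-mean) stays
// well inside the normal double range.
luint poisson_small(double mean) {
    const double L = std::exp(-mean);
    luint k = 0;
    double p = 1.0;
    do {
        ++k;
        p *= chunk_unif(chunk_gen);
    } while (p > L);
    return k - 1;
}

// Poisson(a) + Poisson(b) ~ Poisson(a + b) for independent draws, so a large
// mean can be split into chunks of at most `chunk` and the draws summed.
luint poisson_chunked(double lambda, double chunk = 500.0) {
    luint total = 0;
    while (lambda > chunk) {
        total += poisson_small(chunk);
        lambda -= chunk;
    }
    return total + poisson_small(lambda);
}
```

The running time is still linear in lambda; splitting only removes the underflow.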
In situations like these, you don't need to invoke the random number generator more than once. All you need is a table of cumulative probabilities:

c[i] = p[0] + p[1] + ... + p[i]   (the probability that X <= i)

Then generate a random number 0 <= r < 1, and take the first integer X such that c[X] > r. You can find this X with a binary search.

To generate this table, we need the individual probabilities:

p[i] = exp(-lambda) * lambda^i / i!
If lambda is large, this becomes wildly inaccurate, as you have found. But we can use a trick here: start at (or near) the largest value, with k = floor(lambda), and pretend for the moment that p[k] is equal to 1. Then calculate p[i] for i > k using the recurrence relation

p[i] = p[i-1] * lambda / i

and for i < k using

p[i] = p[i+1] * (i+1) / lambda

This ensures that the largest probabilities have the greatest possible precision.
Now just calculate c[i] using c[i+1] = c[i] + p[i+1], up to the point where c[i+1] is the same as c[i]. Then you can normalise the array by dividing by this limiting value c[i]; or you can leave the array as it is, and use a random number 0 <= r < c[i].

See: http://en.wikipedia.org/wiki/Inverse_transform_sampling
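Putting the steps together, here is a sketch of the table-based sampler under these assumptions: lambda >= 0, a table of roughly lambda doubles fits in memory, and the array is left unnormalised with r drawn from [0, c.back()). The class name `PoissonTable` is hypothetical, and the engine can be any standard C++11 random engine such as std::mt19937.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Inverse transform sampler: unnormalised weights anchored at the mode
// k = floor(lambda) with w[k] = 1, cumulated into c, searched by binary search.
class PoissonTable {
public:
    explicit PoissonTable(double lambda) {
        const std::size_t k = static_cast<std::size_t>(std::floor(lambda));
        std::vector<double> w(k + 1);
        w[k] = 1.0;
        // Below the mode: w[i] = w[i+1] * (i+1) / lambda (tiny values may underflow to 0).
        for (std::size_t i = k; i-- > 0;) {
            w[i] = w[i + 1] * static_cast<double>(i + 1) / lambda;
        }
        // Cumulative sums c[i] = w[0] + ... + w[i] up to the mode.
        c_.resize(k + 1);
        double sum = 0.0;
        for (std::size_t i = 0; i <= k; ++i) {
            sum += w[i];
            c_[i] = sum;
        }
        // Above the mode: w[i] = w[i-1] * lambda / i, until c[i] stops changing.
        double wi = 1.0;
        for (std::size_t i = k + 1;; ++i) {
            wi *= lambda / static_cast<double>(i);
            const double next = c_.back() + wi;
            if (next == c_.back()) break;
            c_.push_back(next);
        }
    }

    // Draw r in [0, c.back()) and return the first X with c[X] > r.
    template <class Engine>
    std::size_t operator()(Engine& engine) const {
        std::uniform_real_distribution<double> unif(0.0, c_.back());
        const double r = unif(engine);
        auto it = std::upper_bound(c_.begin(), c_.end(), r);
        if (it == c_.end()) --it;  // guard against r rounding up to c.back()
        return static_cast<std::size_t>(it - c_.begin());
    }

    std::size_t size() const { return c_.size(); }

private:
    std::vector<double> c_;
};
```

For example, `PoissonTable t(750.0);` followed by repeated `t(engine)` calls should give draws averaging about 750. Building the table costs O(lambda) once; each draw then needs one random number and one binary search over the table.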