在 C/C++ 中生成遵循正态分布的随机数
如何在 C 或 C++ 中轻松生成遵循正态分布的随机数?
我不想使用Boost。
我知道高德纳详细讨论了这个问题,但我手头现在没有他的书。
How can I easily generate random numbers following a normal distribution in C or C++?
I don't want any use of Boost.
I know that Knuth talks about this at length but I don't have his books at hand right now.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(18)
有很多方法可以从常规 RNG 生成高斯分布数字。
通常使用 Box-Muller 变换。它正确地生成具有正态分布的值。数学很简单。您生成两个(均匀)随机数,并通过对它们应用公式,您将得到两个正态分布的随机数。返回一个,并保存另一个以供下一次请求随机数时使用。
There are many methods to generate Gaussian-distributed numbers from a regular RNG.
The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy. You generate two (uniform) random numbers, and by applying an formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.
C++11
C++11 提供
std::normal_distribution
,这就是我今天要走的路。C 或旧版 C++
以下是按复杂性升序排列的一些解决方案:
将 12 个从 0 到 1 的均匀随机数相加,然后减去 6。这将匹配正态变量的平均值和标准差。一个明显的缺点是范围仅限于 ±6 – 与真正的正态分布不同。
Box-Muller 变换。这是上面列出的,并且实现起来相对简单。但是,如果您需要非常精确的样本,请注意 Box-Muller 变换与一些统一生成器相结合会遇到称为 Neave Effect1 的异常现象。
为了获得最佳精度,我建议绘制制服并应用逆累积正态分布以获得正态分布变量。 这里是非常好的逆累积正态分布算法。
1. HR Neave,“关于使用乘法同余伪随机数生成器的 Box-Muller 变换”,应用统计学,22, 92-97, 1973
C++11
C++11 offers
std::normal_distribution
, which is the way I would go today.C or older C++
Here are some solutions in order of ascending complexity:
Add 12 uniform random numbers from 0 to 1 and subtract 6. This will match mean and standard deviation of a normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution.
The Box-Muller transform. This is listed above, and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called Neave Effect1.
For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.
1. H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973
一种快速而简单的方法是将多个均匀分布的随机数相加并取平均值。请参阅中心极限定理,了解其工作原理的完整说明。
A quick and easy method is just to sum a number of evenly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.
我创建了一个用于正态分布随机数生成基准的 C++ 开源项目。
比较了几种算法,包括
cpp11random
使用 C++11std::normal_distribution
和std::minstd_rand
(实际上是 clang 中的 Box-Muller 变换)。iMac 上单精度(
float
)版本的结果[email] ;protected] , clang 6.1, 64 位:为了正确性,程序验证平均值、标准样本的偏差、偏度和峰度。结果发现,对 4、8 或 16 个均匀数求和的 CLT 方法不像其他方法那样具有良好的峰度。
Ziggurat 算法比其他算法具有更好的性能。然而,它不适合 SIMD 并行,因为它需要查表和分支。具有 SSE2/AVX 指令集的 Box-Muller 比 Ziggurat 算法的非 SIMD 版本快得多(x1.79、x2.99)。
因此,我建议使用 Box-Muller 来构建具有 SIMD 指令集的架构,否则可能会使用 ziggurat。
PS 该基准测试使用最简单的 LCG PRNG 来生成均匀分布的随机数。因此对于某些应用程序来说它可能不够。但性能比较应该是公平的,因为所有实现都使用相同的 PRNG,因此基准测试主要测试转换的性能。
I created a C++ open source project for normally distributed random number generation benchmark.
It compares several algorithms, including
cpp11random
uses C++11std::normal_distribution
withstd::minstd_rand
(it is actually Box-Muller transform in clang).The results of single-precision (
float
) version on iMac [email protected] , clang 6.1, 64-bit:For correctness, the program verifies the mean, standard deviation, skewness and kurtosis of the samples. It was found that CLT method by summing 4, 8 or 16 uniform numbers do not have good kurtosis as the other methods.
Ziggurat algorithm has better performance than the others. However, it does not suitable for SIMD parallelism as it needs table lookup and branches. Box-Muller with SSE2/AVX instruction set is much faster (x1.79, x2.99) than non-SIMD version of ziggurat algorithm.
Therefore, I will suggest using Box-Muller for architecture with SIMD instruction sets, and may be ziggurat otherwise.
P.S. the benchmark uses a simplest LCG PRNG for generating uniform distributed random numbers. So it may not be sufficient for some applications. But the performance comparison should be fair because all implementations uses the same PRNG, so the benchmark mainly tests the performance of the transformation.
这是一个基于一些参考文献的 C++ 示例。这又快又脏,你最好不要重新发明和使用 boost 库。
您可以使用 QQ 图来检查结果并查看它与真实正态分布的近似程度(对样本 1..x 进行排名,将排名转换为 x 总数的比例,即有多少样本,获取 z 值并绘制一条向上的直线就是所需的结果)。
Here's a C++ example, based on some of the references. This is quick and dirty, you are better off not re-inventing and using the boost library.
You can use a Q-Q plot to examine the results and see how well it approximates a real normal distribution (rank your samples 1..x, turn the ranks into proportions of total count of x ie. how many samples, get the z-values and plot them. An upwards straight line is the desired result).
这就是在现代 C++ 编译器上生成示例的方式。
This is how you generate the samples on a modern C++ compiler.
使用
std::tr1::normal_distribution
。std::tr1 命名空间不是 boost 的一部分。它是包含 C++ 技术报告 1 中添加的库的命名空间,并且可在最新的 Microsoft 编译器和 gcc 中使用,独立于 boost。
Use
std::tr1::normal_distribution
.The std::tr1 namespace is not a part of boost. It's the namespace that contains the library additions from the C++ Technical Report 1 and is available in up to date Microsoft compilers and gcc, independently of boost.
您可以使用 GSL。给出了一些完整示例来演示如何使用它。
You can use the GSL. Some complete examples are given to demonstrate how to use it.
看一下: http://www.cplusplus.com/reference/random/normal_distribution/< /a>.这是生成正态分布的最简单方法。
Have a look on: http://www.cplusplus.com/reference/random/normal_distribution/. It's the simplest way to produce normal distributions.
如果您使用的是 C++11,则可以使用
std::normal_distribution
:
还有许多其他分布可以用来转换随机数引擎的输出。
If you're using C++11, you can use
std::normal_distribution
:There are many other distributions you can use to transform the output of the random number engine.
我遵循 http://www.mathworks 中给出的 PDF 定义。 com/help/stats/normal-distribution.html 并提出了这个:
这可能不是最好的方法,但它非常简单。
I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:
It is maybe not the best approach, but it's quite simple.
comp.lang.c 常见问题解答列表分享了三种不同的方法来轻松生成具有高斯分布的随机数。
你可以看一下: http://c-faq.com/lib/gaussian.html
The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.
You may take a look of it: http://c-faq.com/lib/gaussian.html
Box-Muller 实施:
Box-Muller implementation:
存在多种用于逆累积正态分布的算法。量化金融中最受欢迎的方法在 http:// chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/
在我看来,除了来自 Wichura:它是机器精度、可靠且快速。高斯随机数生成很少出现瓶颈。
这里的最佳答案提倡 Box-Müller,您应该意识到它有已知的缺陷。我引用 https://www.sciencedirect.com/science/article/pii/S0895717710005935< /a>:
There exists various algorithms for the inverse cumulative normal distribution. The most popular in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/
In my opinion, there is not much incentive to use something else than algorithm AS241 from Wichura: it is machine precision, reliable and fast. Bottlenecks are rarely in the Gaussian random number generation.
The top answer here advocates for Box-Müller, you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:
看看我发现了什么。
此库使用 Ziggurat 算法。
Take a look at what I found.
This library uses the Ziggurat algorithm.
1) 生成高斯随机数的图形直观方法是使用类似于蒙特卡罗方法的方法。您可以使用 C 中的伪随机数生成器在高斯曲线周围的框中生成一个随机点。您可以使用分布方程计算该点是否位于高斯分布内部或下方。如果该点位于高斯分布内,则您将获得高斯随机数作为该点的 x 值。
这种方法并不完美,因为从技术上讲,高斯曲线趋向无穷大,并且您无法创建在 x 维度上接近无穷大的盒子。但高斯曲线在 y 维度上接近 0 的速度非常快,所以我不会担心这一点。 C 中变量大小的限制可能更多地限制了您的准确性。
2)另一种方法是使用中心极限定理,该定理指出,当添加独立随机变量时,它们会形成正态分布。记住这个定理,您可以通过添加大量独立随机变量来近似高斯随机数。
这些方法并不是最实用的,但是当您不想使用预先存在的库时,这是可以预料到的。请记住,这个答案来自于很少或没有微积分或统计经验的人。
1) Graphically intuitive way you can generate Gaussian random numbers is by using something similar to the Monte Carlo method. You would generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C. You can calculate if that point is inside or underneath the Gaussian distribution using the equation of the distribution. If that point is inside the Gaussian distribution, then you have got your Gaussian random number as the x value of the point.
This method isn't perfect because technically the Gaussian curve goes on towards infinity, and you couldn't create a box that approaches infinity in the x dimension. But the Guassian curve approaches 0 in the y dimension pretty fast so I wouldn't worry about that. The constraint of the size of your variables in C may be more of a limiting factor to your accuracy.
2) Another way would be to use the Central Limit Theorem which states that when independent random variables are added, they form a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large amount of independent random variables.
These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.
蒙特卡罗方法
最直观的方法是使用蒙特卡罗方法。取合适的范围-X、+X。 X 值越大,正态分布越准确,但收敛时间较长。
一个。选择 -X 到 X 之间的随机数 z。
b.保持概率为
N(z,mean,variance)
,其中N是高斯分布。否则丢弃并返回步骤 (a)。Monte Carlo method
The most intuitive way to do this would be to use a monte carlo method. Take a suitable range -X, +X. Larger values of X will result in a more accurate normal distribution, but takes longer to converge.
a. Choose a random number z between -X to X.
b. Keep with a probability of
N(z, mean, variance)
where N is the gaussian distribution. Drop otherwise and go back to step (a).计算机是确定性设备。计算中不存在随机性。
此外,CPU中的算术装置可以计算某些有限整数集(在有限域中执行计算)和有限实有理数集的和。并且还进行了按位运算。数学需要处理更大的集合,例如具有无限个点的 [0.0, 1.0]。
你可以用一些控制器监听计算机内部的一些电线,但它会有均匀的分布吗?我不知道。但是,如果假设它的信号是大量独立随机变量累加值的结果,那么您将收到近似正态分布的随机变量(这在概率论中得到了证明)
存在称为伪随机生成器的算法。我觉得伪随机生成器的目的是模拟随机性。而善良的标准是:
- 经验分布收敛(在某种意义上 - 逐点、均匀、L2)到理论分布
- 您从随机生成器收到的值似乎是独立的。当然,从“真实的角度”来看这不是真的,但我们假设这是真的。
一种流行的方法 - 你可以对均匀分布的 12 个 irv 求和......但是说实话,在傅里叶变换、泰勒级数的帮助下推导中心极限定理时,需要多次进行 n->+inf 假设。 例如理论 - 就我个人而言,我不明白人们如何执行均匀分布的 12 IRV 之和。
我在大学学过概率论。特别是对我来说,这只是一个数学问题。在大学里我看到了以下模型:
这样的方式如何做它只是一个例子,我想它还存在另一种实现方式。
证明它是正确的可以在这本书中找到
Krishchenko Alexander Petrovich ISBN 5-7038-2485-0
不幸的是,我不知道这本书是否存在英文翻译。
Computer is deterministic device. There is no randomness in calculation.
Moreover arithmetic device in CPU can evaluate summ over some finite set of integer numbers (performing evaluation in finite field) and finite set of real rational numbers. And also performed bitwise operations. Math take a deal with more great sets like [0.0, 1.0] with infinite number of points.
You can listen some wire inside of computer with some controller, but would it have uniform distributions? I don't know. But if assumed that it's signal is the the result of accumulate values huge amount of independent random variables then you will receive approximately normal distributed random variable (It was proved in Probability Theory)
There is exist algorithms called - pseudo random generator. As I feeled the purpose of pseudo random generator is to emulate randomness. And the criteria of goodnes is:
- the empirical distribution is converged (in some sense - pointwise, uniform, L2) to theoretical
- values that you receive from random generator are seemed to be idependent. Of course it's not true from 'real point of view', but we assume it's true.
One of the popular method - you can summ 12 i.r.v with uniform distributions....But to be honest during derivation Central Limit Theorem with helping of Fourier Transform, Taylor Series, it is neededed to have n->+inf assumptions couple times. So for example theoreticaly - Personally I don't undersand how people perform summ of 12 i.r.v. with uniform distribution.
I had probility theory in university. And particulary for me it is just a math question. In university I saw the following model:
Such way how todo it was just an example, I guess it exist another ways to implement it.
Provement that it is correct can be found in this book
"Moscow, BMSTU, 2004: XVI Probability Theory, Example 6.12, p.246-247" of Krishchenko Alexander Petrovich ISBN 5-7038-2485-0
Unfortunately I don't know about existence of translation of this book into English.