Generating random numbers following a normal distribution in C/C++

How can I easily generate random numbers following a normal distribution in C or C++?

I don't want any use of Boost.

I know that Knuth talks about this at length but I don't have his books at hand right now.

〗斷ホ乔殘χμё〖 2024-08-29 12:55:25

There are many methods to generate Gaussian-distributed numbers from a regular RNG.

The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy. You generate two (uniform) random numbers and, by applying a formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.
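A minimal sketch of that caching scheme, assuming rand() as the uniform source (in real code you would substitute a better generator):

#include <cstdlib>
#include <cmath>

// Returns one standard normal variate per call; the second value produced
// by the Box-Muller transform is cached and returned on the next call.
double gaussianRand()
{
    static bool haveCached = false;
    static double cached = 0.0;
    if (haveCached) {
        haveCached = false;
        return cached;
    }
    double u1;
    do {
        u1 = (double) rand() / RAND_MAX;          // uniform in [0, 1]
    } while (u1 <= 0.0);                          // avoid log(0)
    double u2 = (double) rand() / RAND_MAX;
    double radius = sqrt(-2.0 * log(u1));
    double angle  = 2.0 * 3.14159265358979323846 * u2;
    cached = radius * sin(angle);
    haveCached = true;
    return radius * cos(angle);
}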

策马西风 2024-08-29 12:55:25

C++11

C++11 offers std::normal_distribution, which is the way I would go today.

C or older C++

Here are some solutions in order of ascending complexity:

  1. Add 12 uniform random numbers from 0 to 1 and subtract 6. This matches the mean and standard deviation of a standard normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution (a sketch follows after this list).

  2. The Box-Muller transform. This is listed above, and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called the Neave effect [1].

  3. For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.

[1] H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973
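For reference, a minimal sketch of option 1, assuming rand() as the uniform source (twelve U(0,1) variables have total mean 6 and total variance 1, so subtracting 6 gives an approximately standard normal value):

#include <cstdlib>

// Approximate standard normal: sum of 12 uniforms minus 6.
// Mean 0 and variance 1, but the range is clipped to [-6, 6].
double approxNormal()
{
    double sum = 0.0;
    for (int i = 0; i < 12; ++i)
        sum += (double) rand() / RAND_MAX;
    return sum - 6.0;
}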

简美 2024-08-29 12:55:25

A quick and easy method is just to sum a number of uniformly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.

眼泪都笑了 2024-08-29 12:55:25

I created an open-source C++ project that benchmarks normally distributed random number generation.

It compares several algorithms, including

  • Central limit theorem method
  • Box-Muller transform
  • Marsaglia polar method
  • Ziggurat algorithm
  • Inverse transform sampling method
  • cpp11random uses C++11 std::normal_distribution with std::minstd_rand (it is actually the Box-Muller transform in clang).

The results of the single-precision (float) version on an iMac (clang 6.1, 64-bit):

[Benchmark results chart: normaldistf]

For correctness, the program verifies the mean, standard deviation, skewness, and kurtosis of the samples. It was found that the CLT methods that sum 4, 8, or 16 uniform numbers do not have as good kurtosis as the other methods.

The Ziggurat algorithm has better performance than the others. However, it is not suitable for SIMD parallelism because it needs table lookups and branches. Box-Muller with the SSE2/AVX instruction sets is much faster (1.79x, 2.99x) than the non-SIMD version of the Ziggurat algorithm.

Therefore, I suggest using Box-Muller for architectures with SIMD instruction sets, and perhaps Ziggurat otherwise.


P.S. The benchmark uses the simplest LCG PRNG to generate uniformly distributed random numbers, so it may not be sufficient for some applications. But the performance comparison should be fair because all implementations use the same PRNG, so the benchmark mainly tests the performance of the transformation.
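For illustration, an LCG of that kind could look like the sketch below; the multiplier and increment are the classic Numerical Recipes constants, not necessarily the ones the benchmark project uses:

#include <cstdint>

// Minimal 32-bit linear congruential generator (illustrative constants).
struct Lcg
{
    std::uint32_t state;
    explicit Lcg(std::uint32_t seed) : state(seed) {}

    // Next uniform value in [0, 1).
    double next()
    {
        state = state * 1664525u + 1013904223u;   // modulo 2^32 via wrap-around
        return state / 4294967296.0;              // divide by 2^32
    }
};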

り繁华旳梦境 2024-08-29 12:55:25

Here's a C++ example, based on some of the references. This is quick and dirty; you are better off not reinventing this and using the Boost library instead.

#include "math.h" // for RAND, and rand
double sampleNormal() {
    double u = ((double) rand() / (RAND_MAX)) * 2 - 1;
    double v = ((double) rand() / (RAND_MAX)) * 2 - 1;
    double r = u * u + v * v;
    if (r == 0 || r > 1) return sampleNormal();
    double c = sqrt(-2 * log(r) / r);
    return u * c;
}

You can use a Q-Q plot to examine the results and see how well they approximate a real normal distribution (rank your samples 1..n, turn the ranks into proportions of the total count n, get the corresponding z-values, and plot them; an upward straight line is the desired result).

悲歌长辞 2024-08-29 12:55:25

This is how you generate samples with a modern C++ compiler.

#include <random>
#include <iostream>
...
std::mt19937 generator;
double mean = 0.0;
double stddev = 1.0;
std::normal_distribution<double> normal(mean, stddev);
std::cerr << "Normal: " << normal(generator) << std::endl;

冷︶言冷语的世界 2024-08-29 12:55:25

Use std::tr1::normal_distribution.

The std::tr1 namespace is not part of Boost. It's the namespace that contains the library additions from the C++ Technical Report 1, and it is available in up-to-date Microsoft compilers and gcc, independently of Boost.

时光无声 2024-08-29 12:55:25

You can use the GSL. Some complete examples are given to demonstrate how to use it.
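A minimal sketch of what that looks like with GSL's rng and randist interfaces (engine choice and seed are arbitrary here; see the GSL manual for details):

#include <iostream>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int main()
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);   // Mersenne Twister engine
    gsl_rng_set(r, 12345);                         // seed

    for (int i = 0; i < 10; ++i)
        std::cout << gsl_ran_gaussian(r, 1.0) << '\n';   // mean 0, sigma 1.0

    gsl_rng_free(r);
    return 0;
}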

岁月蹉跎了容颜 2024-08-29 12:55:25

If you're using C++11, you can use std::normal_distribution:

#include <random>

std::default_random_engine generator;
std::normal_distribution<double> distribution(/*mean=*/0.0, /*stddev=*/1.0);

double randomNumber = distribution(generator);

There are many other distributions you can use to transform the output of the random number engine.
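A complete, seedable version of the same idea (a sketch; seeding from std::random_device and the sample count are just illustrative choices):

#include <iostream>
#include <random>

int main()
{
    std::random_device rd;            // non-deterministic seed source
    std::mt19937 gen(rd());           // Mersenne Twister engine
    std::normal_distribution<double> dist(/*mean=*/0.0, /*stddev=*/1.0);

    for (int i = 0; i < 10; ++i)
        std::cout << dist(gen) << '\n';
    return 0;
}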

请持续率性 2024-08-29 12:55:25

I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:

#include <cstdlib> // for rand and RAND_MAX
#include <cmath>   // for pow and log
#include <cfloat>  // DBL_EPSILON is defined in <cfloat> (<float.h> in C)

const double DBL_EPS_COMP = 1 - DBL_EPSILON;
inline double RandU() {
    return DBL_EPSILON + ((double) rand()/RAND_MAX);
}
inline double RandN2(double mu, double sigma) {
    return mu + (rand()%2 ? -1.0 : 1.0)*sigma*pow(-log(DBL_EPS_COMP*RandU()), 0.5);
}
inline double RandN() {
    return RandN2(0, 1.0);
}

It may not be the best approach, but it's quite simple.

凉城凉梦凉人心 2024-08-29 12:55:25

The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.

You can take a look at it: http://c-faq.com/lib/gaussian.html

不如归去 2024-08-29 12:55:25

Box-Muller implementation:

#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;

const double kPi = 3.14159265358979323846;

// return a uniformly distributed random number in (0, 1]
double RandomGenerator()
{
  return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}

// return a normally distributed random number (Box-Muller)
double normalRandom()
{
  double y1 = RandomGenerator();
  double y2 = RandomGenerator();
  return cos(2 * kPi * y2) * sqrt(-2. * log(y1));
}

int main()
{
  srand(time(0));   // seed the generator
  double sigma = 82.;
  double Mi = 40.;
  for (int i = 0; i < 100; i++) {
    double x = normalRandom() * sigma + Mi;
    cout << " x = " << x << endl;
  }
  return 0;
}

绮烟 2024-08-29 12:55:25

There exist various algorithms for the inverse cumulative normal distribution. The most popular ones in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/

In my opinion, there is not much incentive to use anything other than algorithm AS241 from Wichura: it achieves machine precision and is reliable and fast. Bottlenecks are rarely in Gaussian random number generation.

The top answer here advocates Box-Muller; you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:

"in the literature, Box–Muller is sometimes regarded as slightly inferior, mainly for two reasons. First, if one applies the Box–Muller method to numbers from a bad linear congruential generator, the transformed numbers provide an extremely poor coverage of the space. Plots of transformed numbers with spiraling tails can be found in many books, most notably in the classic book of Ripley, who was probably the first to make this observation"
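If you don't have an AS241 implementation at hand, one way to illustrate the inverse-transform approach is to invert the normal CDF numerically with Newton's method on top of std::erfc (C++11). This is a sketch only; it is much slower than AS241 and may not fully converge for astronomically extreme uniforms:

#include <cmath>
#include <limits>
#include <random>

// Numerically invert the standard normal CDF with Newton's method.
double inverseNormalCdf(double u)
{
    const double pi = 3.14159265358979323846;
    double x = 0.0;
    for (int i = 0; i < 100; ++i) {
        double cdf = 0.5 * std::erfc(-x / std::sqrt(2.0));
        double pdf = std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);
        double step = (cdf - u) / pdf;
        x -= step;
        if (std::fabs(step) < 1e-12)
            break;
    }
    return x;
}

// Draw a uniform in (0, 1) and map it through the inverse CDF.
double normalVariate(std::mt19937 &gen)
{
    std::uniform_real_distribution<double> uni(
        std::numeric_limits<double>::min(), 1.0);   // excludes exactly 0 and 1
    return inverseNormalCdf(uni(gen));
}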

她说她爱他 2024-08-29 12:55:25

Take a look at what I found.

This library uses the Ziggurat algorithm.

伤感在游骋 2024-08-29 12:55:25

1) A graphically intuitive way to generate Gaussian random numbers is to use something similar to the Monte Carlo method. You generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C, and calculate whether that point lies underneath the Gaussian curve using the equation of the distribution. If it does, then the x value of the point is your Gaussian random number.

This method isn't perfect because technically the Gaussian curve goes on towards infinity, and you couldn't create a box that approaches infinity in the x dimension. But the Gaussian curve approaches 0 in the y dimension pretty fast, so I wouldn't worry about that. The constraint of the size of your variables in C may be more of a limiting factor for your accuracy.

2) Another way would be to use the Central Limit Theorem, which states that the sum of independent random variables approaches a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large number of independent random variables.

These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.

私野 2024-08-29 12:55:25

Monte Carlo method
The most intuitive way to do this would be to use a Monte Carlo method. Take a suitable range -X, +X. Larger values of X result in a more accurate normal distribution, but the method takes longer to converge.
a. Choose a random number z between -X and +X.
b. Keep it with a probability of N(z, mean, variance), where N is the Gaussian density. Otherwise drop it and go back to step (a).
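A minimal sketch of that accept/reject loop, assuming rand() for the uniforms, a cutoff of X = 4, and a standard normal target (so the acceptance probability is the density at z relative to its peak):

#include <cstdlib>
#include <cmath>

// Rejection sampling: propose z uniformly in [-X, X], accept with
// probability exp(-z*z/2), i.e. the standard normal density over its maximum.
double rejectionNormal()
{
    const double X = 4.0;   // larger X is more accurate but accepts less often
    for (;;) {
        double z = -X + 2.0 * X * ((double) rand() / RAND_MAX);
        double accept = exp(-0.5 * z * z);
        if ((double) rand() / RAND_MAX <= accept)
            return z;
    }
}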

魔法少女 2024-08-29 12:55:25

A computer is a deterministic device; there is no randomness in its calculations.
Moreover, the arithmetic unit in a CPU can only evaluate sums over some finite set of integers (performing evaluation in a finite field) and over a finite set of rational numbers, plus bitwise operations. Mathematics deals with much larger sets, such as [0.0, 1.0], which has an infinite number of points.

You could listen to some wire inside the computer with some controller, but would it have a uniform distribution? I don't know. But if you assume that its signal is the result of accumulating the values of a huge number of independent random variables, then you will receive an approximately normally distributed random variable (this is proved in probability theory).

There exist algorithms called pseudo-random generators. As I see it, the purpose of a pseudo-random generator is to emulate randomness, and the criteria of goodness are:
- the empirical distribution converges (in some sense - pointwise, uniform, L2) to the theoretical one
- the values that you receive from the random generator seem to be independent. Of course it's not true from a 'real point of view', but we assume it is.

One of the popular methods: you can sum 12 i.r.v. with uniform distributions... But to be honest, during the derivation of the Central Limit Theorem with the help of the Fourier transform and Taylor series, the n->+inf assumption is needed a couple of times. So, theoretically speaking, I personally don't understand how people justify the sum of just 12 i.r.v. with a uniform distribution.

I studied probability theory at university, and for me in particular it is just a math question. At university I saw the following model:


#include <cmath>   // for sqrt, log, cos

// uniformGen is assumed to be a uniform random number generator object,
// kPi the value of pi, and kEps a small positive constant; they are
// defined elsewhere in the original course code.

double generateUniform(double a, double b)
{
  return uniformGen.generateReal(a, b);
}

// Rayleigh-distributed variate with parameter sigma
double generateRelei(double sigma)
{
  return sigma * sqrt(-2 * log(1.0 - uniformGen.generateReal(0.0, 1.0 - kEps)));
}

// Normal variate with mean m and standard deviation sigma
double generateNorm(double m, double sigma)
{
  double y2 = generateUniform(0.0, 2 * kPi);
  double y1 = generateRelei(1.0);
  double x1 = y1 * cos(y2);
  return sigma * x1 + m;
}

This was just one example of how to do it; I guess there exist other ways to implement it.

A proof that it is correct can be found in this book:
"Moscow, BMSTU, 2004: XVI Probability Theory, Example 6.12, p. 246-247" by Krishchenko Alexander Petrovich, ISBN 5-7038-2485-0

Unfortunately, I don't know whether an English translation of this book exists.
