产生幂律分布的随机数生成器?

发布于 2024-07-23 10:08:28 字数 188 浏览 14 评论 0原文

我正在为 C++ 命令行 Linux 应用程序编写一些测试。 我想生成一堆具有幂律/长尾分布的整数。 意思是,我非常频繁地收到一些数字,但大多数数字相对较少。

理想情况下,我可以将一些神奇的方程与 rand() 或 stdlib 随机函数之一一起使用。 如果没有,一个易于使用的 C/C++ 块会很棒。

谢谢!

I'm writing some tests for a C++ command line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get a some numbers very frequently but most of them relatively infrequently.

Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

贵在坚持 2024-07-30 10:08:28

Wolfram MathWorld 的页面讨论了如何从均匀分布(即大多数随机数生成器提供的内容)。

简短的答案(在上面的链接中推导):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

其中 y 是均匀变量,n 是分布能力,x0x1 定义分布范围,x 是幂律分布变量。

This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).

The short answer (derivation at the above link):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.

幽蝶幻影 2024-07-30 10:08:28

如果您知道所需的分布(称为概率分布函数 (PDF))并将其正确归一化,则可以将其积分以获得累积分布函数 (CDF),然后反转 CDF(如果可能)以获得您想要的变换需要从统一的 [0,1] 分布到您想要的。

因此,您首先要定义所需的分布。

P = F(x)

(for x in [0,1]) 然后积分得到

C(y) = \int_0^y F(x) dx

如果这可以反转,你会得到

y = F^{-1}(C)

所以调用 rand() 并将结果作为 C 插入最后线并使用 y。

这个结果称为采样基本定理。 由于归一化要求和需要分析求逆函数,因此这是一个麻烦。

或者,您可以使用拒绝技术:在所需范围内统一抛出一个数字,然后抛出另一个数字,并与第一次抛出指示的位置处的 PDF 进行比较。 如果第二次投掷超过 PDF,则拒绝。 对于具有大量低概率区域的 PDF(例如具有长尾的 PDF)而言,效率往往较低...

一种中间方法涉及通过强力反转 CDF:将 CDF 存储为查找表,然后进行反向查找以获取结果。


这里真正的问题是简单的x^-n分布在[0,1]范围内是不可归一化的,所以你不能使用采样定理。 尝试 (x+1)^-n 代替...

If you know the distribution you want (called the Probability Distribution Function (PDF)) and have it properly normalized, you can integrate it to get the Cumulative Distribution Function (CDF), then invert the CDF (if possible) to get the transformation you need from uniform [0,1] distribution to your desired.

So you start by defining the distribution you want.

P = F(x)

(for x in [0,1]) then integrated to give

C(y) = \int_0^y F(x) dx

If this can be inverted you get

y = F^{-1}(C)

So call rand() and plug the result in as C in the last line and use y.

This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.

Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare to the PDF at the location indeicated by your first throw. Reject if the second throw exceeds the PDF. Tends to be inefficient for PDFs with a lot of low probability region, like those with long tails...

An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.


The real stinker here is that simple x^-n distributions are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...

辞别 2024-07-30 10:08:28

我只是想进行实际模拟,作为对(正确)接受的答案的补充。 尽管在 R 中,代码非常简单,就像是(伪)伪代码。

接受的答案中的 Wolfram MathWorld 公式 与其他可能更常见的方程之间存在微小差异事实上,幂律指数 n(通常表示为 alpha)不带有明确的负号。 因此,所选的 alpha 值必须为负数,通常在 2 到 3 之间。

x0x1 代表分布的下限和上限。

所以这里是:

set.seed(0)
x1 = 5           # Maximum value
x0 = 0.1         # It can't be zero; otherwise X^0^(neg) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e7)   # Number of samples
x  = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="log density x", col=2)

在此处输入图像描述

或以对数刻度绘制:

plot(density(x), log="xy", ylab="log density x", col=2)

在此处输入图像描述

以下是数据摘要:

> summary(x)
   Min.   1st Qu.  Median    Mean   3rd Qu.    Max. 
  0.1000  0.1208  0.1584    0.2590  0.2511   4.9388 

I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.

One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is the fact that the power law exponent n (which is typically denoted as alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, and typically between 2 and 3.

x0 and x1 stand for the lower and upper limits of the distribution.

So here it is:

set.seed(0)
x1 = 5           # Maximum value
x0 = 0.1         # It can't be zero; otherwise X^0^(neg) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e7)   # Number of samples
x  = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="log density x", col=2)

enter image description here

or plotted in logarithmic scale:

plot(density(x), log="xy", ylab="log density x", col=2)

enter image description here

Here is the summary of the data:

> summary(x)
   Min.   1st Qu.  Median    Mean   3rd Qu.    Max. 
  0.1000  0.1208  0.1584    0.2590  0.2511   4.9388 
掩耳倾听 2024-07-30 10:08:28

我无法评论生成幂律分布所需的数学(其他帖子有建议),但我建议您熟悉 中的 TR1 C++ 标准库随机数工具。 它们提供了比 std::rand 和 std::srand 更多的功能。 新系统为发电机、引擎和分配器指定了模块化 API,并提供了一系列预设。

包含的分布预设为:

  • uniform_int
  • bernoulli_distribution
  • geometric_distribution
  • poisson_distribution
  • binomial_distribution
  • uniform_real
  • exponential_distribution
  • normal_distribution
  • gamma_distribution

当您定义幂律分布时,您应该能够将其插入现有的发电机和引擎。 Pete Becker 所著的《The C++ Standard Library Extensions》一书有一个关于 的精彩章节。

这是一篇关于如何创建其他分布的文章(包含柯西分布、卡方分布、学生 t 和 Snedecor F)

I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.

The included distribution presets are:

  • uniform_int
  • bernoulli_distribution
  • geometric_distribution
  • poisson_distribution
  • binomial_distribution
  • uniform_real
  • exponential_distribution
  • normal_distribution
  • gamma_distribution

When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.

Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文