Uniformly sampling from the n-dimensional unit simplex

Posted 2024-09-04 19:58:24

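Sampling uniformly at random from an n-dimensional unit simplex is the fancy way of saying that you want n random numbers such that

they are all non-negative,
they sum to 1, and
every possible vector of n non-negative numbers that sum to 1 is equally likely.

In the n=2 case you want to sample uniformly from the segment of the line x+y=1 (i.e., y=1-x) that lies in the positive quadrant. In the n=3 case you are sampling from the triangle-shaped part of the plane x+y+z=1 that lies in the positive octant of R^3: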

(Image from http://en.wikipedia.org/wiki/Simplex.)

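Note that picking n uniform random numbers and then normalizing them so they sum to 1 does not work. You end up with a bias towards less extreme numbers.

Similarly, picking n-1 uniform random numbers and then taking the nth to be 1 minus their sum also introduces bias.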
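As a rough illustration of both failure modes, here is a minimal sketch. The helper names naiveSample and remainderSample are hypothetical, introduced only for this check. For n=2, a uniform sample on the simplex has a first coordinate that is Uniform(0,1), so about 10% of samples should fall below 0.1; for the normalization scheme the exact probability works out to P(u1 < u2/9) = 1/18.

```mathematica
(* Hypothetical helper: normalize n uniform random numbers (the biased approach). *)
naiveSample[n_] := (#/Total[#] &)[RandomReal[{0, 1}, n]]

(* Fraction of first coordinates below 0.1 for n = 2.
   A uniform simplex sampler would give about 0.1; this scheme gives about 1/18. *)
xs = Table[First[naiveSample[2]], {100000}];
N[Count[xs, _?(# < 0.1 &)]/Length[xs]]

(* Hypothetical helper: n-1 uniforms, last coordinate = 1 - their sum. *)
remainderSample[n_] :=
  With[{u = RandomReal[{0, 1}, n - 1]}, Append[u, 1 - Total[u]]]

(* For n = 3 the last coordinate is negative whenever u1 + u2 > 1,
   which happens with probability 1/2, so half the points are not on the simplex. *)
ys = Table[Last[remainderSample[3]], {100000}];
N[Count[ys, _?Negative]/Length[ys]]
```

The first fraction coming out well under 0.1 is the "bias towards less extreme numbers" in action: values near 0 or 1 are under-represented.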

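Wikipedia gives two algorithms for doing this correctly: http://en.wikipedia.org/wiki/Simplex#Random_sampling (though the second one currently claims to be correct only in practice, not in theory. I hope to clean that up or clarify it once I understand this better. I initially added a "WARNING: such-and-such paper claims the following is wrong" note to that Wikipedia page, and someone else turned it into the "works only in practice" caveat.)

Finally, the question: what do you consider the best implementation of simplex sampling in Mathematica (preferably with empirical confirmation that it is correct)?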


别念他 2024-09-11 19:58:24

This code works:

samples[n_] := Differences[Join[{0}, Sort[RandomReal[{0, 1}, n - 1]], {1}]]

Basically, you just choose n - 1 places on the interval [0,1] to split it up, then take the size of each piece using Differences.

A quick run of Timing on this shows that it's a little faster than Janus's first answer.


墨落画卷 2024-09-11 19:58:24

After a little digging around, I found this page, which gives a nice implementation of the Dirichlet distribution. From there, following Wikipedia's method 1 looks pretty simple. This seems like the best way to do it.

As a test:

In[14]:= RandomReal[DirichletDistribution[{1,1}],WorkingPrecision->25]
Out[14]= {0.8428995243540368880268079,0.1571004756459631119731921}
In[15]:= Total[%]
Out[15]= 1.000000000000000000000000

A plot of 100 samples: http://www.public.iastate.edu/~zdavkeos/simplex-sample.png


不念旧人 2024-09-11 19:58:24

I'm with zdav: the Dirichlet distribution seems to be the easiest way forward, and the algorithm for sampling the Dirichlet distribution that zdav refers to is also presented on the Wikipedia page on the Dirichlet distribution.

Implementation-wise, going through the full Dirichlet distribution adds some overhead, since all you really need is n random Gamma[1,1] samples. Compare the two versions below.

Simple implementation

SimplexSample[n_, opts:OptionsPattern[RandomReal]] :=
  (#/Total[#])& @ RandomReal[GammaDistribution[1,1],n,opts]

Full Dirichlet implementation

DirichletDistribution/:Random`DistributionVector[
 DirichletDistribution[alpha_?(VectorQ[#,Positive]&)],n_Integer,prec_?Positive]:=
    Block[{gammas}, gammas = 
        Map[RandomReal[GammaDistribution[#,1],n,WorkingPrecision->prec]&,alpha];
      Transpose[gammas]/Total[gammas]]

SimplexSample2[n_, opts:OptionsPattern[RandomReal]] := 
  (#/Total[#])& @ RandomReal[DirichletDistribution[ConstantArray[1,{n}]],opts]

Timing

Timing[Table[SimplexSample[10,WorkingPrecision-> 20],{10000}];]
Timing[Table[SimplexSample2[10,WorkingPrecision-> 20],{10000}];]
Out[159]= {1.30249,Null}
Out[160]= {3.52216,Null}

So the full Dirichlet is roughly a factor of 3 slower (3.52 s versus 1.30 s). If you need m>1 sample points at a time, you could probably gain further by doing (#/Total[#]&)/@RandomReal[GammaDistribution[1,1],{m,n}].
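The batched idea could be sketched as follows. This is only an illustration under two assumptions: a Mathematica version that has RandomVariate (8 or later), and the fact that Gamma[1,1] is the same distribution as ExponentialDistribution[1]. The name simplexBatch is hypothetical.

```mathematica
(* Hypothetical helper: draw m points on the n-coordinate unit simplex at once.
   Each row is a vector of n Exponential(1) draws divided by its own row sum. *)
simplexBatch[m_, n_] :=
  With[{g = RandomVariate[ExponentialDistribution[1], {m, n}]},
    g/Total[g, {2}]]  (* Total[g, {2}] gives the m row sums; division threads row-wise *)

samples = simplexBatch[5, 3];
Dimensions[samples]   (* {5, 3} *)
Total[samples, {2}]   (* every row sums to 1, up to floating-point rounding *)
```

Dividing the whole matrix by the vector of row sums avoids mapping a pure function over each row, which is where a further speed-up would come from, if any; timing it against the versions above would be needed to confirm.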


负佳期 2024-09-11 19:58:24

Here's a nice concise implementation of the second algorithm from Wikipedia:

SimplexSample[n_] := Rest@# - Most@# &[Sort@Join[{0,1}, RandomReal[{0,1}, n-1]]]

That's adapted from here: http://www.mofeel.net/1164-comp-soft-sys-math-mathematica/14968.aspx
(Originally it had Union instead of Sort@Join -- the latter is slightly faster.)

(See comments for some evidence that this is correct!)
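One possible empirical check, sketched here under the assumption that KolmogorovSmirnovTest is available (Mathematica 8 or later): under the uniform distribution on the simplex, each coordinate is marginally Beta(1, n-1) with mean 1/n and CDF 1-(1-t)^(n-1). The definition of SimplexSample is repeated so the snippet runs on its own.

```mathematica
SimplexSample[n_] := Rest@# - Most@# &[Sort@Join[{0, 1}, RandomReal[{0, 1}, n - 1]]]

n = 3;
xs = Table[First[SimplexSample[n]], {100000}];

(* Sample mean next to the theoretical mean 1/n. *)
{Mean[xs], 1./n}

(* Empirical and theoretical probability that the first coordinate is below 0.1;
   for n = 3 the theoretical value is 1 - 0.9^2 = 0.19. *)
{N[Count[xs, _?(# < 0.1 &)]/Length[xs]], 1 - 0.9^(n - 1)}

(* p-value of a Kolmogorov-Smirnov test against Beta(1, n-1);
   a very small p-value would indicate a mismatch. *)
KolmogorovSmirnovTest[xs, BetaDistribution[1, n - 1]]
```

Matching marginals is a necessary condition, not a proof of uniformity over the whole simplex, so this counts as supporting evidence only.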
