当前位置：文江博客话题详情

如何生成相关的二元变量

发布于 2024-08-24 16:29:22 字数 658 浏览 2 评论 0原文

我需要使用给定的相关函数生成一系列 N 随机二进制变量。令 x = {x_i} 为一系列二进制变量（取值 0 或 1，< em>i 从 1 运行到 N）。边际概率给定 Pr(x_i = 1) = p，并且变量应在以下关系中相关：如下方式：

Corr[ x_i x_j ] = const × |i−j|^−α （对于 i!=j）

其中 α是一个正数。

如果更容易，请考虑相关函数：

Corr[ x_i x _j ] = (|i−j|+1)^−α

重要的是我想研究当相关函数像幂律一样时的行为。（不是 α^|i−j| ）

是否可以生成这样的序列，最好是在 Python 中？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

星星的軌跡 2024-08-31 16:29:23

这是一种似乎有效的直观/实验方法。

如果 b 是二进制 rv，
m 是二进制 rv 的平均值，
c 是你想要的相关性，
rand() 生成 U(0,1) rv，并且
d 是您想要的相关二进制 rv：

d = if(rand() < c, b, if(rand() < m , 0, 1))

也就是说，如果统一 rv 是小于所需的相关性，d = b。否则 d = 另一个随机二进制数。

我对一列 2000 个二进制 rvs 运行了 1000 次，其中 m=.5、c = .4 和 c = .5
相关平均值与指定的完全一致，分布呈正态。
对于 0.4 的相关性，相关性的标准偏差为 0.02。

抱歉 - 我无法证明这始终有效，但你必须承认，这确实很容易。

回复收藏 0 原文

最后的乘客 2024-08-31 16:29:22

感谢您的所有投入。我在 Chul Gyu Park 等人的可爱小文章中找到了我的问题的答案，因此，如果有人遇到同样的问题，请查找：

“生成相关二进制变量的简单方法”（jstor.org.stable/ 2684925）

一个简单的算法。如果相关矩阵中的所有元素均为正，且对于一般边际分布 Pr(x_i)=p_i，则该算法有效。

回复收藏 0 原文

冷…雨湿花 2024-08-31 16:29:22

您正在描述一个随机过程，对我来说这看起来是一个艰难的过程...如果您消除了二进制 (0,1) 要求，而是指定了期望值和方差，那么它会可以将其描述为通过 1 极低通滤波器馈送的白噪声发生器，我认为这将为您提供 α^|ij| 特性。

这实际上可能符合 mathoverflow.net 的标准，具体取决于它的措辞方式。让我尝试询问......

更新：我在 mathoverflow.net 上询问对于 α^|ij| 情况。但也许其中有一些想法可以适合您的情况。

回复收藏 0 原文

伤感在游骋 2024-08-31 16:29:22

快速搜索 RSeek 显示R 有软件包

来做到这一点。

回复收藏 0 原文

阳光下的泡沫是彩色的 2024-08-31 16:29:22

强力解决方案是将问题的约束表示为具有 2^N 变量 pr(w) 的线性程序，其中 w 范围超过所有长度为 N 的二进制字符串。首先，pr 是概率分布的约束：

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

第二，每个变量的期望为 p 的约束：

for all i: sum_{w such that w[i] = 1} pr(w) = p

第三，协方差约束：

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^alpha - p^2

这非常慢，但是粗略的文献搜索并没有发现更好的结果。如果您决定实现它，这里有一些带有 Python 绑定的 LP 求解器：http://wiki。 python.org/moin/NumericAndScientific/Libraries

The brute force solution is to express the constraints of the problem as a linear program with 2^N variables pr(w) where w ranges over all binary strings of length N. First the constraint that pr be a probability distribution:

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

Second, the constraint that the expectation of each variable be p:

for all i: sum_{w such that w[i] = 1} pr(w) = p

Third, the covariance constraints:

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^alpha - p^2

This is very slow, but a cursory literature search turned up nothing better. If you decide to implement it, here are some LP solvers with Python bindings: http://wiki.python.org/moin/NumericAndScientific/Libraries

回复收藏 0 原文

煮酒 2024-08-31 16:29:22

将分布 x_i 表示为一些独立基分布 f_j 的线性组合：x< sub>i = a_i1f₁ + a_i2f₂ + ...< /em> .让我们将 f_j 约束为均匀分布在 0..1 或 {0,1}（离散）中的自变量。现在让我们用矩阵形式表达我们所知道的一切：

Let X be the vector (x1, x2, .., xn)
Let A be the matrix (a_ij) of dimension (k,n) (n rows, k columns)
Let F be the vector (f1, f2, .., fk) 
Let P be the vector (p1, p2, .., pn)
Let R be the matrix (E[x_i,x_j]) for i,j=1..n
Definition of the X distribution: X = A * F
Constraint on the mean of individual X variables: P = A * (1 ..k times.. 1)
Correlation constraint: AT*A = 3R or 2R in the discrete case (because E[x_i x_j] = 
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*f_p*a_jq*f_q] = (since for p/=q holds E[f_p*f_q]=0)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 or 1/2 for the discrete case)
  sum over p: 1/3 or 1/2*a_ip*a_jp
And the vector consisting of those sums over p: a_ip*a_jp is precisely AT*A.

现在您需要求解两个方程：

AT*A      = 3R (or 2R in the discrete case)
A*(1...1) = P

第一个方程的解对应于求矩阵 3R 或 2R 的平方根。例如，请参阅http://en.wikipedia.org/wiki/Cholesky_factorization，一般http://en.wikipedia.org/wiki/Square_root_of_a_matrix 。
关于第二个也应该做一些事情:)

我请周围的数学家纠正我，因为我很可能将 ATA 与 AAT 混合在一起，或者做了一些更错误的事情。

要生成 x_i 值作为基分布的线性混合，请使用两步过程：1) 使用均匀随机变量选择基分布之一分布，用相应的概率加权，2) 使用所选的基础分布生成结果。

Express the distribution x_i as a linear combination of some independent basis distributions f_j: x_i = a_i1f₁ + a_i2f₂ + ... . Let us constrain f_j to be independent variables uniformly distributed in 0..1 or in {0,1} (discrete). Let us now express everything we know in matrix form:

Let X be the vector (x1, x2, .., xn)
Let A be the matrix (a_ij) of dimension (k,n) (n rows, k columns)
Let F be the vector (f1, f2, .., fk) 
Let P be the vector (p1, p2, .., pn)
Let R be the matrix (E[x_i,x_j]) for i,j=1..n
Definition of the X distribution: X = A * F
Constraint on the mean of individual X variables: P = A * (1 ..k times.. 1)
Correlation constraint: AT*A = 3R or 2R in the discrete case (because E[x_i x_j] = 
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*f_p*a_jq*f_q] = (since for p/=q holds E[f_p*f_q]=0)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 or 1/2 for the discrete case)
  sum over p: 1/3 or 1/2*a_ip*a_jp
And the vector consisting of those sums over p: a_ip*a_jp is precisely AT*A.

Now you need to solve the two equations:

AT*A      = 3R (or 2R in the discrete case)
A*(1...1) = P

Solution of the first equation corresponds to finding the square root of the matrix 3R or 2R. See for example http://en.wikipedia.org/wiki/Cholesky_factorization and generally http://en.wikipedia.org/wiki/Square_root_of_a_matrix .
Something also should be done about the second one :)

I ask mathematicians around to correct me, because I may very well have mixed ATA with AAT or done something even more wrong.

To generate a value of x_i as a linear mixture of the basis distributions, use a two-step process: 1) use a uniform random variable to choose one of the basis distributions, weighted with corresponding probability, 2) generate a result using the chosen basis distribution.

回复收藏 0 原文

~没有更多了~