如何生成相关的二元变量

发布于 2024-08-24 16:29:22 字数 658 浏览 2 评论 0原文

我需要使用给定的相关函数生成一系列 N 随机二进制变量。令 x = {xi} 为一系列二进制变量(取值 0 或 1,< em>i 从 1 运行到 N)。边际概率给定 Pr(xi = 1) = p,并且变量应在以下关系中相关:如下方式:

Corr[ xi x j ] = const × |ij|−α (对于 i!=j)

其中 α是一个正数。

如果更容易,请考虑相关函数:

Corr[ xi x j ] = (|ij|+1)−α

重要的是我想研究当相关函数像幂律一样时的行为。 (不是 α|ij|

是否可以生成这样的序列,最好是在 Python 中?

I need to generate a series of N random binary variables with a given correlation function. Let x = {xi} be a series of binary variables (taking the value 0 or 1, i running from 1 to N). The marginal probability is given Pr(xi = 1) = p, and the variables should be correlated in the following way:

Corr[ xi xj ] = const × |ij|−α (for i!=j)

where α is a positive number.

If it is easier, consider the correlation function:

Corr[ xi xj ] = (|ij|+1)−α

The essential part is that I want to investigate the behavior when the correlation function goes like a power law. (not α|ij| )

Is it possible to generate a series like this, preferably in Python?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

星星的軌跡 2024-08-31 16:29:23

这是一种似乎有效的直观/实验方法。

如果 b 是二进制 rv,
m 是二进制 rv 的平均值,
c 是你想要的相关性,
rand() 生成 U(0,1) rv,并且
d 是您想要的相关二进制 rv:

d = if(rand() < c, b, if(rand() < m , 0, 1))

也就是说,如果统一 rv 是小于所需的相关性,d = b。否则 d = 另一个随机二进制数。

我对一列 2000 个二进制 rvs 运行了 1000 次,其中 m=.5、c = .4 和 c = .5
相关平均值与指定的完全一致,分布呈正态。
对于 0.4 的相关性,相关性的标准偏差为 0.02。

抱歉 - 我无法证明这始终有效,但你必须承认,这确实很容易。

Here's an intuitive / experimental approach that seems to work.

If b is an binary r.v.,
m is the mean of the binary r.v.,
c is the correlation you want,
rand() generates a U(0,1) r.v., and
d is the correlated binary r.v. you want:

d = if(rand() < c, b, if(rand() < m , 0, 1))

That is if a uniform r.v. is less than the desired correlation, d = b. Otherwise d = another random binary number.

I ran this 1000 times for a column of 2000 binary r.v.s. with m=.5 and c = .4 and c = .5
The correlation mean was exactly as specified, the distribution appeared to be normal.
For a correlation of 0.4 the std deviation of the correlation was 0.02.

Sorry - I can't prove that this works all the time, but you have to admit, it sure is easy.

最后的乘客 2024-08-31 16:29:22

感谢您的所有投入。我在 Chul Gyu Park 等人的可爱小文章中找到了我的问题的答案,因此,如果有人遇到同样的问题,请查找:

“生成相关二进制变量的简单方法”(jstor.org.stable/ 2684925)

一个简单的算法。如果相关矩阵中的所有元素均为正,且对于一般边际分布 Pr(x_i)=p_i,则该算法有效。

j

Thanks for all your inputs. I found an answer to my question in the cute little article by Chul Gyu Park et al., so in case anyone run into the same problem, look up:

"A simple method for Generating Correlated Binary Variates" (jstor.org.stable/2684925)

for a simple algorithm. The algorithm works if all the elements in the correlation matrix are positive, and for a general marginal distribution Pr(x_i)=p_i.

j

冷…雨湿花 2024-08-31 16:29:22

您正在描述一个随机过程,对我来说这看起来是一个艰难的过程...如果您消除了二进制 (0,1) 要求,而是指定了期望值和方差,那么它会可以将其描述为通过 1 极低通滤波器馈送的白噪声发生器,我认为这将为您提供 α|ij| 特性。

这实际上可能符合 mathoverflow.net 的标准,具体取决于它的措辞方式。让我尝试询问......


更新:我在 mathoverflow.net 上询问 对于 α|ij| 情况。但也许其中有一些想法可以适合您的情况。

You're describing a random process, and it looks like a tough one to me... if you eliminated the binary (0,1) requirement, and instead specified the expected value and variance, it would be possible to describe this as a white noise generator feeding through a 1-pole low-pass filter, which I think would give you the α|i-j| characteristic.

This actually might meet the bar for mathoverflow.net, depending on how it is phrased. Let me try asking....


update: I did ask on mathoverflow.net for the α|i-j| case. But perhaps there are some ideas there that can be adapted to your case.

伤感在游骋 2024-08-31 16:29:22

快速搜索 RSeek 显示R 有软件包

来做到这一点。

A quick search at RSeek reveals that R has packages

to do this.

阳光下的泡沫是彩色的 2024-08-31 16:29:22

强力解决方案是将问题的约束表示为具有 2^N 变量 pr(w) 的线性程序,其中 w 范围超过所有长度为 N 的二进制字符串。首先,pr 是概率分布的约束:

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

第二,每个变量的期望为 p 的约束:

for all i: sum_{w such that w[i] = 1} pr(w) = p

第三,协方差约束:

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^alpha - p^2

这非常慢,但是粗略的文献搜索并没有发现更好的结果。如果您决定实现它,这里有一些带有 Python 绑定的 LP 求解器:http://wiki。 python.org/moin/NumericAndScientific/Libraries

The brute force solution is to express the constraints of the problem as a linear program with 2^N variables pr(w) where w ranges over all binary strings of length N. First the constraint that pr be a probability distribution:

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

Second, the constraint that the expectation of each variable be p:

for all i: sum_{w such that w[i] = 1} pr(w) = p

Third, the covariance constraints:

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^alpha - p^2

This is very slow, but a cursory literature search turned up nothing better. If you decide to implement it, here are some LP solvers with Python bindings: http://wiki.python.org/moin/NumericAndScientific/Libraries

煮酒 2024-08-31 16:29:22

将分布 xi 表示为一些独立基分布 fj 的线性组合:x< sub>i = ai1f1 + ai2f2 + ...< /em> .让我们将 fj 约束为均匀分布在 0..1 或 {0,1}(离散)中的自变量。现在让我们用矩阵形式表达我们所知道的一切:

Let X be the vector (x1, x2, .., xn)
Let A be the matrix (a_ij) of dimension (k,n) (n rows, k columns)
Let F be the vector (f1, f2, .., fk) 
Let P be the vector (p1, p2, .., pn)
Let R be the matrix (E[x_i,x_j]) for i,j=1..n
Definition of the X distribution: X = A * F
Constraint on the mean of individual X variables: P = A * (1 ..k times.. 1)
Correlation constraint: AT*A = 3R or 2R in the discrete case (because E[x_i x_j] = 
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*f_p*a_jq*f_q] = (since for p/=q holds E[f_p*f_q]=0)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 or 1/2 for the discrete case)
  sum over p: 1/3 or 1/2*a_ip*a_jp
And the vector consisting of those sums over p: a_ip*a_jp is precisely AT*A.

现在您需要求解两个方程:

AT*A      = 3R (or 2R in the discrete case)
A*(1...1) = P

第一个方程的解对应于求矩阵 3R 或 2R 的平方根。例如,请参阅http://en.wikipedia.org/wiki/Cholesky_factorization,一般http://en.wikipedia.org/wiki/Square_root_of_a_matrix
关于第二个也应该做一些事情:)

我请周围的数学家纠正我,因为我很可能将 ATA 与 AAT 混合在一起,或者做了一些更错误的事情。

要生成 xi 值作为基分布的线性混合,请使用两步过程:1) 使用均匀随机变量选择基分布之一分布,用相应的概率加权,2) 使用所选的基础分布生成结果。

Express the distribution xi as a linear combination of some independent basis distributions fj: xi = ai1f1 + ai2f2 + ... . Let us constrain fj to be independent variables uniformly distributed in 0..1 or in {0,1} (discrete). Let us now express everything we know in matrix form:

Let X be the vector (x1, x2, .., xn)
Let A be the matrix (a_ij) of dimension (k,n) (n rows, k columns)
Let F be the vector (f1, f2, .., fk) 
Let P be the vector (p1, p2, .., pn)
Let R be the matrix (E[x_i,x_j]) for i,j=1..n
Definition of the X distribution: X = A * F
Constraint on the mean of individual X variables: P = A * (1 ..k times.. 1)
Correlation constraint: AT*A = 3R or 2R in the discrete case (because E[x_i x_j] = 
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*f_p*a_jq*f_q] = (since for p/=q holds E[f_p*f_q]=0)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 or 1/2 for the discrete case)
  sum over p: 1/3 or 1/2*a_ip*a_jp
And the vector consisting of those sums over p: a_ip*a_jp is precisely AT*A.

Now you need to solve the two equations:

AT*A      = 3R (or 2R in the discrete case)
A*(1...1) = P

Solution of the first equation corresponds to finding the square root of the matrix 3R or 2R. See for example http://en.wikipedia.org/wiki/Cholesky_factorization and generally http://en.wikipedia.org/wiki/Square_root_of_a_matrix .
Something also should be done about the second one :)

I ask mathematicians around to correct me, because I may very well have mixed ATA with AAT or done something even more wrong.

To generate a value of xi as a linear mixture of the basis distributions, use a two-step process: 1) use a uniform random variable to choose one of the basis distributions, weighted with corresponding probability, 2) generate a result using the chosen basis distribution.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文