如何生成相关的二元变量
我需要使用给定的相关函数生成一系列 N 随机二进制变量。令 x = {xi} 为一系列二进制变量(取值 0 或 1,< em>i 从 1 运行到 N)。边际概率给定 Pr(xi = 1) = p,并且变量应在以下关系中相关:如下方式:
Corr[ xi x j ] = const × |i−j|−α (对于 i!=j)
其中 α是一个正数。
如果更容易,请考虑相关函数:
Corr[ xi x j ] = (|i−j|+1)−α
重要的是我想研究当相关函数像幂律一样时的行为。 (不是 α|i−j| )
是否可以生成这样的序列,最好是在 Python 中?
I need to generate a series of N random binary variables with a given correlation function. Let x = {xi} be a series of binary variables (taking the value 0 or 1, i running from 1 to N). The marginal probability is given Pr(xi = 1) = p, and the variables should be correlated in the following way:
Corr[ xi xj ] = const × |i−j|−α (for i!=j)
where α is a positive number.
If it is easier, consider the correlation function:
Corr[ xi xj ] = (|i−j|+1)−α
The essential part is that I want to investigate the behavior when the correlation function goes like a power law. (not α|i−j| )
Is it possible to generate a series like this, preferably in Python?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
这是一种似乎有效的直观/实验方法。
如果 b 是二进制 rv,
m 是二进制 rv 的平均值,
c 是你想要的相关性,
rand() 生成 U(0,1) rv,并且
d 是您想要的相关二进制 rv:
d = if(rand() < c, b, if(rand() < m , 0, 1))
也就是说,如果统一 rv 是小于所需的相关性,d = b。否则 d = 另一个随机二进制数。
我对一列 2000 个二进制 rvs 运行了 1000 次,其中 m=.5、c = .4 和 c = .5
相关平均值与指定的完全一致,分布呈正态。
对于 0.4 的相关性,相关性的标准偏差为 0.02。
抱歉 - 我无法证明这始终有效,但你必须承认,这确实很容易。
Here's an intuitive / experimental approach that seems to work.
If b is an binary r.v.,
m is the mean of the binary r.v.,
c is the correlation you want,
rand() generates a U(0,1) r.v., and
d is the correlated binary r.v. you want:
d = if(rand() < c, b, if(rand() < m , 0, 1))
That is if a uniform r.v. is less than the desired correlation, d = b. Otherwise d = another random binary number.
I ran this 1000 times for a column of 2000 binary r.v.s. with m=.5 and c = .4 and c = .5
The correlation mean was exactly as specified, the distribution appeared to be normal.
For a correlation of 0.4 the std deviation of the correlation was 0.02.
Sorry - I can't prove that this works all the time, but you have to admit, it sure is easy.
感谢您的所有投入。我在 Chul Gyu Park 等人的可爱小文章中找到了我的问题的答案,因此,如果有人遇到同样的问题,请查找:
“生成相关二进制变量的简单方法”(jstor.org.stable/ 2684925)
一个简单的算法。如果相关矩阵中的所有元素均为正,且对于一般边际分布 Pr(x_i)=p_i,则该算法有效。
j
Thanks for all your inputs. I found an answer to my question in the cute little article by Chul Gyu Park et al., so in case anyone run into the same problem, look up:
"A simple method for Generating Correlated Binary Variates" (jstor.org.stable/2684925)
for a simple algorithm. The algorithm works if all the elements in the correlation matrix are positive, and for a general marginal distribution Pr(x_i)=p_i.
j
您正在描述一个随机过程,对我来说这看起来是一个艰难的过程...如果您消除了二进制 (0,1) 要求,而是指定了期望值和方差,那么它会可以将其描述为通过 1 极低通滤波器馈送的白噪声发生器,我认为这将为您提供 α|ij| 特性。
这实际上可能符合 mathoverflow.net 的标准,具体取决于它的措辞方式。让我尝试询问......
更新:我在 mathoverflow.net 上询问 对于 α|ij| 情况。但也许其中有一些想法可以适合您的情况。
You're describing a random process, and it looks like a tough one to me... if you eliminated the binary (0,1) requirement, and instead specified the expected value and variance, it would be possible to describe this as a white noise generator feeding through a 1-pole low-pass filter, which I think would give you the α|i-j| characteristic.
This actually might meet the bar for mathoverflow.net, depending on how it is phrased. Let me try asking....
update: I did ask on mathoverflow.net for the α|i-j| case. But perhaps there are some ideas there that can be adapted to your case.
快速搜索 RSeek 显示R 有软件包
来做到这一点。
A quick search at RSeek reveals that R has packages
to do this.
强力解决方案是将问题的约束表示为具有
2^N
变量pr(w)
的线性程序,其中w
范围超过所有长度为N
的二进制字符串。首先,pr
是概率分布的约束:第二,每个变量的期望为
p
的约束:第三,协方差约束:
这非常慢,但是粗略的文献搜索并没有发现更好的结果。如果您决定实现它,这里有一些带有 Python 绑定的 LP 求解器:http://wiki。 python.org/moin/NumericAndScientific/Libraries
The brute force solution is to express the constraints of the problem as a linear program with
2^N
variablespr(w)
wherew
ranges over all binary strings of lengthN
. First the constraint thatpr
be a probability distribution:Second, the constraint that the expectation of each variable be
p
:Third, the covariance constraints:
This is very slow, but a cursory literature search turned up nothing better. If you decide to implement it, here are some LP solvers with Python bindings: http://wiki.python.org/moin/NumericAndScientific/Libraries
将分布 xi 表示为一些独立基分布 fj 的线性组合:x< sub>i = ai1f1 + ai2f2 + ...< /em> .让我们将 fj 约束为均匀分布在 0..1 或 {0,1}(离散)中的自变量。现在让我们用矩阵形式表达我们所知道的一切:
现在您需要求解两个方程:
第一个方程的解对应于求矩阵 3R 或 2R 的平方根。例如,请参阅http://en.wikipedia.org/wiki/Cholesky_factorization,一般http://en.wikipedia.org/wiki/Square_root_of_a_matrix 。
关于第二个也应该做一些事情:)
我请周围的数学家纠正我,因为我很可能将 ATA 与 AAT 混合在一起,或者做了一些更错误的事情。
要生成 xi 值作为基分布的线性混合,请使用两步过程:1) 使用均匀随机变量选择基分布之一分布,用相应的概率加权,2) 使用所选的基础分布生成结果。
Express the distribution xi as a linear combination of some independent basis distributions fj: xi = ai1f1 + ai2f2 + ... . Let us constrain fj to be independent variables uniformly distributed in 0..1 or in {0,1} (discrete). Let us now express everything we know in matrix form:
Now you need to solve the two equations:
Solution of the first equation corresponds to finding the square root of the matrix 3R or 2R. See for example http://en.wikipedia.org/wiki/Cholesky_factorization and generally http://en.wikipedia.org/wiki/Square_root_of_a_matrix .
Something also should be done about the second one :)
I ask mathematicians around to correct me, because I may very well have mixed ATA with AAT or done something even more wrong.
To generate a value of xi as a linear mixture of the basis distributions, use a two-step process: 1) use a uniform random variable to choose one of the basis distributions, weighted with corresponding probability, 2) generate a result using the chosen basis distribution.