GSL and correlation
I'm using the GSL library 1.14 and the Ruby wrapper (gsl) for some math calculations. One thing I need is the Pearson correlation, but I run into a problem when my arrays contain 0.
For example, I have this snippet of code:
require 'gsl'

x = [1,2,2,2,12]
y = [1,2,1,3,33]
# Pearson correlation of the two vectors
puts GSL::Stats::correlation(
  GSL::Vector.alloc(x), GSL::Vector.alloc(y)
)
=> 0.9967291641974002
But when I try to calculate it with either of the following pairs of arrays, I get NaN:
x = [1,1,1]
y = [1,1,1]
or
x = [0,1,1]
y = [1,1,1]
puts GSL::Stats::correlation(
GSL::Vector.alloc(x),GSL::Vector.alloc(y)
)
=> NaN
And when I try with these values, it works:
x = [0,1,1]
y = [1,0,1]
puts GSL::Stats::correlation(
GSL::Vector.alloc(x),GSL::Vector.alloc(y)
)
=> -0.5
Does anybody know why? This seems very strange.
Comments (2)
I do not know the GSL implementation, but in general, calculating the Pearson correlation coefficient involves dividing by both standard deviations, so if either of them is 0, the calculation fails. The standard deviation is 0 if all elements of a vector are equal. All of your failing examples have at least one vector whose elements are all equal. I hope this answers your question.
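To make the division step concrete, here is a minimal plain-Ruby sketch of the textbook Pearson formula. The `pearson` helper is hypothetical and is not part of the gsl gem. I have not checked how GSL computes the value internally, so treat this only as an illustration of why a constant vector leads to 0.0 / 0.0.

```ruby
# Textbook Pearson correlation in plain Ruby (hypothetical helper, not the gsl gem's code).
def pearson(x, y)
  n = x.size.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n

  # Sums of squared deviations for each vector, and the sum of cross products.
  sxx = x.sum { |a| (a - mean_x)**2 }
  syy = y.sum { |b| (b - mean_y)**2 }
  sxy = x.zip(y).sum { |a, b| (a - mean_x) * (b - mean_y) }

  # If either vector is constant, its sum of squared deviations is 0.0,
  # so both numerator and denominator are 0.0 and the result is NaN.
  sxy / Math.sqrt(sxx * syy)
end

p pearson([1, 1, 1], [1, 1, 1]).nan?      # true: both vectors are constant
p pearson([0, 1, 1], [1, 1, 1]).nan?      # true: y is constant
p pearson([0, 1, 1], [1, 0, 1]).round(6)  # -0.5, as in the question's working example
```

If NaN is inconvenient downstream, one option is to check `Float#nan?` on the result (or test for a constant vector beforehand) and decide explicitly how to treat that case, since the correlation is undefined there.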
In theory, correlation means finding the relationship between two data sets. It can be positive or negative depending on the pattern in the data. What I wanted to convey is that when 0 is one of the elements of your data sets, you cannot correlate the quantity 0 with the non-zero elements of the other data set. That is why it gives NaN.