Hierarchical clustering problem in Python
I am doing hierarchical clustering of a 2-dimensional matrix using the correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"):
from hcluster import *
Y = pdist(data, 'correlation')
cluster_type = 'average'
Z = linkage(Y, cluster_type)
dendrogram(Z)
The error I get is:
ValueError: Linkage 'Z' contains negative distances.
What causes this error? The matrix "data" that I use is simply:
[[ 156.651968 2345.168618]
[ 158.089968 2032.840106]
[ 207.996413 2786.779081]
[ 151.885804 2286.70533 ]
[ 154.33665 1967.74431 ]
[ 150.060182 1931.991169]
[ 133.800787 1978.539644]
[ 112.743217 1478.903191]
[ 125.388905 1422.3247 ]]
I don't see how pdist could ever produce negative numbers when taking 1 - Pearson correlation. Any ideas on this?
Thank you.
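A short NumPy sketch of where the negative numbers come from, assuming the matrix is a NumPy array named `data` as in the question. Each row has only two values, and in every row the second value is larger than the first. After centering, each row is therefore a positive multiple of [-1, 1], so the Pearson correlation between any two rows is exactly 1 in exact arithmetic. Every correlation distance (1 - r) is then mathematically 0, and whatever pdist returns is rounding noise around zero. Some of that noise can land slightly below zero.

```python
import numpy as np

# The 9x2 matrix from the question: each row is one observation with 2 features
data = np.array([
    [156.651968, 2345.168618],
    [158.089968, 2032.840106],
    [207.996413, 2786.779081],
    [151.885804, 2286.70533],
    [154.33665, 1967.74431],
    [150.060182, 1931.991169],
    [133.800787, 1978.539644],
    [112.743217, 1478.903191],
    [125.388905, 1422.3247],
])

# Pearson correlation between rows (np.corrcoef treats each row as a variable)
r = np.corrcoef(data)

# Every row has column 2 > column 1, so after centering each row is a positive
# multiple of [-1, 1]; every pairwise correlation is therefore 1 (up to rounding)
print(np.allclose(r, 1.0))  # True

# The correlation distance 1 - r is mathematically 0 for every pair, so any
# nonzero value pdist reports is floating-point noise of order 1e-16
print(np.abs(1.0 - r).max())
```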
There are some lovely floating-point problems going on. If you look at the results of pdist, you'll find very small negative numbers (-2.22044605e-16) in them. Essentially, they should be zero. You can use numpy's clip function to deal with them if you like.
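Below is a minimal sketch of the clipping fix. It assumes SciPy instead of hcluster: `scipy.spatial.distance.pdist` and `scipy.cluster.hierarchy.linkage`/`dendrogram` accept the same arguments used in the question. The condensed distances are clipped to be non-negative before `linkage`, so the check that raises "Linkage 'Z' contains negative distances" passes. Passing `no_plot=True` to `dendrogram` computes the tree without drawing it, which means matplotlib is not needed for this sketch.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram, is_valid_linkage

data = np.array([
    [156.651968, 2345.168618],
    [158.089968, 2032.840106],
    [207.996413, 2786.779081],
    [151.885804, 2286.70533],
    [154.33665, 1967.74431],
    [150.060182, 1931.991169],
    [133.800787, 1978.539644],
    [112.743217, 1478.903191],
    [125.388905, 1422.3247],
])

# Condensed correlation distances; tiny negative values from rounding are possible
Y = pdist(data, 'correlation')

# Clip anything below zero (e.g. -2.2e-16) back to exactly 0.0
Y = np.clip(Y, 0.0, None)

# Average-linkage clustering, as in the question
Z = linkage(Y, method='average')

# Compute the dendrogram layout without plotting (no matplotlib required)
tree = dendrogram(Z, no_plot=True)
print(tree['leaves'])
```

For this particular matrix, every pairwise correlation distance is essentially zero. All rows therefore merge at (or within rounding of) height 0, and correlation distance cannot meaningfully separate them.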
If you were getting the error
KeyError: -428
and your code was along the lines of
`
It is due to a mismatch in the indexes of the queries.
You might want to update it to