Hierarchical clustering problem in Python

Posted 2024-09-03 17:01:06 · 709 words · 3 views · 0 comments

I am doing hierarchical clustering of a 2-dimensional matrix using the correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"):

from hcluster import *

Y = pdist(data, 'correlation')
cluster_type = 'average'
Z = linkage(Y, cluster_type)
dendrogram(Z)

The error I get is:

ValueError: Linkage 'Z' contains negative distances. 

What causes this error? The matrix "data" that I use is simply:

[[  156.651968  2345.168618]
 [  158.089968  2032.840106]
 [  207.996413  2786.779081]
 [  151.885804  2286.70533 ]
 [  154.33665   1967.74431 ]
 [  150.060182  1931.991169]
 [  133.800787  1978.539644]
 [  112.743217  1478.903191]
 [  125.388905  1422.3247  ]]

I don't see how pdist could ever produce negative numbers when taking 1 - Pearson correlation. Any ideas on this?

Thank you.

Comments (2)

回梦 2024-09-10 17:01:06

There are some lovely floating-point problems going on. If you look at the output of pdist, you'll find very small negative numbers (-2.22044605e-16) in it. They should be exactly zero; the tiny negative values are rounding error. If you like, you can use numpy's clip function to force them to zero before calling linkage.
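
The clipping suggestion above could look roughly like the sketch below. It uses the SciPy functions of the same names (scipy.spatial.distance.pdist and scipy.cluster.hierarchy.linkage) instead of hcluster; that substitution is an assumption made so the example is self-contained. The bounds 0.0 and 2.0 are the theoretical range of 1 - Pearson correlation. With only two columns per row, the Pearson correlation between any two rows is +1 or -1, so every distance is 0 or 2 up to rounding, which is how values a hair below zero can appear.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# The matrix from the question.
data = np.array([
    [156.651968, 2345.168618],
    [158.089968, 2032.840106],
    [207.996413, 2786.779081],
    [151.885804, 2286.70533],
    [154.33665, 1967.74431],
    [150.060182, 1931.991169],
    [133.800787, 1978.539644],
    [112.743217, 1478.903191],
    [125.388905, 1422.3247],
])

# Condensed vector of pairwise correlation distances (1 - Pearson r).
Y = pdist(data, 'correlation')

# Force rounding artefacts back into the valid range [0, 2].
Y = np.clip(Y, 0.0, 2.0)

# Average-linkage clustering on the cleaned distances.
Z = linkage(Y, 'average')
```

After clipping, Z can be passed to dendrogram as in the question. Whether the unclipped Y actually contains negative entries may depend on the library version and platform, so the clip is best treated as a harmless safeguard.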

各自安好 2024-09-10 17:01:06

If you are getting the error

KeyError: -428

and your code is along the lines of

import matplotlib.pyplot as plt
import matplotlib as mpl

%matplotlib inline 
from scipy.cluster.hierarchy import ward, dendrogram

linkage_matrix = ward(dist) #define the linkage_matrix using ward clustering pre-computed distances
fig, ax = plt.subplots(figsize=(35, 20),dpi=400) # set size
ax = dendrogram(linkage_matrix, orientation="right",labels=queries);

then it is due to a mismatch in the indexes of queries.

You might want to update the dendrogram call to

ax = dendrogram(linkage_matrix, orientation="right",labels=list(queries));
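
A minimal sketch of that fix, under the assumption that queries is a pandas Series whose index is not 0..n-1 (for example, left over from filtering a DataFrame). The points, labels, and index values here are hypothetical, and no_plot=True is used only so the example runs without a display.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, ward
from scipy.spatial.distance import pdist

# Hypothetical data: four 2-D points forming two obvious pairs.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# Hypothetical labels whose index does not start at 0.
queries = pd.Series(["a", "b", "c", "d"], index=[10, 11, 12, 13])

dist = pdist(points)         # condensed Euclidean distances
linkage_matrix = ward(dist)  # Ward clustering on those distances

# list(queries) keeps only the values, so labels are looked up by position.
result = dendrogram(linkage_matrix, orientation="right",
                    labels=list(queries), no_plot=True)
leaf_labels = result["ivl"]  # leaf labels in plotting order
```

Converting to a list discards the Series index, so a positional lookup inside dendrogram no longer depends on which index values happen to exist.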