Hierarchical clustering on correlations in Python scipy/numpy?

Posted on 2024-09-03 09:57:30

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster the rows by the correlation of each entry across the 9 conditions. I'd like to use 1 - Pearson correlation as the distance for clustering. Assuming I have a numpy array X that contains the 100 x 9 matrix, how can I do this?

I tried using hcluster, based on this example:

Y=pdist(X, 'seuclidean')
Z=linkage(Y, 'single')
dendrogram(Z, color_threshold=0)

However, this is not what I want, since 'seuclidean' is a (standardized) Euclidean distance. Any ideas?

Thanks.

Comments (2)

蓝戈者 2024-09-10 09:57:30

Just change the metric to 'correlation', so that the first line becomes:

from scipy.spatial.distance import pdist
Y = pdist(X, 'correlation')

However, I believe that the code can be simplified to just:

from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, 'single', 'correlation')
dendrogram(Z, color_threshold=0)

because linkage calls pdist for you when it is given the raw observation matrix.
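
For anyone who wants to check the two routes above against each other, here is a minimal sketch. It assumes a random 100 x 9 array as a stand-in for X (the real data is not shown in the question). It also checks that the 'correlation' metric matches 1 - Pearson r as computed by np.corrcoef. dendrogram is called with no_plot=True so the sketch runs without matplotlib; drop that argument to draw the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform

# Assumption: synthetic stand-in for the 100 x 9 matrix from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Condensed vector of pairwise distances between the 100 rows,
# where each distance is 1 - Pearson correlation.
Y = pdist(X, metric="correlation")

# Cross-check with NumPy: np.corrcoef treats each row as a variable.
D_numpy = 1.0 - np.corrcoef(X)
matches_numpy = np.allclose(squareform(Y), D_numpy)

# Route 1: explicit pdist, then linkage on the condensed distances.
Z_from_pdist = linkage(Y, method="single")

# Route 2: hand linkage the raw observations plus a metric name.
Z_direct = linkage(X, method="single", metric="correlation")
same_tree = np.allclose(Z_from_pdist, Z_direct)

# no_plot=True returns the dendrogram layout as a dict without plotting.
tree = dendrogram(Z_direct, no_plot=True, color_threshold=0)
```

Both matches_numpy and same_tree should come out True, and tree["leaves"] holds the row order along the dendrogram.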

绿光 2024-09-10 09:57:30

I find it helpful to perform and visualize the hierarchical clustering with seaborn's clustermap (which uses scipy underneath for the clustering), after computing the distances with pdist and the 'correlation' metric:

import seaborn as sns
from scipy.cluster.hierarchy import dendrogram
from scipy.spatial.distance import pdist, squareform

# X.T gives distances between the 9 columns (conditions); pass X to compare the 100 rows
D = squareform(pdist(X.T, 'correlation'))
h = sns.clustermap(D, cmap='Reds')

You can also recover the corresponding linkage matrix and plot the dendrogram:

Z = h.dendrogram_col.linkage
dendrogram(Z, color_threshold=0)
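
One thing to be aware of: when clustermap is given the square matrix D without a linkage, it computes its own, treating the rows of D as observations (its defaults are metric='euclidean' and method='average'). If the tree should be built from the correlation distances themselves, one option is to compute the linkage from the condensed pdist output and pass it in through the row_linkage and col_linkage parameters. This is a sketch rather than the answer's original code. The random X and the 'average' method are assumptions, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive plotting

import numpy as np
import seaborn as sns
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Assumption: synthetic stand-in for the 100 x 9 matrix from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Correlation distances between the 9 conditions (columns of X).
Y = pdist(X.T, metric="correlation")
D = squareform(Y)

# Linkage built directly on the correlation distances
# ('average' is an arbitrary choice for this sketch).
Z = linkage(Y, method="average")

# Reuse the precomputed linkage for both axes of the heatmap.
h = sns.clustermap(D, row_linkage=Z, col_linkage=Z, cmap="Reds")

# The linkage stored on the grid is the one that was passed in.
Z_used = h.dendrogram_col.linkage
```

To cluster the 100 rows instead, as in the original question, use pdist(X, ...) rather than pdist(X.T, ...).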