Hierarchical clustering on correlations in Python scipy/numpy?
How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across the 9 conditions. I'd like to use 1-pearson correlation as the distances for clustering. Assuming I have a numpy array X that contains the 100 x 9 matrix, how can I do this?
I tried using hcluster, based on this example:
Y=pdist(X, 'seuclidean')
Z=linkage(Y, 'single')
dendrogram(Z, color_threshold=0)
However, pdist is not what I want, since that's a Euclidean distance. Any ideas?
thanks.
Comments (2)
Just change the metric to correlation so that the first line becomes:
Y=pdist(X, 'correlation')
However, I believe that the code can be simplified to just:
Z=linkage(X, 'single', 'correlation')
dendrogram(Z, color_threshold=0)
because linkage will take care of the pdist for you.
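For reference, here is a minimal, self-contained sketch of both variants using the scipy namespaces (scipy.cluster.hierarchy and scipy.spatial.distance) rather than hcluster. The random X is only a hypothetical stand-in for the 100 x 9 matrix in the question, and Z_direct, d01 and r01 are names introduced here for illustration. The 'correlation' metric in pdist is 1 minus the Pearson correlation between rows, which the sketch checks for one pair.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the 100 x 9 matrix X from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Variant 1: explicit condensed distance vector (1 - Pearson r between rows).
Y = pdist(X, 'correlation')
Z = linkage(Y, 'single')

# Variant 2: pass the raw observations and let linkage call pdist itself.
Z_direct = linkage(X, 'single', 'correlation')

# Sanity check: the distance between rows 0 and 1 is 1 - Pearson r.
d01 = squareform(Y)[0, 1]
r01 = np.corrcoef(X[0], X[1])[0, 1]

# no_plot=True returns the tree layout without requiring matplotlib;
# drop it and call matplotlib.pyplot.show() to draw the figure.
tree = dendrogram(Z, no_plot=True, color_threshold=0)
```

Both variants should give the same linkage matrix. The explicit pdist form is mainly useful when you want to inspect or reuse the distances.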
I find it helpful to perform and visualize the hierarchical clustering with the seaborn clustermap (which uses scipy underneath for the clustering), with 'correlation' as the metric for pdist.
You can also recover the corresponding linkage matrix and plot the dendrogram.
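A rough sketch of what this could look like, not necessarily the code the answer originally showed. It assumes seaborn, pandas and matplotlib are installed. The random DataFrame is a hypothetical stand-in for the real data, and the choices of method="single" and col_cluster=False are mine, made to mirror the question. As far as I know, the row linkage is exposed on the returned ClusterGrid as dendrogram_row.linkage.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use

import numpy as np
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import dendrogram

# Hypothetical stand-in for the 100 x 9 data matrix.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 9)))

# Cluster the rows using correlation distance (1 - Pearson r);
# columns are left in their original order.
g = sns.clustermap(df, metric="correlation", method="single",
                   col_cluster=False)

# Recover the row linkage matrix computed by seaborn/scipy.
Z = np.asarray(g.dendrogram_row.linkage)

# Rebuild the dendrogram layout from it (no_plot=True skips drawing).
tree = dendrogram(Z, no_plot=True)
```

Calling g.savefig with a file name writes the heatmap with its row dendrogram to disk. Dropping no_plot=True draws a standalone dendrogram on the current matplotlib axes.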