Plotting the results of hierarchical clustering on top of a data matrix

Posted on 2024-09-04 01:16:55

How can I plot a dendrogram directly on top of a matrix of values in Python, with the matrix reordered to reflect the clustering? The following figure is an example:

[Figure: heatmap of the data matrix with a dendrogram alongside]

This is Figure 6 from: A panel of induced pluripotent stem cells from chimpanzees: a resource for comparative functional genomics

I use scipy.cluster.hierarchy.dendrogram to draw my dendrogram after performing hierarchical clustering on a matrix of data. How can I then plot the data as a matrix whose rows have been reordered to reflect the clustering induced by cutting the dendrogram at a particular threshold, with the dendrogram plotted alongside the matrix? I know how to plot the dendrogram in scipy, but not how to plot the intensity matrix of the data with the right scale bar next to it.


你げ笑在眉眼 2024-09-11 01:16:55

The question does not define the matrix precisely: it says "matrix of values" and "matrix of data". I assume that you mean a distance matrix. In other words, element D_ij of the symmetric, nonnegative N-by-N distance matrix D is the distance between two feature vectors, x_i and x_j. Is that correct?

If so, then try this (edited June 13, 2010, to reflect two different dendrograms).

Tested with Python 3.10 and matplotlib 3.5.1.

import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform

# Generate random features and distance matrix.
np.random.seed(200)  # for reproducible data
x = np.random.rand(40)
D = np.zeros([40, 40])
for i in range(40):
    for j in range(40):
        D[i,j] = abs(x[i] - x[j])

condensedD = squareform(D)

# Compute and plot first dendrogram.
fig = plt.figure(figsize=(8, 8))
ax1 = fig.add_axes([0.09, 0.1, 0.2, 0.6])
Y = sch.linkage(condensedD, method='centroid')
Z1 = sch.dendrogram(Y, orientation='left')
ax1.set_xticks([])
ax1.set_yticks([])

# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3, 0.71, 0.6, 0.2])
Y = sch.linkage(condensedD, method='single')
Z2 = sch.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])

# Plot distance matrix.
axmatrix = fig.add_axes([0.3, 0.1, 0.6, 0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1,:]
D = D[:,idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap=plt.cm.YlGnBu)
axmatrix.set_xticks([])  # remove axis labels
axmatrix.set_yticks([])  # remove axis labels

# Plot colorbar.
axcolor = fig.add_axes([0.91, 0.1, 0.02, 0.6])
plt.colorbar(im, cax=axcolor)
fig.savefig('dendrogram.png')  # save before show() so the file is written while the figure is still open
plt.show()

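The code above assumes a precomputed distance matrix. If the input is instead a raw data matrix (rows are observations), which is what the question seems to describe, the row reordering and the threshold cut might look like the sketch below. The data, the `average` linkage method, and the threshold value are illustrative assumptions, not something taken from the question.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(1)  # illustrative seed

# Hypothetical data: three well-separated groups of four rows each, shuffled.
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(4, 6)) for c in (0.0, 5.0, 10.0)])
rng.shuffle(data)  # shuffles the rows in place

# Cluster the rows directly from the observation matrix.
Z = sch.linkage(data, method="average", metric="euclidean")

# Cut the tree at an assumed distance threshold to get flat cluster labels.
threshold = 2.0
labels = sch.fcluster(Z, t=threshold, criterion="distance")

# Leaf order of the dendrogram, computed without drawing anything.
order = sch.dendrogram(Z, no_plot=True)["leaves"]

# Rows (and their labels) in dendrogram order, ready for matshow.
reordered = data[order, :]
reordered_labels = labels[order]
print(reordered_labels)
```

Because each flat cluster is a subtree of the dendrogram, rows belonging to the same cluster end up adjacent in `reordered`, so the blocks in the heatmap line up with the branches of the tree.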


Edit: For different colors, change the cmap argument passed to matshow (imshow accepts the same argument). See the scipy/matplotlib docs for examples. That page also describes how to create your own colormap. For convenience, I recommend using a preexisting colormap. In my example, I used YlGnBu.
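As a rough illustration of the colormap options, the sketch below builds a simple two-color map with `LinearSegmentedColormap.from_list` and passes it to `matshow`. The colormap name, the colors, and the random matrix are made up for the example; the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# A custom colormap that fades from white to navy (illustrative choice).
custom_cmap = LinearSegmentedColormap.from_list("white_to_navy", ["white", "navy"])

values = np.random.default_rng(0).random((10, 10))  # placeholder matrix

fig, ax = plt.subplots()
im = ax.matshow(values, aspect="auto", cmap=custom_cmap)
fig.colorbar(im, ax=ax)  # the colorbar picks up the same colormap
```

A built-in colormap can be selected the same way by name, for example `cmap="viridis"`.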


Edit: add_axes accepts a list or tuple (left, bottom, width, height), given as fractions of the figure width and height. For example, (0.5, 0, 0.5, 1) adds an Axes on the right half of the figure, and (0, 0.5, 1, 0.5) adds an Axes on the top half of the figure.
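The two rectangles mentioned above can be checked with a small sketch like this one. The figure size and variable names are arbitrary, and the `Agg` backend is an assumption made so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))

# (left, bottom, width, height) in figure fractions.
right_half = fig.add_axes((0.5, 0.0, 0.5, 1.0))
top_half = fig.add_axes((0.0, 0.5, 1.0, 0.5))

# The position of each Axes reports the same rectangle back.
print(right_half.get_position().bounds)
print(top_half.get_position().bounds)
```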

Most people probably use add_subplot for its convenience. I like add_axes for its control.

To remove the border, use add_axes([left, bottom, width, height], frame_on=False).
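A minimal sketch of a frameless dendrogram panel follows. The random points and the `average` linkage are placeholders, and passing `ax=` to `dendrogram` is used here instead of relying on the current Axes as the answer's code does.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy as sch

points = np.random.default_rng(3).random((6, 2))  # placeholder observations
Z = sch.linkage(points, method="average")

fig = plt.figure(figsize=(4, 6))
# frame_on=False hides the rectangular border around the Axes.
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], frame_on=False)

# Draw the dendrogram into this specific Axes and hide the ticks.
result = sch.dendrogram(Z, orientation="left", ax=ax)
ax.set_xticks([])
ax.set_yticks([])
```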

泅人 2024-09-11 01:16:55

If, in addition to the matrix and dendrograms, you need to show the labels of the elements, you can use the following code. It builds on the variables from the answer above (fig, axmatrix, idx1, idx2), shows all the labels, and rotates the x labels and reduces their font size so they do not overlap on the x axis. The colorbar has to move to the right to leave space for the y labels:

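# Columns follow the top dendrogram (idx2); label them along the bottom edge.
axmatrix.set_xticks(range(40))
axmatrix.set_xticklabels(idx2, minor=False)
axmatrix.xaxis.set_label_position('bottom')
axmatrix.xaxis.tick_bottom()

# Rotate and shrink the x labels so they do not overlap.
axmatrix.tick_params(axis='x', labelrotation=-90, labelsize=8)

# Rows follow the left dendrogram (idx1); label them along the right edge.
axmatrix.set_yticks(range(40))
axmatrix.set_yticklabels(idx1, minor=False)
axmatrix.yaxis.set_label_position('right')
axmatrix.yaxis.tick_right()

# Shift the colorbar right to make room for the y labels.
axcolor = fig.add_axes([0.94, 0.1, 0.02, 0.6])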

The result obtained is this (with a different color map):

[Figure: heatmap with leaf labels along the bottom and right edges]
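For reference, a self-contained version of the labeling step might look like the sketch below. The 6-by-6 matrix and the two leaf orders are hypothetical stand-ins for D, Z1['leaves'] and Z2['leaves'], and the `Agg` backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

D = np.random.default_rng(2).random((6, 6))  # placeholder matrix
row_order = [3, 0, 5, 1, 4, 2]  # hypothetical stand-in for Z1['leaves']
col_order = [2, 4, 1, 5, 0, 3]  # hypothetical stand-in for Z2['leaves']
D = D[row_order, :][:, col_order]

fig = plt.figure(figsize=(5, 5))
axmatrix = fig.add_axes([0.1, 0.1, 0.7, 0.8])
im = axmatrix.matshow(D, aspect="auto", origin="lower")

# Column labels along the bottom, rotated and shrunk to avoid overlap.
axmatrix.set_xticks(range(len(col_order)))
axmatrix.set_xticklabels(col_order)
axmatrix.xaxis.tick_bottom()
axmatrix.tick_params(axis="x", labelrotation=-90, labelsize=8)

# Row labels along the right edge.
axmatrix.set_yticks(range(len(row_order)))
axmatrix.set_yticklabels(row_order)
axmatrix.yaxis.tick_right()

# Colorbar placed further right so it does not cover the row labels.
axcolor = fig.add_axes([0.9, 0.1, 0.03, 0.8])
fig.colorbar(im, cax=axcolor)
```

Labeling each axis with the leaf order that was used to permute it keeps the tick labels consistent with the reordered rows and columns.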
