Heatmap of microarray data using Pearson distance
I have been trying to generate a heatmap in R for some microarray data and, for the most part, have succeeded in producing one by following online instructions, but it does not do exactly what I want. I would like to cluster the data based on Pearson distance rather than Euclidean distance, but I have run into some difficulties.
Using heatmap.2 (from the gplots package), I use the following code to make my initial heatmap:
heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue") [data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"))
Test402 is a matrix with 402 rows (genes) and 31 columns (patients), and data.test.factors indicates the outcome group each patient belongs to. Using hclustfun works fine here: the heatmap responds to changes in the method and overall works. The problem is that the clustering uses Euclidean distance, and I would like to change that to Pearson distance. So I attempted the following:
heatmap.2(Test402,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-cor(x))/2) )
The above command fails, apparently because Test402 needs to be a square matrix. Following some additional advice, I tried the following:
cU = cor(Test402)
heatmap.2(cU,trace="none",density="none",scale="row", ColSideColors=c("red","blue")[data.test.factors],col=redgreen,labRow="",hclustfun=function(x) hclust(x,method="complete"), distfun=function(x) as.dist((1-x)/2) )
That works, BUT here is the problem: the heatmap now displays only the correlations rather than the original expression values in Test402. This is NOT what I want! I want the heatmap to keep showing the expression values and only the dendrogram to cluster differently; I don't want to change what data is actually represented in the heatmap. Is this possible?
Comments (1)
OK... I think you are simply confused about how cor and dist operate. The documentation for dist describes distances between the rows of a data matrix, while the documentation for cor describes correlations between the columns of its input. See the difference? dist (and dist objects, which is what heatmap.2 assumes it is getting) assumes that you have calculated the distances between rows, whereas with cor you are essentially calculating the distances between columns. Adding a simple transpose to your distance function, that is, using cor(t(x)) in place of cor(x), allows this (non-square) example to run for me.