Heatmap of microarray data using Pearson distance

Posted on 2024-11-24 03:39:39

I have been trying to generate a heatmap in R for some microarray data. Following online instructions I have mostly succeeded, but the result does not do exactly what I want. I would like to cluster the data based on Pearson distance rather than Euclidean distance, and I have run into some difficulties.

Using heatmap.2 (from the gplots package), I make my initial heatmap with the following code:

heatmap.2(Test402, trace="none", density.info="none", scale="row",
            ColSideColors=c("red","blue")[data.test.factors],
            col=redgreen, labRow="",
            hclustfun=function(x) hclust(x, method="complete"))

Test402 is a matrix with 402 rows (genes) and 31 columns (patients), and data.test.factors indicates the outcome group each patient belongs to. Using hclustfun works fine here: the heatmap responds to changes in the clustering method and overall behaves as expected. The problem is that the clustering is based on Euclidean distance, and I would like to change that to Pearson distance. So I tried the following:

heatmap.2(Test402, trace="none", density.info="none", scale="row",
            ColSideColors=c("red","blue")[data.test.factors],
            col=redgreen, labRow="",
            hclustfun=function(x) hclust(x, method="complete"),
            distfun=function(x) as.dist((1-cor(x))/2))

The above command fails. As far as I can tell, that is because Test402 needs to be a square matrix. So, following some additional advice, I tried this:

cU = cor(Test402)
heatmap.2(cU, trace="none", density.info="none", scale="row",
            ColSideColors=c("red","blue")[data.test.factors],
            col=redgreen, labRow="",
            hclustfun=function(x) hclust(x, method="complete"),
            distfun=function(x) as.dist((1-x)/2))

That works, but here is the problem: instead of the original expression values in Test402, the heatmap now displays only the correlations. This is not what I want. I want the heatmap to show the expression values and only the dendrogram to cluster differently; I don't want to change what data is actually represented in the heatmap. Is this possible?

Comments (1)

微凉徒眸意 2024-12-01 03:39:39

Ok...I think you are simply confused about how cor and dist operate. From the documentation on dist:

This function computes and returns the distance matrix computed by using the specified 
    distance measure to compute the distances between the rows of a data matrix.

And from the documentation on cor:

If x and y are matrices then the covariances (or correlations) 
    between the columns of x and the columns of y are computed.
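
To make the two quoted behaviours concrete, here is a small base R sketch. The matrix m and its dimensions are made up for illustration and are not from the question.

```r
# Hypothetical toy matrix: 6 rows (think genes) by 4 columns (think patients)
set.seed(1)
m <- matrix(rnorm(24), nrow = 6)

d_rows <- dist(m)     # distances between the 6 ROWS
c_cols <- cor(m)      # correlations between the 4 COLUMNS
c_rows <- cor(t(m))   # transposing first gives correlations between the 6 rows

attr(d_rows, "Size")  # 6
dim(c_cols)           # 4 4
dim(c_rows)           # 6 6
```

Only cor(t(m)) describes the same set of objects (the rows) that dist(m) does.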

See the difference? dist (and dist objects, which is what heatmap.2 assumes it is getting) works on the distances between rows, while cor computes the correlations between columns. Adding a simple transpose to your distance function allows this (non-square) example to run for me:

TEST <- matrix(runif(100), nrow=20)
heatmap.2(t(TEST), trace="none", density.info="none",
            scale="row",
            labRow="",
            hclustfun=function(x) hclust(x, method="complete"),
            distfun=function(x) as.dist((1-cor(t(x)))/2))
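
As a rough check that the same idea carries over to a matrix shaped like the one in the question, the sketch below uses simulated data in place of Test402. The names pearson_dist and expr are hypothetical, introduced only for this illustration. The heatmap itself is drawn only if gplots is installed.

```r
# Pearson-based distance between the ROWS of x, scaled to [0, 1].
# pearson_dist is a hypothetical helper name, not part of gplots.
pearson_dist <- function(x) as.dist((1 - cor(t(x))) / 2)

# Simulated stand-in for Test402: 402 genes (rows) x 31 patients (columns)
set.seed(42)
expr <- matrix(rnorm(402 * 31), nrow = 402)

# Cluster rows and columns separately, using the same distance helper
row_tree <- hclust(pearson_dist(expr), method = "complete")     # genes
col_tree <- hclust(pearson_dist(t(expr)), method = "complete")  # patients

length(row_tree$order)  # 402
length(col_tree$order)  # 31

# The heatmap still shows the expression values; only the trees change
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(expr, trace = "none", density.info = "none",
                    scale = "row", labRow = "",
                    col = gplots::redgreen(75),
                    distfun = pearson_dist,
                    hclustfun = function(d) hclust(d, method = "complete"))
}
```

Because the transpose lives inside the distance function, the data matrix is passed to heatmap.2 unchanged, so the cells keep showing expression values rather than correlations.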