Heatmap in R using the ggplot function - how to cluster rows?

Posted on 2024-12-12 04:56:16

I am currently generating heatmaps in R using the ggplot function. In the code below, I first read the data into a data frame, remove rows with a duplicate timestamp, convert the timestamp field to a factor, melt the data frame (by 'timestamp'), scale each variable to between 0 and 1, and then plot the heatmap.

In the resulting heatmap, time is plotted on the x axis and each iostat-sda variable (see the sample data below) is plotted along the y axis. Note: if you want to try out the R code, you can paste the sample data below into a file called iostat-sda.csv.

However, I really need to be able to cluster the rows (the variables on the y axis) within this heatmap. Does anyone know how this can be achieved with ggplot?

Any help would be very much appreciated!

############################## The code
library(ggplot2)   # ggplot(), geom_tile()
library(reshape2)  # melt()
library(plyr)      # ddply()
library(scales)    # rescale()
fileToAnalyse_f <- read.csv(file = "iostat-sda.csv", header = TRUE, sep = ",")
fileToAnalyse <- subset(fileToAnalyse_f, !duplicated(timestamp))  # drop rows with a repeated timestamp
fileToAnalyse[, 1] <- factor(fileToAnalyse[, 1])                   # timestamps as discrete x positions

fileToAnalyse.m <- melt(fileToAnalyse, id.vars = 1)  # long format: timestamp, variable, value
fileToAnalyse.s <- ddply(fileToAnalyse.m, .(variable), transform, rescale = rescale(value)) # scales each variable between 0 and 1

base_size <- 9
ggplot(fileToAnalyse.s, aes(timestamp, variable)) +
  geom_tile(aes(fill = rescale), colour = "black") +
  scale_fill_gradient(low = "black", high = "white") +
  theme_grey(base_size = base_size) +
  labs(x = "Time", y = "", title = "Heatmap") +
  theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks = element_blank()) +
  scale_y_discrete(expand = c(0, 0)) +
  scale_x_discrete(expand = c(0, 0))

########################## Sample data from iostat-sda.csv
timestamp,DSKRRQM,DSKWRQM,DSKR,DSKW,DSKRMB,DSKWMB,DSKARQS,DSKAQUS,DSKAWAIT,DSKSVCTM,DSKUtil
1319204905,0.33,0.98,10.35,2.37,0.72,0.02,120.00,0.01,0.40,0.31,0.39
1319204906,1.00,4841.00,682.00,489.00,60.09,40.68,176.23,2.91,2.42,0.50,59.00
1319204907,0.00,1600.00,293.00,192.00,32.64,13.89,196.45,5.48,10.76,2.04,99.00
1319204908,0.00,3309.00,1807.00,304.00,217.39,26.82,236.93,4.84,2.41,0.45,96.00
1319204909,0.00,5110.00,93.00,427.00,0.72,43.31,173.43,4.43,8.67,1.90,99.00
1319204910,0.00,6345.00,115.00,496.00,0.96,52.25,178.34,4.00,6.32,1.62,99.00
1319204911,0.00,6793.00,129.00,666.00,1.33,57.22,150.83,4.74,6.16,1.26,100.00
1319204912,0.00,6444.00,115.00,500.00,0.93,53.06,179.77,4.20,6.83,1.58,97.00
1319204913,0.00,1923.00,835.00,215.00,78.45,16.68,185.55,4.81,4.58,0.91,96.00
1319204914,0.00,0.00,788.00,0.00,83.51,0.00,217.04,0.45,0.57,0.25,20.00
1319204915,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1319204916,0.00,4.00,2.00,4.00,0.01,0.04,17.67,0.00,0.00,0.00,0.00
1319204917,0.00,8.00,4.00,8.00,0.02,0.09,17.83,0.00,0.00,0.00,0.00
1319204918,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1319204919,0.00,2.00,113.00,4.00,11.96,0.03,209.93,0.06,0.51,0.43,5.00
1319204920,0.00,59.00,147.00,54.00,11.15,0.63,120.02,0.04,0.20,0.15,3.00
1319204921,1.00,19.00,57.00,18.00,4.68,0.20,133.47,0.07,0.93,0.67,5.00

静待花开 2024-12-19 04:56:16

There is a nice package called NeatMap that simplifies generating heatmaps in ggplot2. The row-ordering methods it offers include multidimensional scaling, PCA and hierarchical clustering. Things to watch out for are:

  1. The data passed to make.heatmap1 has to be in wide format
  2. Data has to be a matrix, not a dataframe
  3. Assign rownames to the wide-format matrix before generating the plot
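Of the ordering methods listed, PCA is perhaps the least obvious. A minimal sketch of one simple variant, assuming you just want a 1-D order for the variables: treat each rescaled variable as an observation, project it onto the first principal component with base R's `prcomp()`, and sort by that score. This is a simplified stand-in, not necessarily the ordering NeatMap computes; the data are the first five rows and four columns of the sample data, and the names `raw`, `m` and `var_order` are illustrative.

```r
# First five rows of the sample data (raw values), one column per variable
raw <- data.frame(
  DSKR    = c(10.35, 682.00, 293.00, 1807.00, 93.00),
  DSKW    = c(2.37, 489.00, 192.00, 304.00, 427.00),
  DSKWMB  = c(0.02, 40.68, 13.89, 26.82, 43.31),
  DSKUtil = c(0.39, 59.00, 99.00, 96.00, 99.00)
)

# Rescale each variable to [0, 1]; result is a matrix of timestamps x variables
m <- sapply(raw, function(x) (x - min(x)) / (max(x) - min(x)))

# Treat each variable as one observation (a row of t(m)), score it on the
# first principal component, and sort the variables by that score
pc <- prcomp(t(m))
var_order <- rownames(pc$x)[order(pc$x[, 1])]
var_order
```

The sign of a principal component is arbitrary, so the order may come out reversed; either way `var_order` can be used as the factor levels of `variable` before plotting.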

I've changed your code slightly to avoid naming a variable the same as an existing function (i.e. rescale):

library(plyr)      # ddply()
library(reshape2)  # dcast()
library(scales)    # rescale()
library(NeatMap)   # make.heatmap1()

fileToAnalyse.s <- ddply(fileToAnalyse.m, .(variable), transform, rescale.x = rescale(value))     # scales each variable between 0 and 1
# wide format: one row per timestamp, one column per variable
fileToAnalyse.w <- dcast(fileToAnalyse.s, timestamp ~ variable, value.var = "rescale.x")
rownames(fileToAnalyse.w) <- as.character(fileToAnalyse.w[, 1])

# keep "+" at the end of the line so the ggplot2 layers are added to the NeatMap plot
ggheatmap <- make.heatmap1(as.matrix(fileToAnalyse.w[, -1]), row.method = "complete.linkage", row.metric = "euclidean", column.cluster.method = "none", row.labels = rownames(fileToAnalyse.w)) +
  scale_fill_gradient(low = "black", high = "white") + labs(x = "Time", y = "", title = "Heatmap")
print(ggheatmap)
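If you would rather stay in plain ggplot2 without NeatMap, the hierarchical-clustering option can be reproduced by hand: cluster the variables with base R's `dist()` and `hclust()`, then use the dendrogram order as the factor levels of the y-axis variable. This is a swapped-in approach, not what NeatMap does internally. The sketch types in the first five rows and four columns of the sample data so it runs without iostat-sda.csv; the names `df`, `m`, `var_order` and `long` are illustrative.

```r
library(ggplot2)

# First five rows of the sample data, four of the variables
df <- data.frame(
  timestamp = c(1319204905, 1319204906, 1319204907, 1319204908, 1319204909),
  DSKR      = c(10.35, 682.00, 293.00, 1807.00, 93.00),
  DSKW      = c(2.37, 489.00, 192.00, 304.00, 427.00),
  DSKWMB    = c(0.02, 40.68, 13.89, 26.82, 43.31),
  DSKUtil   = c(0.39, 59.00, 99.00, 96.00, 99.00)
)

# Rescale each variable to [0, 1], as in the question
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
m <- sapply(df[, -1], rescale01)   # numeric matrix: timestamps x variables
rownames(m) <- df$timestamp

# Cluster the variables (the heatmap's y-axis rows): transpose so each
# variable is one observation, then complete-linkage on Euclidean distances
hc <- hclust(dist(t(m)), method = "complete")
var_order <- colnames(m)[hc$order]

# Long format for ggplot; as.vector(m) runs down each column in turn,
# so timestamps repeat per variable and each variable repeats per timestamp
long <- data.frame(
  timestamp = factor(rep(rownames(m), times = ncol(m))),
  variable  = factor(rep(colnames(m), each = nrow(m)), levels = var_order),
  value     = as.vector(m)
)

p <- ggplot(long, aes(timestamp, variable)) +
  geom_tile(aes(fill = value), colour = "black") +
  scale_fill_gradient(low = "black", high = "white") +
  labs(x = "Time", y = "", title = "Heatmap") +
  theme(axis.text.x = element_blank(), axis.ticks = element_blank())
print(p)
```

Because ggplot draws a discrete y axis in factor-level order, similar variables end up next to each other. With the question's full pipeline, the same idea would be `fileToAnalyse.s$variable <- factor(fileToAnalyse.s$variable, levels = var_order)` before calling `ggplot()`.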