How do I link a dissimilarity matrix with metadata so that I can plot an MDS with colored points in R?

Posted on 2025-02-04 08:57:21

I have a data matrix as a .csv (output from sourmash). The matrix looks something like this: matrix

I also have metadata that corresponds with that matrix. It groups the samples represented in the matrix several different ways. It looks something like this: metadata

I'd like to plot an MDS while coloring the points based on their metadata values. So far I've been able to load the matrix and plot the points, but I'm lost on how to "link" the metadata values to the matrix so that each point is colored by its metadata value when plotted. I know it's probably a simple fix but would appreciate any help! This is what I have so far:

#import matrix and metadata
sm_matrix <- read.csv("path to .csv", header = TRUE, sep = ",")
md <- read.csv("path to .csv", header = TRUE, sep = ",")

#transform for plotting
sm_matrix <- as.matrix(sm_matrix)

#plot
mds <- sm_matrix %>%
  dist() %>%
  cmdscale() %>%
  as_tibble()
colnames(mds) <- c("dim.1", "dim.2")

I've also tried this to plot

ggscatter(mds, x = "dim.1", y = "dim.2",
          color = md$Location,
          palette = "jco",
          size = 1, 
          ellipse = TRUE,
          ellipse.type = "convex",
          repel = TRUE)

but I get this error:

Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (92): colour
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In if (color %in% names(data) & is.null(add.params$color)) add.params$color <- color :
  the condition has length > 1 and only the first element will be used

Thank you!

Sam

Comments (1)

埋葬我深情 2025-02-11 08:57:21

Here is an approach that works. ggscatter still emits a warning, but a warning is not an error, and it may be an issue with the package.

First, the data are created directly in the script. This is the preferred way, because otherwise people have to invest additional work to type in the data from the screenshots. It is also good style to name the packages used explicitly.

The script itself uses two tricks. First, the column names are added after calling as_tibble, using setNames. The other trick is to convert the character variable Location into a numeric by converting it first to a factor and then to a numeric. Furthermore, I increased size to 4 to make the result more visible.

library("dplyr")
library("ggpubr")

# small example matrix with sample names on rows and columns
sm_matrix <- matrix(c(1, 0.2, 0.7, 0.2, 1, 0.2, 0.3, 0.2, 1), nrow = 3)
rownames(sm_matrix) <- colnames(sm_matrix) <- c("sample_1", "sample_2", "sample_3")

# example metadata: one row per sample, IDs spelled as in the matrix
md <- as.data.frame(matrix(c("sample_1", "sample_2", "sample_3", LETTERS[1:9]), nrow = 3))
colnames(md) <- c("SampleID", "Diet", "Location", "Size")

mds <- sm_matrix %>%
  dist() %>%
  cmdscale() %>%
  as_tibble() %>%
  setNames(c("dim.1", "dim.2"))

plot(mds)

ggscatter(mds, x = "dim.1", y = "dim.2",
          color = as.numeric(as.factor(md$Location)),
          palette = "jco",
          size = 4, 
          ellipse = TRUE,
          ellipse.type = "convex",
          repel = TRUE)
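
The script above relies on the metadata rows being in the same order, and of the same number, as the matrix rows. If that is not guaranteed, a more defensive variant is to match the metadata to the MDS coordinates by sample ID and pass the column name to ggscatter instead of a vector. The following is only a sketch under assumptions: the toy values, the Location labels "lake" and "river", and the shuffled row order are made up for illustration, and it assumes the row names of the matrix carry the same IDs as the SampleID column.

```r
# Toy data standing in for the real files (all values are made up).
samples <- c("sample_1", "sample_2", "sample_3")
sm_matrix <- matrix(c(1.0, 0.2, 0.3,
                      0.2, 1.0, 0.2,
                      0.3, 0.2, 1.0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(samples, samples))

# Hypothetical metadata, deliberately in a different row order than the matrix.
md <- data.frame(SampleID = c("sample_3", "sample_1", "sample_2"),
                 Location = c("lake", "river", "lake"))

# Classical MDS. dist() computes Euclidean distances between the rows;
# if the matrix already holds similarities, as.dist(1 - sm_matrix)
# may be the more appropriate input (an assumption about the data).
coords <- cmdscale(dist(sm_matrix), k = 2)

# Keep the sample IDs next to the coordinates so they can be matched.
mds <- data.frame(SampleID = rownames(sm_matrix),
                  dim.1 = coords[, 1],
                  dim.2 = coords[, 2],
                  row.names = NULL)

# Look up each point's metadata row by ID rather than by position,
# and stop early if any sample has no metadata.
idx <- match(mds$SampleID, md$SampleID)
stopifnot(!anyNA(idx))
mds$Location <- factor(md$Location[idx])

# Colour by the name of a column in the data instead of an external vector.
if (requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggpubr::ggscatter(mds, x = "dim.1", y = "dim.2",
                         color = "Location", palette = "jco", size = 4)
  print(p)
}
```

Because Location is now a column of the plotted data frame, its length always equals the number of points, which is the condition the original error message complains about. With real data, the stopifnot() call is also a quick way to find out whether the IDs in the two files are spelled differently.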