Create a matrix of scatterplots (pairs() equivalent) in ggplot2

Posted on 2024-09-24 04:42:21

Is it possible to plot a matrix of scatter plots with ggplot2, using ggplot's nice features like mapping additional factors to color, shape etc. and adding smoother?

I am thinking about something similar to the base function pairs.
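
For reference, the base-graphics baseline mentioned here can be sketched as follows. This is only an illustration, assuming the built-in iris data set; the column selection and the colour mapping are choices made for this example, not something stated in the question.

```r
# Base R scatterplot matrix of the four numeric iris columns,
# with points coloured by species (illustrative choice of data).
num_cols <- iris[, 1:4]
species_col <- as.integer(iris$Species)  # integer codes index the default palette

pairs(num_cols, col = species_col, pch = 19)
```

The goal of the question is to get this kind of grid while keeping ggplot2's aesthetic mappings and layers.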

Comments (6)

不一样的天空 2024-10-01 04:42:21

I keep wanting to do this, but plotmatrix is crap. Hadley recommends using the GGally package instead. It has a function, ggpairs that is a vastly improved pairs plot (lets you use non-continuous variables in your data frames). It plots different plots in each square, depending on the variable types:

library(GGally)
ggpairs(iris, aes(colour = Species, alpha = 0.4))
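
Since the question also asks about smoothers, here is a hedged sketch of how ggpairs might be configured for that. It assumes a reasonably recent GGally in which the columns and lower arguments and the "smooth" panel type are available; the name pair_plot is made up for this example.

```r
library(ggplot2)
library(GGally)

# Restrict the matrix to the four numeric columns, colour by Species,
# and request scatter plots with a fitted line in the lower-triangle panels.
pair_plot <- ggpairs(
  iris,
  columns = 1:4,
  mapping = aes(colour = Species),
  lower = list(continuous = "smooth")
)

print(pair_plot)
```

The upper triangle and the diagonal keep their defaults here; they can be set the same way through the upper and diag arguments.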

泅渡 2024-10-01 04:42:21

You might want to try plotmatrix:

  library(ggplot2)
  data(mtcars)
  plotmatrix(mtcars[,1:3])

To me, mpg (the first column in mtcars) should not be a factor. I haven't checked, but there's no reason why it should be one. However, I do get a scatter plot :)


Note: For future reference, the plotmatrix() function has been replaced by the ggpairs() function from the GGally package as @naught101 suggests in another response below to this question.

最好是你 2024-10-01 04:42:21

If one wants to obtain a ggplot object (rather than a ggmatrix, as ggpairs() returns), the solution is to melt the data twice and then plot it with ggplot using faceting. facet_wrap is a better choice than facet_grid for limiting the plotted area, provided the scales = 'free' parameter is supplied.

library(ggplot2)
library(dplyr)
library(tidyr)

gatherpairs <- function(data, ..., 
                        xkey = '.xkey', xvalue = '.xvalue',
                        ykey = '.ykey', yvalue = '.yvalue',
                        na.rm = FALSE, convert = FALSE, factor_key = FALSE) {
  vars <- quos(...)
  xkey <- enquo(xkey)
  xvalue <- enquo(xvalue)
  ykey <- enquo(ykey)
  yvalue <- enquo(yvalue)

  data %>% {
    cbind(gather(., key = !!xkey, value = !!xvalue, !!!vars,
                 na.rm = na.rm, convert = convert, factor_key = factor_key),
          select(., !!!vars)) 
  } %>% gather(., key = !!ykey, value = !!yvalue, !!!vars,
               na.rm = na.rm, convert = convert, factor_key = factor_key)
}

iris %>% 
  gatherpairs(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>% {
  ggplot(., aes(x = .xvalue, y = .yvalue, color = Species)) +
      geom_point() + 
      geom_smooth(method = 'lm') +
      facet_wrap(.xkey ~ .ykey, ncol = length(unique(.$.ykey)), scales = 'free', labeller = label_both) +
      scale_color_brewer(type = 'qual')
}

笑叹一世浮沉 2024-10-01 04:42:21

Try scatterPlotMatrix. It's very flexible and produces nice looking interactive charts.

library(scatterPlotMatrix)
scatterPlotMatrix(iris, zAxisDim = "Species")

鹊巢 2024-10-01 04:42:21

A bit later, here is an alternative that does not use dplyr:

library("ggplot2")
library("reshape")

# what vars to plot
vars_to_plot <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# melt the table
melted <- melt(iris[, c("Species", vars_to_plot)])

# define empty vector
final_all <- vector()

# for each variable of interest
for (a_var in vars_to_plot) {

    # get its actual values
    temp <- iris[, a_var]
    
    # replicate them for each variable
    temp_col <- rep(temp, length(unique(melted$variable)))
    
    # rbind them
    final_all <- rbind.data.frame(final_all, cbind(melted, var=rep(a_var, length(temp_col)), temp_col))
    
    # remove the variable that was just added to the final table
    melted <- melted[-which(melted$variable==a_var), ]
}

# remove duplicate comparisons, if needed
final_no_dup <- final_all[-which(final_all$variable==final_all$var), ]

# plot
ggplot_pairs <- ggplot(final_no_dup, aes(x=value, y=temp_col, fill=Species)) +
    geom_point(shape=21, size=5, color="black", stroke=3) +
    facet_wrap(variable~var, scales='free', labeller=label_wrap_gen(multi_line=FALSE)) +
    xlab("") +
    ylab("") +
    guides(fill=guide_legend(override.aes=list(shape=21))) +
    theme_bw()

plot(ggplot_pairs)

时光清浅 2024-10-01 04:42:21

If you only want to use ggplot2 for plotting, here is a solution similar to the one proposed by @mjktfw, but with shorter and perhaps cleaner code:

library(tidyr)
library(dplyr)
library(ggplot2)
data(iris)

# Create id so that observations can be re-identified
iris <- iris |> 
  mutate(id = row_number()) 

# Prepare data to be plotted on the x axis
x_vars <- pivot_longer(data = iris,
             cols = Sepal.Length:Petal.Width,
             names_to = "variable_x",
             values_to = "x")

# Prepare data to be plotted on the y axis  
y_vars <- pivot_longer(data = iris,
                       cols = Sepal.Length:Petal.Width,
                       names_to = "variable_y",
                       values_to = "y") 

# Join data for x and y axes and make plot
full_join(x_vars, y_vars, 
          by = c("id", "Species"),
          relationship = "many-to-many") |>
  ggplot() + 
  aes(x = x, y = y, color = Species) +
  geom_point() +
  facet_grid(c("variable_x", "variable_y")) 
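
As a rough check on what the join above produces, and to show how a smoother could be layered on, the following sketch rebuilds the same long table under different names (iris_id, x_long, y_long, pairs_long and pair_plot are made up here). It uses inner_join, which returns the same rows as full_join for this data, and writes the facets with vars(), putting the y variable in rows and the x variable in columns with free scales. Those layout choices are assumptions of this sketch, not part of the answer.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Work on a copy so the built-in iris data set is left untouched
iris_id <- mutate(iris, id = row_number())

x_long <- pivot_longer(iris_id, cols = Sepal.Length:Petal.Width,
                       names_to = "variable_x", values_to = "x")
y_long <- pivot_longer(iris_id, cols = Sepal.Length:Petal.Width,
                       names_to = "variable_y", values_to = "y")

# Each observation contributes 4 x-rows and 4 y-rows, so the join
# yields 150 * 4 * 4 = 2400 rows (one per observation and variable pair)
pairs_long <- inner_join(x_long, y_long, by = c("id", "Species"),
                         relationship = "many-to-many")
stopifnot(nrow(pairs_long) == nrow(iris) * 4 * 4)

# Same scatter plot grid, with a per-species linear fit added to each panel
pair_plot <- ggplot(pairs_long, aes(x = x, y = y, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(rows = vars(variable_y), cols = vars(variable_x), scales = "free")

print(pair_plot)
```

Because the result is an ordinary ggplot object, further layers, scales and themes can be added with + in the usual way.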
