ggplot: How to show density instead of count in a grouped bar plot with facet_wrap?

Posted 2025-02-13 02:28:21

The data frame consists of two factor variables: cls with 3 levels and subset with 2 levels. I want to compare how much of each class (cls) there is in both groups of subset. I want to show percentages on the y-axis. They should be computed within each subset group, not over the whole dataset.

library(tidyverse)
data = data.frame(
  x = rnorm(1000),
  cls = factor(c(rep("A", 200), rep("B", 300), rep("C", 500))),
  subset = factor(c(rep("train", 900), rep("test", 100)))
)

This was my attempt to show percentages, but it fails because they are computed over the whole dataset instead of within each subset group:

ggplot(data, aes(x = cls, fill = cls)) + geom_bar(aes(y = ..count.. / sum(..count..))) + facet_wrap(~subset)

How can I fix it?

Edit related to the accepted answer:

plot_train_vs_test = function(data, var, subset_colname){
  plot_data = data %>% 
    count(var, eval(subset_colname)) %>% 
    group_by(eval(subset_colname)) %>% 
    mutate(perc = n/sum(n))
  
  ggplot(plot_data, aes(x = var, y = perc, fill = var)) +
    geom_col() +
    scale_y_continuous(labels = scales::label_percent()) +
    facet_wrap(~eval(subset_colname))
}

plot_train_vs_test(data, "cls", "subset")

This results in errors.

Comments (1)

于我来说 2025-02-20 02:28:21

One option and easy fix would be to compute the percentages outside of ggplot and plot the summarized data:

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

set.seed(123)

data <- data.frame(
  x = rnorm(1000),
  cls = factor(c(rep("A", 200), rep("B", 300), rep("C", 500))),
  subset = factor(c(rep("train", 900), rep("test", 100)))
)

data_sum <- data %>%
  count(cls, subset) %>%
  group_by(subset) %>%
  mutate(pct = n / sum(n))

ggplot(data_sum, aes(x = cls, y = pct, fill = cls)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  facet_wrap(~subset)
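
If you would rather keep the computation inside ggplot, one possible alternative is to normalise the counts per facet panel with after_stat(). This is only a sketch: it assumes ggplot2 >= 3.3.0 (for after_stat()) and relies on the PANEL column that ggplot2 adds to the computed layer data, which is not part of the code above. The y-axis label "pct" is just an illustrative choice.

```r
library(ggplot2)

# Same class/subset layout as in the question (the numeric x column is not needed here)
data <- data.frame(
  cls = factor(c(rep("A", 200), rep("B", 300), rep("C", 500))),
  subset = factor(c(rep("train", 900), rep("test", 100)))
)

p <- ggplot(data, aes(x = cls, fill = cls)) +
  # ave() returns the per-panel total of `count` for every row,
  # so each bar is divided by the total of its own facet
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(y = "pct") +
  facet_wrap(~subset)

p
```

The original attempt divided by sum(count) over all panels at once, which is why the bars summed to 100% across the whole dataset rather than within each facet.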

EDIT: One approach to putting the code in a function might look like this:

plot_train_vs_test <- function(.data, x, facet) {
  .data_sum <- .data %>%
    count({{ x }}, {{ facet }}) %>%
    group_by({{ facet }}) %>%
    mutate(pct = n / sum(n))

  ggplot(.data_sum, aes(x = {{ x }}, y = pct, fill = {{ x }})) +
    geom_col() +
    scale_y_continuous(labels = scales::label_percent()) +
    facet_wrap(vars({{ facet }}))
}

plot_train_vs_test(data, cls, subset)
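
The call in the question passed the column names as strings ("cls", "subset"). If that interface is needed, a variant along these lines might work. It is a sketch, not a tested drop-in: the function name plot_train_vs_test_chr and the argument names df, x_col and facet_col are made up for illustration (chosen so they do not clash with the x column in the data), and it assumes dplyr >= 1.0.0 for across().

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant: x_col and facet_col are column names given as strings
plot_train_vs_test_chr <- function(df, x_col, facet_col) {
  df_sum <- df %>%
    # all_of() selects columns by the names stored in the character arguments
    count(across(all_of(c(x_col, facet_col)))) %>%
    group_by(across(all_of(facet_col))) %>%
    mutate(pct = n / sum(n)) %>%
    ungroup()

  # .data[[...]] looks up a column of the plotted data by its name
  ggplot(df_sum, aes(x = .data[[x_col]], y = pct, fill = .data[[x_col]])) +
    geom_col() +
    scale_y_continuous(labels = scales::label_percent()) +
    labs(x = x_col, fill = x_col) +
    facet_wrap(vars(.data[[facet_col]]))
}

data <- data.frame(
  x = rnorm(1000),
  cls = factor(c(rep("A", 200), rep("B", 300), rep("C", 500))),
  subset = factor(c(rep("train", 900), rep("test", 100)))
)

p <- plot_train_vs_test_chr(data, "cls", "subset")
p
```

The version in the question failed because eval("subset") just returns the string "subset" and count(var, ...) looks for a column literally named var; strings have to be turned into column references explicitly, here via all_of() and .data[[ ]].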

For more details, especially on the {{ operator, see Programming with dplyr, Programming with ggplot2, and Best practices for programming with ggplot2.
