在分类变量图表中显示百分比而不是计数

发布于 2024-09-19 00:18:19 字数 1624 浏览 4 评论 0原文

我正在绘制一个分类变量,而不是显示每个类别值的计数。

我正在寻找一种方法让 ggplot 来显示该类别中值的百分比。当然,可以使用计算出的百分比创建另一个变量并绘制该变量,但我必须执行数十次,并且我希望通过一个命令来实现这一目标。

我正在尝试类似的东西

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

,但我一定使用不正确,因为我遇到了错误。

为了轻松地重现设置,这里有一个简化的示例:

mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.

在实际情况中,我可能会使用 ggplot 而不是 qplot,但使用 stat_bin 仍然让我困惑。

我也尝试过这四种方法:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

但所有 4 种方法都给出了:

错误:ggplot2不知道如何处理类因子的数据

对于以下简单情况也会出现相同的错误,

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

因此很明显,这与 ggplot 如何与单个向量交互有关。我摸不着头脑,谷歌搜索该错误给出了一个结果

I'm plotting a categorical variable and instead of showing the counts for each category value.

I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozens of times and I hope to achieve that in one command.

I was experimenting with something like

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, as I got errors.

To easily reproduce the setup, here's a simplified example:

mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.

In the real case, I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

but all 4 give:

Error: ggplot2 doesn't know how to deal with data of class factor

The same error appears for the simple case of

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head, googling for that error gives a single result.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

蓝天 2024-09-26 00:18:20

自从这个问题得到解答以来,ggplot 语法发生了一些有意义的变化。总结上面评论中的讨论:

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)

这是一个使用 mtcars 的可重现示例:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels = percent) ## version 3.0.0

在此处输入图像描述

这个问题目前是 Google 上“ggplot 计数与百分比直方图”的排名第一的问题,所以希望这有助于提取当前包含在已接受答案的评论中的所有信息。

备注:如果 hp 未设置为因子,ggplot 将返回:

在此处输入图像描述

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)

Here's a reproducible example using mtcars:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels = percent) ## version 3.0.0

enter image description here

This question is currently the #1 hit on google for 'ggplot count vs percentage histogram' so hopefully this helps distill all the information currently housed in comments on the accepted answer.

Remark: If hp is not set as a factor, ggplot returns:

enter image description here

掐死时间 2024-09-26 00:18:20

此修改后的代码应该可以工作

p = ggplot(mydataf, aes(x = foo)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    scale_y_continuous(formatter = 'percent')

如果您的数据具有 NA 并且您不希望将它们包含在图中,则将 na.omit(mydataf) 作为参数传递给 ggplot,

。希望这有帮助。

this modified code should work

p = ggplot(mydataf, aes(x = foo)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    scale_y_continuous(formatter = 'percent')

if your data has NAs and you dont want them to be included in the plot, pass na.omit(mydataf) as the argument to ggplot.

hope this helps.

踏月而来 2024-09-26 00:18:20

对于 ggplot2 版本 2.1.0 来说是

+ scale_y_continuous(labels = scales::percent)

With ggplot2 version 2.1.0 it is

+ scale_y_continuous(labels = scales::percent)
尴尬癌患者 2024-09-26 00:18:20

截至 2017 年 3 月,对于 ggplot2 2.2.1,我认为最好的解决方案在 Hadley Wickham 的 R for data science 书中进行了解释:

ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))

stat_count 计算两个变量:count<默认情况下使用 /code>,但您可以选择使用显示比例的 prop

As of March 2017, with ggplot2 2.2.1 I think the best solution is explained in Hadley Wickham's R for data science book:

ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))

stat_count computes two variables: count is used by default, but you can choose to use prop which shows proportions.

仅此而已 2024-09-26 00:18:20

如果您希望在条形图上标记 y 轴上的百分比

library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) +
  scale_y_continuous(labels = percent) +
  labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")

在此处输入图像描述

添加条形标签时,您可能希望省略 y 轴以获得更清晰的图表,方法是添加到末尾:

  theme(
        axis.text.y=element_blank(), axis.ticks=element_blank(),
        axis.title.y=element_blank()
  )

在此处输入图像描述

If you want percentages on the y-axis and labeled on the bars:

library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) +
  scale_y_continuous(labels = percent) +
  labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")

enter image description here

When adding the bar labels, you may wish to omit the y-axis for a cleaner chart, by adding to the end:

  theme(
        axis.text.y=element_blank(), axis.ticks=element_blank(),
        axis.title.y=element_blank()
  )

enter image description here

哥,最终变帅啦 2024-09-26 00:18:20

版本 3.3 ggplot2,我们可以使用方便的 after_stat() 函数。

我们可以做一些类似于@Andrew的答案,但不使用 .. 语法:

# original example data
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

You可以在geom_stat_函数的文档中找到所有可用的“计算变量”。例如,对于 geom_bar(),您可以访问 countprop 变量。 (请参阅计算变量文档 .)

关于 NULL 值的一条评论:当您创建向量时,它们将被忽略(即,您最终得到长度为 9 的向量,而不是 11)。 NA (ggplot2 会将 NA 放在绘图的右端):

# use NA instead of NULL
mydata <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
length(mydata)
#> [1] 11

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

如果您确实想跟踪丢失的数据,则必须使用sstatic.net/sQ3Lv.png" alt="">

reprex 包于 2021 年 2 月 9 日创建 (v1.0.0)

(请注意,使用 chrfct 数据不会对您的示例产生影响。)

Since version 3.3 of ggplot2, we have access to the convenient after_stat() function.

We can do something similar to @Andrew's answer, but without using the .. syntax:

# original example data
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

You can find all the "computed variables" available to use in the documentation of the geom_ and stat_ functions. For example, for geom_bar(), you can access the count and prop variables. (See the documentation for computed variables.)

One comment about your NULL values: they are ignored when you create the vector (i.e. you end up with a vector of length 9, not 11). If you really want to keep track of missing data, you will have to use NA instead (ggplot2 will put NAs at the right end of the plot):

# use NA instead of NULL
mydata <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
length(mydata)
#> [1] 11

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

Created on 2021-02-09 by the reprex package (v1.0.0)

(Note that using chr or fct data will not make a difference for your example.)

尹雨沫 2024-09-26 00:18:20

请注意,如果您的变量是连续的,则必须使用 geom_histogram(),因为该函数将按“bins”对变量进行分组。

df <- data.frame(V1 = rnorm(100))

ggplot(df, aes(x = V1)) +  
  geom_histogram(aes(y = 100*(..count..)/sum(..count..))) 

# if you use geom_bar(), with factor(V1), each value of V1 will be treated as a
# different category. In this case this does not make sense, as the variable is 
# really continuous. With the hp variable of the mtcars (see previous answer), it 
# worked well since hp was not really continuous (check unique(mtcars$hp)), and one 
# can want to see each value of this variable, and not to group it in bins.
ggplot(df, aes(x = factor(V1))) +  
  geom_bar(aes(y = (..count..)/sum(..count..))) 

Note that if your variable is continuous, you will have to use geom_histogram(), as the function will group the variable by "bins".

df <- data.frame(V1 = rnorm(100))

ggplot(df, aes(x = V1)) +  
  geom_histogram(aes(y = 100*(..count..)/sum(..count..))) 

# if you use geom_bar(), with factor(V1), each value of V1 will be treated as a
# different category. In this case this does not make sense, as the variable is 
# really continuous. With the hp variable of the mtcars (see previous answer), it 
# worked well since hp was not really continuous (check unique(mtcars$hp)), and one 
# can want to see each value of this variable, and not to group it in bins.
ggplot(df, aes(x = factor(V1))) +  
  geom_bar(aes(y = (..count..)/sum(..count..))) 
破晓 2024-09-26 00:18:20

这是多面数据的解决方法。 (@Andrew 接受的答案在这种情况下不起作用。)这个想法是使用 dplyr 计算百分比值,然后使用 geom_col 创建绘图。

library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)

binwidth <- 30

mtcars.stats <- mtcars %>%
  group_by(cyl) %>%
  mutate(bin = cut(hp, breaks=seq(0,400, binwidth), 
               labels= seq(0+binwidth,400, binwidth)-(binwidth/2)),
         n = n()) %>%
  group_by(cyl, bin) %>%
  summarise(p = n()/n[1]) %>%
  ungroup() %>%
  mutate(bin = as.numeric(as.character(bin)))

ggplot(mtcars.stats, aes(x = bin, y= p)) +  
  geom_col() + 
  scale_y_continuous(labels = percent) +
  facet_grid(cyl~.)

这是情节:

在此处输入图像描述

Here is a workaround for faceted data. (The accepted answer by @Andrew does not work in this case.) The idea is to calculate the percentage value using dplyr and then to use geom_col to create the plot.

library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)

binwidth <- 30

mtcars.stats <- mtcars %>%
  group_by(cyl) %>%
  mutate(bin = cut(hp, breaks=seq(0,400, binwidth), 
               labels= seq(0+binwidth,400, binwidth)-(binwidth/2)),
         n = n()) %>%
  group_by(cyl, bin) %>%
  summarise(p = n()/n[1]) %>%
  ungroup() %>%
  mutate(bin = as.numeric(as.character(bin)))

ggplot(mtcars.stats, aes(x = bin, y= p)) +  
  geom_col() + 
  scale_y_continuous(labels = percent) +
  facet_grid(cyl~.)

This is the plot:

enter image description here

困倦 2024-09-26 00:18:20

如果您想要百分比标签,但 y 轴上有实际 N,请尝试以下操作:

    library(scales)
perbar=function(xx){
      q=ggplot(data=data.frame(xx),aes(x=xx))+
      geom_bar(aes(y = (..count..)),fill="orange")
       q=q+    geom_text(aes(y = (..count..),label = scales::percent((..count..)/sum(..count..))), stat="bin",colour="darkgreen") 
      q
    }
    perbar(mtcars$disp)

If you want percentage labels but actual Ns on the y axis, try this:

    library(scales)
perbar=function(xx){
      q=ggplot(data=data.frame(xx),aes(x=xx))+
      geom_bar(aes(y = (..count..)),fill="orange")
       q=q+    geom_text(aes(y = (..count..),label = scales::percent((..count..)/sum(..count..))), stat="bin",colour="darkgreen") 
      q
    }
    perbar(mtcars$disp)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文