Order bars in ggplot2 bar graph

Posted on 2024-10-20 09:30:01

I am trying to make a bar graph where the largest bar is nearest to the y axis and the shortest bar is furthest away. My table looks something like this:

    Name   Position
1   James  Goalkeeper
2   Frank  Goalkeeper
3   Jean   Defense
4   Steve  Defense
5   John   Defense
6   Tim    Striker

So I am trying to build a bar graph that shows the number of players in each position:

p <- ggplot(theTable, aes(x = Position)) + geom_bar(binwidth = 1)

but the graph shows the goalkeeper bar first, then the defense bar, and finally the striker bar. I would like the graph to be ordered so that the defense bar is closest to the y axis, followed by the goalkeeper bar, and finally the striker bar.
Thanks

Comments (16)

梦罢 2024-10-27 09:30:02

A simple dplyr-based reordering of the factor levels can solve this problem:

library(dplyr)
library(ggplot2)

# count players per position, sort by count, and reset the factor levels to that order
theTable %>%
  group_by(Position) %>%                              # calculate the counts
  summarize(counts = n()) %>%
  arrange(desc(counts)) %>%                           # sort by counts, largest first
  mutate(Position = factor(Position, Position)) %>%   # levels follow the sorted row order
  ggplot(aes(x = Position, y = counts)) +             # plot
    geom_bar(stat = "identity")                       # bar heights are the precomputed counts

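
The level-resetting step in this pipeline, factor(Position, Position), can be sketched in base R without dplyr. This is a minimal sketch that assumes a version of the question's data in which "Goalkeeper" is renamed to the hypothetical "Zoalkeeper", so that the frequency order differs from the alphabetical order; the name counts is illustrative.

```r
# Illustrative data: the question's table with "Goalkeeper" renamed to the
# hypothetical "Zoalkeeper" so frequency order differs from alphabetical order
theTable <- data.frame(
  Name     = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Zoalkeeper", "Zoalkeeper", "Defense", "Defense", "Defense", "Striker")
)

# Like group_by() + summarize(counts = n()): one row per position
counts <- as.data.frame(table(Position = theTable$Position),
                        responseName = "counts", stringsAsFactors = FALSE)

# Like arrange(desc(counts)): largest count first
counts <- counts[order(-counts$counts), ]

# Like mutate(Position = factor(Position, Position)): the second argument of
# factor() is `levels`, so the levels follow the sorted row order
counts$Position <- factor(counts$Position, levels = counts$Position)

print(levels(counts$Position))
```

Plotting this data frame with ggplot(counts, aes(x = Position, y = counts)) + geom_bar(stat = "identity") should then draw the bars in that level order rather than alphabetically.
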
沙与沫 2024-10-27 09:30:02

In addition to forcats::fct_infreq, mentioned by
@HolgerBrandl, there is forcats::fct_rev, which reverses the factor order.

library(ggplot2)
library(forcats)

# "Zoalkeeper" instead of "Goalkeeper", so alphabetical and frequency order differ
theTable <- data.frame(
    Position = 
        c("Zoalkeeper", "Zoalkeeper", "Defense",
          "Defense", "Defense", "Striker"),
    Name = c("James", "Frank", "Jean",
             "Steve", "John", "Tim"))

p1 <- ggplot(theTable, aes(x = Position)) + geom_bar()                       # alphabetical
p2 <- ggplot(theTable, aes(x = fct_infreq(Position))) + geom_bar()           # most frequent first
p3 <- ggplot(theTable, aes(x = fct_rev(fct_infreq(Position)))) + geom_bar()  # least frequent first

gridExtra::grid.arrange(p1, p2, p3, nrow = 3)
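
For comparison, the level orders behind p2 and p3 can be approximated in base R. This is a minimal sketch, not forcats' implementation: it assumes there are no ties in the counts (fct_infreq() may break ties differently), and the names infreq_levels and rev_levels are hypothetical.

```r
Position <- c("Zoalkeeper", "Zoalkeeper", "Defense",
              "Defense", "Defense", "Striker")

# Roughly what fct_infreq() does: levels sorted by count, most frequent first
infreq_levels <- names(sort(table(Position), decreasing = TRUE))

# Roughly what fct_rev() then does: the same levels in reverse order
rev_levels <- rev(infreq_levels)

print(infreq_levels)
print(rev_levels)

# Either vector can be used as the levels of the factor mapped to x
f <- factor(Position, levels = rev_levels)
```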

亽野灬性zι浪 2024-10-27 09:30:02

You just need to specify the Position column to be an ordered factor where the levels are ordered by their counts:

theTable <- transform( theTable,
       Position = ordered(Position, levels = names( sort(-table(Position)))))

(Note that table(Position) produces a frequency count of the Position column.)

Then your ggplot function will show the bars in decreasing order of count.
I don't know if there's an option in geom_bar to do this without having to explicitly create an ordered factor.

﹏半生如梦愿梦如真 2024-10-27 09:30:02

If the bar heights come from a numeric variable, as in the data frame below, you can use a simpler solution:

ggplot(df, aes(x = reorder(Colors, -Qty, sum), y = Qty)) +
  geom_bar(stat = "identity")

The minus sign before the sort variable (-Qty) controls the sort direction: with it the bars are sorted in descending order, without it in ascending order.

Here's some data for testing:

df <- data.frame(Colors = c("Green","Yellow","Blue","Red","Yellow","Blue"),  
                 Qty = c(7,4,5,1,3,6)
                )

**Sample data:**
  Colors Qty
1  Green   7
2 Yellow   4
3   Blue   5
4    Red   1
5 Yellow   3
6   Blue   6

When I found this thread, that was the answer I was looking for. Hope it's useful for others.
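
To see what reorder(Colors, -Qty, sum) does, this base R sketch computes the per-level scores on the same test data. reorder() applies FUN (here sum) to the values of its second argument within each level and sorts the levels by the result in ascending order, so summing -Qty puts the largest total first. Green and Yellow tie on a total of 7 here, so their relative order should not be relied on.

```r
df <- data.frame(Colors = c("Green", "Yellow", "Blue", "Red", "Yellow", "Blue"),
                 Qty    = c(7, 4, 5, 1, 3, 6))

# Totals per colour: the stacked bar heights drawn by geom_bar(stat = "identity")
totals <- tapply(df$Qty, df$Colors, sum)
print(totals)

# Scores are sum(-Qty) per level, i.e. minus each total; levels are sorted
# by score, so the colour with the largest total comes first
f <- reorder(df$Colors, -df$Qty, sum)
print(levels(f))
print(attr(f, "scores"))
```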

浅暮の光 2024-10-27 09:30:02

I agree with zach that counting within dplyr is the best solution. I've found this to be the shortest version:

library(dplyr)
library(ggplot2)

dplyr::count(theTable, Position) %>%
          arrange(-n) %>%
          mutate(Position = factor(Position, Position)) %>%
          ggplot(aes(x=Position, y=n)) + geom_bar(stat="identity")

This should also be faster than reordering the factor levels beforehand, since the counting is done in dplyr rather than in ggplot or with table().

舟遥客 2024-10-27 09:30:02

I found it very annoying that ggplot2 doesn't offer an 'automatic' solution for this. That's why I created the bar_chart() function in ggcharts.

ggcharts::bar_chart(theTable, Position)

By default, bar_chart() sorts the bars and displays a horizontal plot; to change that, set horizontal = FALSE. In addition, bar_chart() removes the unsightly 'gap' between the bars and the axis.

蓬勃野心 2024-10-27 09:30:02

Since we are only looking at the distribution of a single variable ("Position") rather than at the relationship between two variables, perhaps a histogram would be the more appropriate graph. ggplot has geom_histogram(), which makes this easy:

ggplot(theTable, aes(x = Position)) + geom_histogram(stat="count")

Using geom_histogram():

I think geom_histogram() is a little quirky as it treats continuous and discrete data differently.

For continuous data, you can just use geom_histogram() with no parameters.
For example, if we add in a numeric vector "Score"...

    Name   Position   Score  
1   James  Goalkeeper 10
2   Frank  Goalkeeper 20
3   Jean   Defense    10
4   Steve  Defense    10
5   John   Defense    20
6   Tim    Striker    50

and use geom_histogram() on the "Score" variable...

ggplot(theTable, aes(x = Score)) + geom_histogram()

For discrete data like "Position", we have to tell geom_histogram() to compute the y values (the bar heights) with the counting statistic, stat = "count":

ggplot(theTable, aes(x = Position)) + geom_histogram(stat = "count")

Note: Curiously and confusingly you can also use stat = "count" for continuous data as well and I think it provides a more aesthetically pleasing graph.

ggplot(theTable, aes(x = Score)) + geom_histogram(stat = "count")

Edits: Extended answer in response to DebanjanB's helpful suggestions.

悸初 2024-10-27 09:30:02

library(ggplot2)
library(magrittr)

dd <- tibble::tribble(
    ~Name,    ~Position,
  "James", "Goalkeeper",
  "Frank", "Goalkeeper",
   "Jean",    "Defense",
   "John",    "Defense",
  "Steve",    "Defense",
    "Tim",    "Striker"
  )


dd %>% ggplot(aes(x = forcats::fct_infreq(Position))) + geom_bar()

Created on 2022-08-30 with reprex v2.0.2

流年已逝 2024-10-27 09:30:02

If you don't want to write the ggplot2 code yourself, there is also ggpubr, whose ggbarplot() function has a really helpful sort.val argument. You can sort the bars in "desc" or "asc" order like this:

library(dplyr)
library(ggpubr)
# desc: largest bar first
theTable %>%
  count(Position) %>%
  ggbarplot(x = "Position", 
            y = "n",
            sort.val = "desc")

# asc: smallest bar first
theTable %>%
  count(Position) %>%
  ggbarplot(x = "Position", 
            y = "n",
            sort.val = "asc")

Created on 2022-08-14 by the reprex package (v2.0.1)

As you can see, it is really simple to sort the bars. This can also be done if the bars are grouped; see the ggpubr documentation for some helpful examples.

妞丶爷亲个 2024-10-27 09:30:02

You can simply use this code:

ggplot(yourdatasetname, aes(Position, fill = Name)) + 
     geom_bar(col = "black", size = 2)

秉烛思 2024-10-27 09:30:01

@GavinSimpson: reorder is a powerful and effective solution for this:

ggplot(theTable,
       aes(x=reorder(Position,Position,
                     function(x)-length(x)))) +
       geom_bar()
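
The reorder() call above scores each level by applying function(x) -length(x) to the Position values in that level, which is minus the level's count, and then sorts the levels by that score in ascending order. A base R sketch of the resulting level order, assuming the hypothetical "Zoalkeeper" label so that frequency order differs from alphabetical order:

```r
Position <- c("Zoalkeeper", "Zoalkeeper", "Defense",
              "Defense", "Defense", "Striker")

# Score per level: minus the number of rows in that level
f <- reorder(Position, Position, function(x) -length(x))

print(attr(f, "scores"))  # Defense -3, Striker -1, Zoalkeeper -2
print(levels(f))          # most frequent level first
```
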
烟凡古楼 2024-10-27 09:30:01

The key with ordering is to set the levels of the factor in the order you want. An ordered factor is not required; the extra information in an ordered factor isn't necessary and if these data are being used in any statistical model, the wrong parametrisation might result — polynomial contrasts aren't right for nominal data such as this.

library(ggplot2)

## set the levels in the order we want
theTable <- within(theTable, 
                   Position <- factor(Position, 
                                      levels=names(sort(table(Position), 
                                                        decreasing=TRUE))))
## plot
ggplot(theTable, aes(x = Position)) + geom_bar()

In the most general sense, we simply need to set the factor levels to be in the desired order. If left unspecified, the levels of a factor will be sorted alphabetically. You can also specify the level order within the call to factor as above, and other ways are possible as well.

theTable$Position <- factor(theTable$Position, levels = c(...))
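
The remark about ordered factors can be checked in base R. A minimal sketch, assuming R's default options("contrasts") setting and the question's data: a plain factor and an ordered factor with the same levels give the same bar order, but modelling functions would use treatment contrasts for the former and polynomial contrasts (.L, .Q) for the latter.

```r
Position <- c("Goalkeeper", "Goalkeeper", "Defense",
              "Defense", "Defense", "Striker")
lev <- names(sort(table(Position), decreasing = TRUE))

plain_f   <- factor(Position, levels = lev)   # nominal factor, custom level order
ordered_f <- ordered(Position, levels = lev)  # ordered factor, same level order

# Same level order, so the bars would be drawn in the same order
stopifnot(identical(levels(plain_f), levels(ordered_f)))

# Default contrasts differ: dummy (treatment) coding vs. polynomial trends
print(contrasts(plain_f))    # columns Goalkeeper, Striker
print(contrasts(ordered_f))  # columns .L, .Q
```
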
二智少女猫性小仙女 2024-10-27 09:30:01

Use scale_x_discrete(limits = ...) to specify the order of the bars:

positions <- c("Goalkeeper", "Defense", "Striker")
p <- ggplot(theTable, aes(x = Position)) + geom_bar() + scale_x_discrete(limits = positions)

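
Hard-coding the limits works, but for the frequency ordering asked about in the question the vector can also be computed from the data. A sketch, assuming theTable has the Name/Position columns from the question; the plotting part runs only if ggplot2 is installed.

```r
theTable <- data.frame(
  Name     = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense", "Defense", "Defense", "Striker")
)

# Position names sorted by count, most frequent first
positions <- names(sort(table(theTable$Position), decreasing = TRUE))
print(positions)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(theTable, aes(x = Position)) +
    geom_bar() +
    scale_x_discrete(limits = positions)  # bars drawn in the order of `positions`
}
```
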
心在旅行 2024-10-27 09:30:01

I think the already provided solutions are overly verbose. A more concise way to do a frequency sorted barplot with ggplot is

ggplot(theTable, aes(x=reorder(Position, -table(Position)[Position]))) + geom_bar()

It's similar to what Alex Brown suggested, but a bit shorter, and it works without an anonymous function definition.
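
The expression -table(Position)[Position] looks up, for every row, the count of that row's position, so each observation carries minus the size of its group; reorder() then averages these values per level (its default FUN is mean) and sorts the levels in ascending order. A base R sketch, using the hypothetical "Zoalkeeper" label so that frequency order differs from alphabetical order:

```r
Position <- c("Zoalkeeper", "Zoalkeeper", "Defense",
              "Defense", "Defense", "Striker")

# One count per row: table(Position) indexed by the row values
per_row <- table(Position)[Position]
print(as.vector(per_row))  # 2 2 3 3 3 1

# The mean of the negated per-row counts within each level is minus its count
f <- reorder(Position, -table(Position)[Position])
print(levels(f))
```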

Update

I think my old solution was good at the time, but nowadays I'd rather use forcats::fct_infreq, which sorts factor levels by frequency:

library(ggplot2)
library(forcats)

ggplot(theTable, aes(fct_infreq(Position))) + geom_bar()

莳間冲淡了誓言ζ 2024-10-27 09:30:01

Like reorder() in Alex Brown's answer, we could also use forcats::fct_reorder(). It sorts the levels of the factor given as the 1st argument according to the values in the 2nd argument, after summarising them with a function (default = median, which is fine here because there is just one value per factor level).

It is a shame that in the OP's question the required order is also alphabetical, because that is the default sort order when you create factors, so it hides what this function is actually doing. To make it clearer, I'll replace "Goalkeeper" with "Zoalkeeper".

library(tidyverse)
library(forcats)

theTable <- data.frame(
                Name = c('James', 'Frank', 'Jean', 'Steve', 'John', 'Tim'),
                Position = c('Zoalkeeper', 'Zoalkeeper', 'Defense',
                             'Defense', 'Defense', 'Striker'))

theTable %>%
    count(Position) %>%
    mutate(Position = fct_reorder(Position, n, .desc = TRUE)) %>%
    ggplot(aes(x = Position, y = n)) + geom_bar(stat = 'identity')

我偏爱纯白色 2024-10-27 09:30:01

Another alternative is to use reorder() to order the levels of the factor by count, in ascending (n) or descending (-n) order. This is very similar to the answer using fct_reorder() from the forcats package:

Descending order

df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, -n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")

Ascending order

df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")

Data frame:

df <- structure(list(Position = structure(c(3L, 3L, 1L, 1L, 1L, 2L), .Label = c("Defense", 
"Striker", "Zoalkeeper"), class = "factor"), Name = structure(c(2L, 
1L, 3L, 5L, 4L, 6L), .Label = c("Frank", "James", "Jean", "John", 
"Steve", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))