Order bars in ggplot2 bar graph
I am trying to make a bar graph where the largest bar would be nearest to the y axis and the shortest bar would be furthest. So this is kind of like the Table I have
Name Position
1 James Goalkeeper
2 Frank Goalkeeper
3 Jean Defense
4 Steve Defense
5 John Defense
6 Tim Striker
So I am trying to build a bar graph that would show the number of players according to position
p <- ggplot(theTable, aes(x = Position)) + geom_bar(binwidth = 1)
but the graph shows the goalkeeper bar first, then the defense, and finally the striker one. I want the graph to be ordered so that the defense bar is closest to the y axis, then the goalkeeper one, and finally the striker one.
Thanks
A simple dplyr-based reordering of factors can solve this problem:
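A minimal sketch of such a dplyr-based reordering, assuming `theTable` is rebuilt from the table in the question. The names `position_counts` and `counts` are chosen here for illustration and are not from the original answer:

```r
library(dplyr)
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

position_counts <- theTable %>%
  group_by(Position) %>%
  summarise(counts = n()) %>%    # one row per position with its count
  arrange(desc(counts)) %>%      # largest count first
  mutate(Position = factor(Position, levels = Position))  # freeze that order

# Bar heights come from the precomputed `counts` column
p <- ggplot(position_counts, aes(x = Position, y = counts)) +
  geom_col()
p
```

Because the factor levels are set after sorting, ggplot keeps the sorted order on the x axis instead of falling back to alphabetical order.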
In addition to `forcats::fct_infreq`, mentioned by @HolgerBrandl, there is `forcats::fct_rev`, which reverses the factor order.
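As a hedged illustration of how the two functions combine, the sketch below orders positions by frequency and then flips that order. `theTable` is an assumed reconstruction of the question's data, and `by_freq` and `by_freq_reversed` are illustrative names:

```r
library(ggplot2)
library(forcats)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# fct_infreq() puts the most frequent level first;
# fct_rev() flips the level order, so the rarest level comes first.
by_freq <- fct_infreq(theTable$Position)
by_freq_reversed <- fct_rev(by_freq)

# The same transformation can be applied directly inside aes()
p <- ggplot(theTable, aes(x = fct_rev(fct_infreq(Position)))) +
  geom_bar() +
  labs(x = "Position")
p
```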
You just need to specify the `Position` column to be an ordered factor where the levels are ordered by their counts. (Note that `table(Position)` produces a frequency count of the `Position` column.) Then your `ggplot` call will show the bars in decreasing order of count. I don't know if there's an option in `geom_bar` to do this without having to explicitly create an ordered factor.
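The ordered-factor approach described above could be sketched as follows. This is an illustration under the assumption that `theTable` matches the question's table, not necessarily the answer's exact code:

```r
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# table(Position) counts rows per position; sorting the counts in
# decreasing order and taking the names gives the desired level order.
theTable <- transform(
  theTable,
  Position = ordered(Position,
                     levels = names(sort(table(Position), decreasing = TRUE)))
)

p <- ggplot(theTable, aes(x = Position)) +
  geom_bar()
p
```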
If the chart columns come from a numeric variable, as in the data frame below, you can use a simpler solution:
The minus sign before the sort variable (`-Qty`) reverses the sort direction, giving descending instead of ascending order.
Here's some data for testing:
When I found this thread, that was the answer I was looking for. Hope it's useful for others.
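A minimal sketch of this numeric-variable case. The data frame `df`, its `Colors` and `Qty` columns, and the values are made up here for testing, since the answer's own data is not shown, and `color_order` is an illustrative helper:

```r
library(ggplot2)

# Hypothetical test data: several rows per colour, each with a quantity
df <- data.frame(
  Colors = c("Green", "Yellow", "Blue", "Red", "Yellow", "Blue"),
  Qty = c(7, 4, 5, 1, 2, 6)
)

# reorder() sorts the levels of Colors by the sum of -Qty per colour,
# in ascending order, so the largest total comes first.
color_order <- levels(with(df, reorder(Colors, -Qty, FUN = sum)))

p <- ggplot(df, aes(x = reorder(Colors, -Qty, FUN = sum), y = Qty)) +
  geom_col() +          # rows of the same colour are stacked into one bar
  labs(x = "Colors")
p
```

Dropping the minus sign (`reorder(Colors, Qty, FUN = sum)`) would put the smallest total first.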
I agree with zach that counting within dplyr is the best solution. I've found this to be the shortest version:
This will also be significantly faster than reordering the factor levels beforehand, since the count is done in dplyr, not in ggplot or using `table`.
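One short way to count within dplyr before plotting is sketched below. It assumes `theTable` matches the question's table, and it relies on `count()` naming its count column `n` by default:

```r
library(dplyr)
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

position_counts <- theTable %>%
  count(Position, sort = TRUE) %>%                        # counts, largest first
  mutate(Position = factor(Position, levels = Position))  # keep that order

p <- ggplot(position_counts, aes(x = Position, y = n)) +
  geom_col()
p
```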
I found it very annoying that `ggplot2` doesn't offer an 'automatic' solution for this. That's why I created the `bar_chart()` function in `ggcharts`.
By default, `bar_chart()` sorts the bars and displays a horizontal plot. To change that, set `horizontal = FALSE`. In addition, `bar_chart()` removes the unsightly 'gap' between the bars and the axis.
Since we are only looking at the distribution of a single variable ("Position"), as opposed to the relationship between two variables, perhaps a histogram would be the more appropriate graph. ggplot has `geom_histogram()`, which makes it easy:
Using `geom_histogram()`:
I think `geom_histogram()` is a little quirky, as it treats continuous and discrete data differently.
For continuous data, you can just use `geom_histogram()` with no parameters.
For example, if we add in a numeric vector "Score"...
and use `geom_histogram()` on the "Score" variable...
For discrete data like "Position", we have to specify a calculated statistic to give the y value for the height of the bars, using `stat = "count"`:
Note: Curiously and confusingly, you can also use `stat = "count"` for continuous data, and I think it provides a more aesthetically pleasing graph.
Edits: Extended answer in response to DebanjanB's helpful suggestions.
Created on 2022-08-30 with reprex v2.0.2
If you don't want to use `ggplot2` directly, there is also ggpubr, with a really helpful argument for the `ggbarplot` function. You can sort the bars with `sort.val` set to "desc" or "asc" like this:
Created on 2022-08-14 by the reprex package (v2.0.1)
As you can see, it is really simple to sort the bars. This can also be done if the bars are grouped. Check the link above for some helpful examples.
You can simply use this code:
@GavinSimpson: `reorder` is a powerful and effective solution for this:
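A sketch of what a `reorder`-based solution could look like, assuming `theTable` matches the question's table. The helper name `position_by_count` is illustrative:

```r
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# reorder() applies the function to each group of rows and sorts the
# levels by the result; the negative group size puts the largest
# group first.
position_by_count <- reorder(theTable$Position, theTable$Position,
                             function(x) -length(x))

p <- ggplot(theTable,
            aes(x = reorder(Position, Position, function(x) -length(x)))) +
  geom_bar() +
  labs(x = "Position")
p
```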
The key with ordering is to set the levels of the factor in the order you want. An ordered factor is not required; the extra information in an ordered factor isn't necessary and if these data are being used in any statistical model, the wrong parametrisation might result — polynomial contrasts aren't right for nominal data such as this.
In the most general sense, we simply need to set the factor levels to be in the desired order. If left unspecified, the levels of a factor will be sorted alphabetically. You can also specify the level order within the call to factor as above, and other ways are possible as well.
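Setting the levels of a plain (unordered) factor can be sketched like this, under the assumption that `theTable` matches the question's table. `count_order` is an illustrative name:

```r
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# Derive the level order from the counts instead of typing it by hand
count_order <- names(sort(table(theTable$Position), decreasing = TRUE))

# factor(), not ordered(): the levels control the plotting order, but the
# variable stays nominal for any later modelling.
theTable$Position <- factor(theTable$Position, levels = count_order)

p <- ggplot(theTable, aes(x = Position)) +
  geom_bar()
p
```

Writing the levels out explicitly, for example `levels = c("Defense", "Goalkeeper", "Striker")`, works the same way when the order is known in advance.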
Use `scale_x_discrete(limits = ...)` to specify the order of the bars.
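For instance, the limits could be computed from the counts, as in this sketch. `theTable` is an assumed reconstruction of the question's data and `position_order` is an illustrative name:

```r
library(ggplot2)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# Positions sorted by how often they occur, most frequent first
position_order <- names(sort(table(theTable$Position), decreasing = TRUE))

# The data is left untouched; only the axis order is set on the scale
p <- ggplot(theTable, aes(x = Position)) +
  geom_bar() +
  scale_x_discrete(limits = position_order)
p
```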
I think the already provided solutions are overly verbose. A more concise way to do a frequency-sorted barplot with ggplot is:
It's similar to what Alex Brown suggested, but a bit shorter, and it works without an anonymous function definition.
Update
I think my old solution was good at the time, but nowadays I'd rather use `forcats::fct_infreq`, which sorts factor levels by frequency:
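Both variants are sketched below under the assumption that `theTable` matches the question's table. The first is one possible way to use `reorder` without an anonymous function and may differ from the answer's exact code. The names `counts_per_row`, `by_reorder` and `by_infreq` are illustrative:

```r
library(ggplot2)
library(forcats)

# Assumed reconstruction of the table shown in the question
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# Older style: look up each row's group count, then reorder by its negative
counts_per_row <- as.vector(table(theTable$Position)[theTable$Position])
by_reorder <- reorder(theTable$Position, -counts_per_row)

# Newer style: fct_infreq() orders levels by descending frequency
by_infreq <- fct_infreq(theTable$Position)

p <- ggplot(theTable, aes(x = fct_infreq(Position))) +
  geom_bar() +
  labs(x = "Position")
p
```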
Like `reorder()` in Alex Brown's answer, we could also use `forcats::fct_reorder()`. It will basically sort the factor levels specified in the first argument according to the values in the second argument after applying a specified function (default = median, which is what we use here, as we just have one value per factor level).
It is a shame that in the OP's question the required order is also alphabetical, as that is the default sort order when you create factors, so it will hide what this function is actually doing. To make it clearer, I'll replace "Goalkeeper" with "Zoalkeeper".
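A sketch of the `fct_reorder()` approach with the renamed level. The data frame is an assumed reconstruction of the question's table with "Zoalkeeper" substituted, and `position_counts` is an illustrative name:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Assumed data: the question's table with "Goalkeeper" renamed so that
# the count order no longer matches alphabetical order
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Zoalkeeper", "Zoalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

position_counts <- theTable %>%
  count(Position) %>%   # one row per position, count in column `n`
  mutate(Position = fct_reorder(factor(Position), n, .desc = TRUE))

p <- ggplot(position_counts, aes(x = Position, y = n)) +
  geom_col()
p
```

Alphabetical order would place "Striker" before "Zoalkeeper"; ordering by count places "Zoalkeeper" second.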
Another alternative is to use `reorder` to order the levels of a factor, in ascending (`n`) or descending (`-n`) order based on the count. It is very similar to the one using `fct_reorder` from the `forcats` package:
Descending order
Ascending order
Data frame:
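The three pieces named above (descending plot, ascending plot, data frame) might look like this sketch. `theTable` is an assumed reconstruction of the question's table, and `position_counts`, `desc_levels`, `asc_levels`, `p_desc` and `p_asc` are illustrative names:

```r
library(dplyr)
library(ggplot2)

# Data frame (assumed reconstruction of the table in the question)
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

position_counts <- count(theTable, Position)  # count column is named `n`

# Level orders produced by reorder() for each direction
desc_levels <- levels(with(position_counts, reorder(Position, -n)))
asc_levels <- levels(with(position_counts, reorder(Position, n)))

# Descending order: sort by the negated count
p_desc <- ggplot(position_counts, aes(x = reorder(Position, -n), y = n)) +
  geom_col() +
  labs(x = "Position")

# Ascending order: sort by the count itself
p_asc <- ggplot(position_counts, aes(x = reorder(Position, n), y = n)) +
  geom_col() +
  labs(x = "Position")

p_desc
p_asc
```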