Creating a Pareto chart with ggplot2 and R
I have been struggling with how to make a Pareto Chart in R using the ggplot2 package. In many cases when making a bar chart or histogram we want items sorted by the X axis. In a Pareto Chart we want the items ordered descending by the value in the Y axis. Is there a way to get ggplot to plot items ordered by the value in the Y axis? I tried sorting the data frame first but it seems ggplot reorders them.
Example:
library(ggplot2)
val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
val <- with(val, val[order(-Value), ])
p <- ggplot(val)
p + geom_bar(aes(State, Value, fill = variable), stat = "identity", position = "dodge") + scale_fill_brewer(palette = "Set1")
The data frame val is sorted, but the output looks like this:
(source: cerebralmastication.com)
Hadley correctly pointed out that this produces a much better graphic for showing actuals vs. predicted:
ggplot(val, aes(State, Value)) + geom_bar(stat = "identity", subset = .(variable == "estimate"), fill = "grey70") + geom_crossbar(aes(ymin = Value, ymax = Value), subset = .(variable == "actual"))
which returns:
(source: cerebralmastication.com)
But it's still not a Pareto Chart. Any tips?
Subsetting and sorting your data;
From there it's just a standard barplot() with a very manual cumulative function on top, which should look like this:
(source: eddelbuettel.com)
and it doesn't even need the overplotting trick, as lines() happily annotates the initial plot.
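The code for this base-graphics answer did not survive on this page, so here is a minimal sketch of the approach it describes, not the answerer's original code. The data frame sub and its values are made up for illustration; the cumulative totals are drawn on the same scale as the bars, and the right-hand axis labels them as percentages.

```r
# Hypothetical data standing in for the "actual" rows of val.
sub <- data.frame(
  State = c("AL", "AK", "AZ", "AR"),
  Value = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)

# Sort descending by Value, then compute the cumulative share.
sub <- sub[order(sub$Value, decreasing = TRUE), ]
total <- sum(sub$Value)
cum_share <- cumsum(sub$Value) / total

# barplot() returns the bar midpoints, which lines() can reuse as x positions.
bp <- barplot(sub$Value, names.arg = sub$State,
              ylim = c(0, 1.05 * total), ylab = "Value")
lines(bp, cum_share * total, type = "b", col = "blue")

# Right-hand axis showing the cumulative line as a percentage of the total.
axis(4, at = seq(0, 1, by = 0.25) * total,
     labels = paste0(seq(0, 100, by = 25), "%"))
```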
The bars in ggplot2 are ordered by the ordering of the levels in the factor.
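A small sketch of what that means in practice, using made-up data rather than the val.txt file from the question: sorting the rows does not change the factor levels, so the levels have to be set explicitly in the order you want the bars drawn.

```r
# Hypothetical data; State and Value mirror the column names in the question.
val <- data.frame(
  State = c("AL", "AK", "AZ", "AR"),
  Value = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)

# By default the levels are alphabetical, regardless of row order.
default_levels <- levels(factor(val$State))

# Set the levels in decreasing order of Value instead.
val$State <- factor(val$State, levels = val$State[order(-val$Value)])
levels(val$State)
```

With the levels set this way, ggplot2 should draw the bars from largest to smallest without any further sorting of the data frame.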
A traditional Pareto chart in ggplot2, developed after reading:
Cano, E. L., Moguerza, J. M., & Redchuk, A. (2012). Six Sigma with R. (G. Robert, K. Hornik, & G. Parmigiani, Eds.) Springer.
We can use the ggQC package.
With a simple example:
barplot(data)
does things correctly. The ggplot equivalent "should be":
qplot(x=names(data), y=data, geom='bar')
But that incorrectly reorders/sorts the bars alphabetically... because that's how
levels(factor(names(data)))
would be ordered. Solution:
qplot(x=factor(names(data), levels=names(data)), y=data, geom='bar')
Phew!
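Since qplot() is deprecated in recent ggplot2 releases, here is a sketch of the same fix written with ggplot(). The named vector data is hypothetical (the answer does not show its contents) and is assumed to be sorted already; the column names name and count are also just for illustration.

```r
library(ggplot2)

# Hypothetical named vector, already in the order the bars should appear.
data <- c(scratches = 25, dents = 12, stains = 7, cracks = 3)

# Fix the level order to the vector's own order so ggplot2 does not
# fall back to alphabetical sorting.
df <- data.frame(
  name  = factor(names(data), levels = names(data)),
  count = as.numeric(data)
)

p <- ggplot(df, aes(x = name, y = count)) + geom_col()
p
```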
Also, see the package qcc, which has a function pareto.chart(). Looks like it uses base graphics too, so start your bounty for a ggplot2 solution :-)
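For reference, a minimal sketch of calling it, assuming an installed qcc release that still exports pareto.chart() (the function may be named differently in other versions). The defect counts are invented for illustration.

```r
# Hypothetical defect counts; the names become the bar labels.
defects <- c(scratches = 25, dents = 12, stains = 7, cracks = 3)

# Guarded so the snippet still runs where qcc is not installed.
if (requireNamespace("qcc", quietly = TRUE)) {
  qcc::pareto.chart(defects, ylab = "Frequency")
}
```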
To simplify things, let's consider only the estimates.
First we reorder the factor levels, so that States are plotted in decreasing order of Value.
Similarly, we reorder the dataset and calculate a cumulative value.
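The code for these steps is missing from this page; a sketch of what they could look like, using a small invented data frame in place of val.txt and the hypothetical names est and cum:

```r
# Hypothetical stand-in for the question's data.
val <- data.frame(
  State    = rep(c("AL", "AK", "AZ", "AR"), times = 2),
  variable = rep(c("estimate", "actual"), each = 4),
  Value    = c(40, 10, 30, 20, 38, 12, 33, 18),
  stringsAsFactors = FALSE
)

# Keep only the estimates.
est <- subset(val, variable == "estimate")

# Reorder the factor levels so the largest Value comes first.
est$State <- reorder(factor(est$State), -est$Value)

# Sort the rows the same way and add a running total.
est <- est[order(est$Value, decreasing = TRUE), ]
est$cum <- cumsum(est$Value)
```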
Now we are ready to draw the plot. The trick to get a line and bar on the same axes is to convert the State variable (a factor) to be numeric.
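A sketch of that trick, again with made-up data rather than the original val.txt, and not necessarily the exact code from the answer. The x aesthetic is the numeric code of the State factor, and the axis labels are put back with scale_x_continuous().

```r
library(ggplot2)

# Hypothetical estimates, prepared as described: levels and rows in
# decreasing order of Value, plus a cumulative column.
val <- data.frame(
  State = c("AL", "AK", "AZ", "AR"),
  Value = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)
val$State <- reorder(factor(val$State), -val$Value)
val <- val[order(val$Value, decreasing = TRUE), ]
val$cum <- cumsum(val$Value)

# Bars and cumulative line share one continuous x axis.
p <- ggplot(val, aes(x = as.numeric(State))) +
  geom_col(aes(y = Value), fill = "grey70") +
  geom_line(aes(y = cum)) +
  geom_point(aes(y = cum)) +
  scale_x_continuous(breaks = seq_along(levels(val$State)),
                     labels = levels(val$State)) +
  labs(x = "State", y = "Value")
p
```

In current ggplot2 versions an alternative is to keep State as a factor on the x axis and add group = 1 to the line's aesthetics, which avoids relabelling the axis.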
As mentioned in the question, trying to draw two Pareto plots of two variable groups right next to each other isn't very easy. You'd probably be better off using faceting if you want multiple Pareto plots.
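One way the faceting suggestion might look, as a rough sketch on invented data. It assumes a single bar order shared by both panels (taken from the estimates), so the "actual" panel is only approximately a Pareto chart; a separate ordering per panel would need more work.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data.
val <- data.frame(
  State    = rep(c("AL", "AK", "AZ", "AR"), times = 2),
  variable = rep(c("estimate", "actual"), each = 4),
  Value    = c(40, 10, 30, 20, 38, 12, 33, 18),
  stringsAsFactors = FALSE
)

# One shared bar order, taken from the estimates.
est <- val[val$variable == "estimate", ]
val$State <- factor(val$State, levels = est$State[order(-est$Value)])

# Sort within each group by that order and accumulate per group.
val <- val[order(val$variable, val$State), ]
val$cum <- ave(val$Value, val$variable, FUN = cumsum)

# One panel per variable; group = 1 lets the line connect the discrete x values.
p <- ggplot(val, aes(x = State)) +
  geom_col(aes(y = Value), fill = "grey70") +
  geom_line(aes(y = cum, group = 1)) +
  geom_point(aes(y = cum)) +
  facet_wrap(~ variable)
p
```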