Creating a Pareto chart with ggplot2 and R

Posted on 2024-08-11 08:05:00

I have been struggling with how to make a Pareto Chart in R using the ggplot2 package. In many cases when making a bar chart or histogram we want items sorted by the X axis. In a Pareto Chart we want the items ordered descending by the value in the Y axis. Is there a way to get ggplot to plot items ordered by the value in the Y axis? I tried sorting the data frame first but it seems ggplot reorders them.

Example:

library(ggplot2)
val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
val <- with(val, val[order(-Value), ])
p <- ggplot(val)
p + geom_bar(aes(State, Value, fill = variable), stat = "identity", position = "dodge") + scale_fill_brewer(palette = "Set1")

The data frame val is sorted, but the output looks like this:

[image, source: cerebralmastication.com]

Hadley correctly pointed out that this produces a much better graphic for showing actuals vs. predicted:

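# The layer argument subset = .(...) is not accepted by current ggplot2; pass a subsetted data frame instead.
ggplot(val, aes(State, Value)) +
  geom_bar(data = subset(val, variable == "estimate"), stat = "identity", fill = "grey70") +
  geom_crossbar(data = subset(val, variable == "actual"), aes(ymin = Value, ymax = Value))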

which returns:

[image, source: cerebralmastication.com]

But it's still not a Pareto Chart. Any tips?

Comments (8)

柠檬色的秋千 2024-08-18 08:05:00

Subset and sort your data:

valact <- subset(val, variable=='actual')
valsort <- valact[ order(-valact[,"Value"]),]

From there it's just a standard barplot() with a very manual cumulative function on top:

op <- par(mar=c(3,3,3,3)) 
bp <- barplot(valsort [ , "Value"], ylab="", xlab="", ylim=c(0,1),    
              names.arg=as.character(valsort[,"State"]), main="How's that?") 
# Cumulative share of the total, drawn over the bars (ylim is set by barplot(), not by lines())
lines(bp, cumsum(valsort[,"Value"])/sum(valsort[,"Value"]), col='red')
axis(4)
box() 
par(op)

which should look like this:

[image, source: eddelbuettel.com]

It doesn't even need the overplotting trick, as lines() happily annotates the initial plot.

汹涌人海 2024-08-18 08:05:00

The bars in ggplot2 are ordered by the ordering of the levels in the factor.

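# unique() is needed because each State appears once per variable, and factor levels must not repeat
val$State <- with(val, factor(State, levels = unique(State[order(-Value)])))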
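To see the level ordering at work without downloading `val`, here is a minimal base-R sketch. The data frame `toy` and its values are hypothetical stand-ins for `val`, and it orders the levels by one variable only, so that each state contributes a single level.

```r
# Hypothetical toy data: each state appears once per variable, as in `val`.
toy <- data.frame(
  State    = c("NY", "CA", "TX", "NY", "CA", "TX"),
  variable = rep(c("actual", "estimate"), each = 3),
  Value    = c(0.2, 0.5, 0.3, 0.25, 0.45, 0.3)
)

# Default factor levels are alphabetical; this is the order ggplot2 draws.
default_levels <- levels(factor(toy$State))

# Order the levels by decreasing Value of the "actual" rows only.
act <- toy[toy$variable == "actual", ]
toy$State <- factor(toy$State, levels = act$State[order(-act$Value)])

print(default_levels)
print(levels(toy$State))
```

Any bar layer drawn from `toy` afterwards should follow the new level order rather than the row order of the data frame.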

一袭水袖舞倾城 2024-08-18 08:05:00

A traditional Pareto chart in ggplot2.

Developed after reading
Cano, E. L., Moguerza, J. M., & Redchuk, A. (2012). Six Sigma with R. (G. Robert, K. Hornik, & G. Parmigiani, Eds.) Springer.

library(ggplot2);library(grid)

counts  <- c(80, 27, 66, 94, 33)
defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")
dat <- data.frame(count = counts, defect = defects, stringsAsFactors=FALSE )
dat <- dat[order(dat$count, decreasing=TRUE),]
dat$defect <- factor(dat$defect, levels=dat$defect)
dat$cum <- cumsum(dat$count)
count.sum<-sum(dat$count)
dat$cum_perc<-100*dat$cum/count.sum

p1<-ggplot(dat, aes(x=defect, y=cum_perc, group=1))
p1<-p1 + geom_point(aes(colour=defect), size=4) + geom_path()

p1<-p1+ ggtitle('Pareto Chart')+ theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(),axis.text.x = element_blank())
p1<-p1+theme(legend.position="none")

p2<-ggplot(dat, aes(x=defect, y=count,colour=defect, fill=defect))
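# y is already a count, so draw the values as given instead of letting geom_bar() count rows
p2<- p2 + geom_bar(stat = "identity")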

p2<-p2+theme(legend.position="none")

plot.new()
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
print(p1, vp = viewport(layout.pos.row = 1,layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 2,layout.pos.col = 1))
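
The two stacked panels can also be collapsed into one plot with a secondary axis. This is only a sketch, not part of the original answer. It assumes a ggplot2 version that provides `geom_col()` and `sec_axis()`, and the scaling factor `k` is a name introduced here for illustration.

```r
library(ggplot2)

# Same defect counts as in the answer above.
dat <- data.frame(
  defect = c("price code", "schedule date", "supplier code",
             "contact num.", "part num."),
  count  = c(80, 27, 66, 94, 33)
)
dat <- dat[order(dat$count, decreasing = TRUE), ]
dat$defect   <- factor(dat$defect, levels = dat$defect)
dat$cum_perc <- 100 * cumsum(dat$count) / sum(dat$count)

# Assumption for this sketch: k maps the 0-100 percentage scale onto the
# count axis, so that 100% lines up with the total count.
k <- sum(dat$count) / 100

p <- ggplot(dat, aes(defect, count)) +
  geom_col(fill = "grey70") +
  geom_line(aes(y = cum_perc * k, group = 1)) +
  geom_point(aes(y = cum_perc * k)) +
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "Cumulative %"))
p
```

The line is plotted in count units and the secondary axis divides by `k` again, so the right-hand axis reads in percent.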

久夏青 2024-08-18 08:05:00

We can use the ggQC package.

library(ggplot2)
library(ggQC)
Data4Pareto <- data.frame(
  KPI = c("Customer Service Time", "Order Fulfillment", "Order Processing Time",
          "Order Production Time", "Order Quality Control Time", "Rework Time",
          "Shipping"),
  Time = c(1.50, 38.50, 3.75, 23.08, 1.92, 3.58, 73.17)) 


ggplot2::ggplot(Data4Pareto, aes(x = KPI, y = Time)) +
 ggQC::stat_pareto(point.color = "red",
                   point.size = 3,
                   line.color = "black",
                   bars.fill = c("blue", "orange")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5))


念﹏祤嫣 2024-08-18 08:05:00

With a simple example:

 > data
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10 
0.29056 0.23833 0.11003 0.05549 0.04678 0.03788 0.02770 0.02323 0.02211 0.01925 

barplot(data) does things correctly

the ggplot equivalent "should be": qplot(x=names(data), y=data, geom='bar')

But that incorrectly reorders/sorts the bars alphabetically... because that's how levels(factor(names(data))) would be ordered.

Solution: qplot(x=factor(names(data), levels=names(data)), y=data, geom='bar')

Phew!
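
Since qplot() is deprecated in recent ggplot2 releases, the same fix can be written with ggplot() directly. This is a sketch rather than the answer's own code: the names `pc` and `df` are introduced here, and `geom_col()` is assumed to be available.

```r
library(ggplot2)

# The named vector from the answer above (called `data` there).
pc <- c(PC1 = 0.29056, PC2 = 0.23833, PC3 = 0.11003, PC4 = 0.05549,
        PC5 = 0.04678, PC6 = 0.03788, PC7 = 0.02770, PC8 = 0.02323,
        PC9 = 0.02211, PC10 = 0.01925)

df <- data.frame(
  # Fix the level order to the vector's own order, so the bars are not
  # re-sorted alphabetically.
  component = factor(names(pc), levels = names(pc)),
  value     = unname(pc)
)

p <- ggplot(df, aes(component, value)) + geom_col()
p
```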

爱的那么颓废 2024-08-18 08:05:00

Also, see the package qcc which has a function pareto.chart(). Looks like it uses base graphics too, so start your bounty for a ggplot2-solution :-)
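
For reference, a minimal call might look like the sketch below. It assumes the qcc package is installed and reuses the defect counts from another answer on this page. Only the first argument is passed, because other argument names may differ between qcc versions.

```r
library(qcc)

# Counts reused from another answer; the names become the bar labels.
defect <- c(80, 27, 66, 94, 33)
names(defect) <- c("price code", "schedule date", "supplier code",
                   "contact num.", "part num.")

# pareto.chart() sorts the bars in decreasing order and overlays the
# cumulative percentage, using base graphics.
pareto.chart(defect)
```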

半﹌身腐败 2024-08-18 08:05:00

To simplify things, let's consider only the estimates.

estimates <- subset(val, variable == "estimate")

First we reorder the factor levels, so that States are plotted in decreasing order of Value.

estimates$State <- with(estimates, reorder(State, -Value))

Similarly, we reorder the dataset and calculate a cumulative value.

estimates <- estimates[order(estimates$Value, decreasing = TRUE),]
estimates$cumulative <- cumsum(estimates$Value)

Now we are ready to draw the plot. The trick to get a line and bar on the same axes is to convert the State variable (a factor) to be numeric.

p <- ggplot(estimates, aes(State, Value)) + 
  geom_bar(stat = "identity") +  # Value is already the bar height
  geom_line(aes(as.numeric(State), cumulative))
p

As mentioned in the question, drawing two Pareto plots of two variable groups right next to each other isn't very easy. You'd probably be better off using faceting if you want multiple Pareto plots.
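
One possible way to facet is sketched below, using hypothetical toy data in place of `val`. Because a single factor cannot be ordered differently in each panel, the sketch builds one level per group and item (the `key` column, a name introduced here) and strips the group prefix from the axis labels. It assumes `geom_col()` and `sec_axis()` are available.

```r
library(ggplot2)

# Hypothetical toy data standing in for `val`: two groups, four items each.
toy <- data.frame(
  group = rep(c("actual", "estimate"), each = 4),
  item  = rep(c("A", "B", "C", "D"), times = 2),
  value = c(10, 40, 20, 30, 35, 15, 25, 5)
)

# Sort within each group by decreasing value.
toy <- toy[order(toy$group, -toy$value), ]

# Cumulative share within each group (ave() applies the function per group).
toy$cum_share <- ave(toy$value, toy$group,
                     FUN = function(v) cumsum(v) / sum(v))

# One factor level per (group, item) row, in sorted order, so that each
# panel keeps its own bar ordering.
toy$key <- factor(paste(toy$group, toy$item),
                  levels = paste(toy$group, toy$item))

# Rescale the 0-1 cumulative share to the bar axis.
top <- max(toy$value)

p <- ggplot(toy, aes(key, value)) +
  geom_col(fill = "grey70") +
  geom_line(aes(y = cum_share * top, group = group), colour = "red") +
  geom_point(aes(y = cum_share * top), colour = "red") +
  scale_x_discrete(labels = function(k) sub("^[^ ]+ ", "", k)) +
  scale_y_continuous(sec.axis = sec_axis(~ . / top, name = "Cumulative share")) +
  facet_wrap(~ group, scales = "free_x")
p
```

With `scales = "free_x"`, each panel shows only its own keys, so the bars are sorted independently in each facet.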

小鸟爱天空丶 2024-08-18 08:05:00

library(ggplot2)

# Bar chart of relative frequencies per class interval, with the cumulative
# frequency (rescaled to [0, 1]) overlaid as points and a line.
freqplot = function(x, by = NULL, right = FALSE)
{
if(is.null(by)) stop('A value for "by" must be specified.')
# Class boundaries; values that fall outside the intervals built by cut()
# (including the maximum when right = FALSE) are not counted.
breaks = seq(min(x), max(x), by = by )
table = table(cut(x, breaks = breaks, right = right))
table = table/sum(table)

intervs = factor(names(table), levels = names(table))
freq = as.numeric(table)
acum = as.numeric(cumsum(table))

# Min-max rescaling: the first class maps to 0 and the last to 1.
normalize.vec = function(x){
  (x - min(x))/(max(x) - min(x))
}

dados = data.frame(classe = intervs, freq = freq, acum = acum, acum_norm = normalize.vec(acum))
p = ggplot(dados) + 
  geom_bar(aes(classe, freq, fill = classe), stat = 'identity') +
  geom_point(aes(classe, acum_norm, group = '1'), shape = I(1), size = I(3), colour = 'gray20') +
  geom_line(aes(classe, acum_norm, group = '1'), colour = I('gray20'))

p
}