Boxplot schmoxplot: How to plot means and standard errors conditioned by a factor in R?

Posted on 2024-08-05 10:59:40

We all love robust measures like medians and interquartile ranges, but let's face it: in many fields, boxplots almost never show up in published articles, while means and standard errors do so all the time.

It's simple to draw boxplots in lattice, ggplot2, etc., and the galleries are full of them. Is there an equally straightforward way to draw means and standard errors, conditioned by a categorical variable?

I'm talking about plots like these:

http://freakonomics.blogs.nytimes.com/2008/07/30/how-big-is-your-halo-a-guest-post/

Or what are called "means diamonds" in JMP (see Figure 3):

http://blogs.sas.com/jmp/index.php?/archives/127-What-Good-Are-Error-Bars.html

Comments (5)

手心的海 2024-08-12 10:59:40

The first plot was just covered in a blog post on imachordata.com (hat tip to David Smith on blog.revolution-computing.com). You can also read Hadley's related documentation for ggplot2.

Here's the example code:

library(ggplot2)
library(plyr)  # provides ddply
data(mpg)

# create a data frame with averages and standard deviations
hwy.avg <- ddply(mpg, c("class", "year"), function(df)
  c(hwy.avg = mean(df$hwy), hwy.sd = sd(df$hwy)))

# first, define the width of the dodge
dodge <- position_dodge(width = 0.9)

# create the barplot component (the heights are precomputed, so use stat = "identity")
avg.plot <- ggplot(hwy.avg, aes(class, hwy.avg, fill = factor(year))) +
  geom_bar(stat = "identity", position = dodge)

# now add the error bars to the plot
# (note: these show +/- one standard deviation, not the standard error)
avg.plot + geom_linerange(aes(ymax = hwy.avg + hwy.sd, ymin = hwy.avg - hwy.sd), position = dodge) + theme_bw()

It ends up looking like this:
[Figure: dodged bar chart of mean highway MPG by class and year, with error bars]
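
The example above draws standard deviations, while the question asks for standard errors. The summary step could be adapted along the following lines. This is only a minimal sketch in base R, using the built-in mtcars data instead of mpg so that it runs without extra packages; the helper name `se` and the names `hp.summary`, `lower` and `upper` are illustrative choices, not part of the original answer.

```r
# Standard error of the mean, ignoring missing values
se <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Mean and standard error of hp for each combination of cyl and am
means <- aggregate(hp ~ cyl + am, data = mtcars, FUN = mean)
ses   <- aggregate(hp ~ cyl + am, data = mtcars, FUN = se)

# Combine into one summary data frame (one row per group);
# the two hp columns become hp.mean and hp.se
hp.summary <- merge(means, ses, by = c("cyl", "am"),
                    suffixes = c(".mean", ".se"))
hp.summary$lower <- hp.summary$hp.mean - hp.summary$hp.se
hp.summary$upper <- hp.summary$hp.mean + hp.summary$hp.se
hp.summary
```

The `lower` and `upper` columns can then be mapped to `ymin` and `ymax` in `geom_linerange` or `geom_errorbar`, in the same way as the standard deviation bounds above.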

梦忆晨望 2024-08-12 10:59:40

This question is almost 2 years old now, but as a new R user in an experimental field, this was a big question for me, and this page is prominent in Google results. I just discovered an answer I like better than the current set, so I thought I'd add it.

The sciplot package makes the task super easy. It gets the job done in a single command:

# ggplot2 is only needed for the mpg dataset, for direct comparison
library(ggplot2)
library(sciplot)

# bargraph.CI with a couple of parameters to match the ggplot example
# (see also lineplot.CI in the same package)
bargraph.CI(
  class,          # categorical factor for the x-axis
  hwy,            # numerical DV for the y-axis
  group = year,   # grouping factor
  data = mpg,     # look the variables up in mpg instead of attach()ing it
  legend = TRUE,
  x.leg = 19,
  ylab = "Highway MPG",
  xlab = "Class")

This produces a very workable graph with mostly default options. Note that the error bars are standard errors by default, but the ci.fun parameter takes a function, so they can be anything you want!
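
To illustrate swapping in a different interval, here is a small sketch. The function `ci95` is a hypothetical helper (not part of sciplot) that returns the lower and upper bound of an approximate 95% interval, and the plotting call is guarded so that the snippet still runs when sciplot or ggplot2 is not installed.

```r
# A hypothetical interval function: mean +/- 1.96 standard errors,
# i.e. an approximate 95% confidence interval under normality
ci95 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  half <- 1.96 * sd(x) / sqrt(length(x))
  c(m - half, m + half)  # lower bound first, then upper bound
}

# Only draw the plot if both packages are installed
if (requireNamespace("sciplot", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  sciplot::bargraph.CI(class, hwy, group = year, data = ggplot2::mpg,
                       ci.fun = ci95, legend = TRUE,
                       ylab = "Highway MPG", xlab = "Class")
}
```

The multiplier 1.96 is an assumption that suits large groups; for small groups a t-based multiplier would be more appropriate.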

凉宸 2024-08-12 10:59:40

Coming a little late to the game, but this solution might be useful for future users. It uses the diamonds data frame that ships with ggplot2 and takes advantage of stat_summary along with a few (super short) custom functions.

library(ggplot2)

# helper functions for the lower and upper bounds of the error bars
# (named std_err so that base::stderr is not masked)
std_err <- function(x) sqrt(var(x, na.rm = TRUE) / length(na.omit(x)))
low_se  <- function(x) mean(x, na.rm = TRUE) - std_err(x)
high_se <- function(x) mean(x, na.rm = TRUE) + std_err(x)

# create a ggplot
ggplot(diamonds, aes(cut, price, fill = color)) +
  # first layer is barplot with means
  stat_summary(fun = mean, geom = "bar", position = "dodge", colour = "white") +
  # second layer overlays the error bars using the functions defined above
  # (fun, fun.min, fun.max replace fun.y, fun.ymin, fun.ymax from ggplot2 3.3.0 on;
  #  linewidth replaces size for lines from 3.4.0 on)
  stat_summary(fun = mean, fun.min = low_se, fun.max = high_se, geom = "errorbar",
               position = "dodge", colour = "black", linewidth = 0.5)

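
As a possible shortcut, ggplot2 ships its own `mean_se()` summary function, which could stand in for the custom helpers. The sketch below assumes a ggplot2 version with the `fun` argument (3.3.0 or later); the dodge width of 0.9 and the error bar width of 0.3 are illustrative choices.

```r
library(ggplot2)

# mean_se() returns the mean (y) and mean -/+ one standard error (ymin, ymax)
mean_se(c(1, 2, 3))

# use the same dodge for both layers so the error bars line up with the bars
dodge <- position_dodge(width = 0.9)

p <- ggplot(diamonds, aes(cut, price, fill = color)) +
  stat_summary(fun = mean, geom = "bar", position = dodge) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = dodge, width = 0.3)
p
```

Passing one `fun.data` function keeps the mean and both bounds in a single place, at the cost of less control over how missing values are handled.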

不疑不惑不回忆 2024-08-12 10:59:40

Means and their standard errors can be computed automatically with ggplot2. I would recommend using the default pointranges instead of dynamite plots. You might have to provide the position manually. Here is how:

ggplot(mtcars, aes(factor(cyl), hp, color = factor(am))) +
  stat_summary(position = position_dodge(0.5))

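
To see what is actually being drawn, the summary function can be named explicitly and the computed values pulled out of the plot. This is a sketch under the assumption that `stat_summary` falls back to `mean_se` when no function is supplied; the variable names `summary.data` and `p95` are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), hp, color = factor(am))) +
  stat_summary(fun.data = mean_se, position = position_dodge(0.5))

# one row per cyl/am group: y is the mean, ymin/ymax are mean -/+ one SE
summary.data <- layer_data(p)
summary.data[, c("x", "group", "y", "ymin", "ymax")]

# widen the ranges to roughly 95% confidence intervals
p95 <- ggplot(mtcars, aes(factor(cyl), hp, color = factor(am))) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               position = position_dodge(0.5))
```

Inspecting `summary.data` is a quick way to check the plotted means and bounds against a hand calculation before publishing the figure.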

忆梦 2024-08-12 10:59:40

ggplot produces aesthetically pleasing graphs, but I don't have the gumption to try and publish any ggplot output yet.

Until that day comes, here is how I have been making the aforementioned graphs. I use a graphics package called 'gplots' to get the standard error bars (using data I've already calculated). Note that this code provides for two or more factors for each class/category. This requires the data to go in as a matrix, and the "beside=TRUE" argument of the "barplot2" function keeps the bars from being stacked.

# Create the data (means) matrix
# Using the matrix accommodates two or more factors for each class

data.m <- matrix(c(75,34,19, 39,90,41), nrow = 2, ncol=3, byrow=TRUE,
               dimnames = list(c("Factor 1", "Factor 2"),
                                c("Class A", "Class B", "Class C")))

# Create the standard error matrix

error.m <- matrix(c(12,10,7, 4,7,3), nrow = 2, ncol = 3, byrow=TRUE)

# load library {gplots}

library(gplots)

# Plot the bar graph, with standard errors
# (barplot2 takes the matrices directly, so no data frame is needed)

barplot2(data.m, beside=TRUE, axes=TRUE, las=1, ylim = c(0,120),
         main=" ", sub=" ", col=c("gray20",0),
         xlab="Class", ylab="Total amount (Mean +/- s.e.)",
         plot.ci=TRUE, ci.u=data.m+error.m, ci.l=data.m-error.m, ci.lty=1)

# Now, give it a legend:

legend("topright", c("Factor 1", "Factor 2"), fill=c("gray20",0),box.lty=0)

It is pretty plain-Jane, aesthetically, but seems to be what most journals/old professors want to see.

I'd post the graph produced by these example data, but this is my first post on the site. Sorry. One should be able to copy-paste the whole thing (after installing the "gplots" package) without problem.
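
For readers who prefer not to install gplots, a similar figure can be sketched with base graphics alone. The snippet below reuses the same illustrative numbers; drawing the error bars with `arrows()` is a swapped-in technique, not what the answer above does, and the name `mids` is an illustrative choice.

```r
# Means and standard errors, one row per factor and one column per class
# (same illustrative numbers as in the gplots example)
data.m  <- matrix(c(75, 34, 19, 39, 90, 41), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Factor 1", "Factor 2"),
                                  c("Class A", "Class B", "Class C")))
error.m <- matrix(c(12, 10, 7, 4, 7, 3), nrow = 2, byrow = TRUE)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(data.m, beside = TRUE, las = 1, ylim = c(0, 120),
                col = c("gray20", "white"),
                xlab = "Class", ylab = "Total amount (Mean +/- s.e.)")

# draw the error bars as double-headed arrows with flat (90 degree) heads
arrows(x0 = mids, y0 = data.m - error.m,
       x1 = mids, y1 = data.m + error.m,
       angle = 90, code = 3, length = 0.05)

legend("topright", rownames(data.m), fill = c("gray20", "white"), bty = "n")
```

Because `mids` has the same shape as `data.m`, the midpoints and the bounds line up element by element without any reshaping.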
