Adding summary statistics (or even raw data points) to a dodged boxplot

Posted 2024-08-06 03:49:32

Say you have the following dataset:

trt <- ifelse(runif(100)<0.5,"drug","placebo")
inj.site <- ifelse(runif(100)<0.5,"ankle","wrist")
relief <- 20 + 0.5*(inj.site=="ankle") + 0.5*(trt=="drug") + rnorm(100)
to.analyze <- data.frame(trt,inj.site,relief)

Now, the idea is to make a boxplot with injury site on the x-axis and boxes by treatment side-by-side:

library(ggplot2)
bplot <- ggplot(to.analyze,aes(inj.site,relief,fill=trt)) + geom_boxplot(position="dodge")

Easy enough. But now I want to add raw data points on top of the boxes. If I didn't have boxes with position="dodge", this would be easy:

bplot + geom_point(aes(colour=trt))

However, this draws the points at the centre of each injury site, in between the two boxes, and adding position="dodge" to this geom does not seem to work. How do I adjust this so that the points are drawn over their own boxes?

Bonus: overplotting the means with stat_summary(blah,y.fun=mean,shape="+") runs into the same issue.

Comments (1)

不必了 2024-08-13 03:49:32

Hadley will doubtless correct me if I'm wrong here...

Here's the natural syntax:

bplot + geom_point(aes(colour=trt), position=position_dodge(width=.5))

(position="dodge" is shorthand for position_dodge() without the width argument.)

When I plot it, I get something that looks like a position_jitter(), which is presumably what you get too.
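The behaviour described here comes from an older ggplot2. A minimal sketch, assuming a recent release (3.3.0 or later, which the `fun =` argument of stat_summary() needs): current versions of position_dodge() dodge points by group, so giving the points and the means the same dodge width as the boxes (boxes appear to default to 0.75 of the category spacing) should put each set over its own box. The set.seed() call and outlier.shape = NA are additions for this sketch, not part of the question.

```r
library(ggplot2)

# Same simulated data as in the question; set.seed() only makes it reproducible.
set.seed(1)
trt <- ifelse(runif(100) < 0.5, "drug", "placebo")
inj.site <- ifelse(runif(100) < 0.5, "ankle", "wrist")
relief <- 20 + 0.5 * (inj.site == "ankle") + 0.5 * (trt == "drug") + rnorm(100)
to.analyze <- data.frame(trt, inj.site, relief)

# One dodge object shared by every layer: width = 0.75 matches the default
# box width, so all layers are shifted by the same amount per treatment.
dodge <- position_dodge(width = 0.75)

p <- ggplot(to.analyze, aes(inj.site, relief, fill = trt)) +
  # outlier.shape = NA avoids drawing outliers twice, since raw points follow
  geom_boxplot(position = dodge, outlier.shape = NA) +
  # raw data points, dodged by treatment like the boxes
  geom_point(aes(colour = trt), position = dodge) +
  # bonus: group means drawn as "+", dodged the same way
  stat_summary(fun = mean, geom = "point", shape = "+", size = 6,
               position = dodge)
print(p)
```

If the raw points overlap too much, position_jitterdodge(dodge.width = 0.75) adds horizontal jitter inside each dodged group instead.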

Curious, I went to look in the source, where I found the pos_dodge() function. (Type pos_dodge at an R prompt to see it; in current ggplot2 versions it is internal, so you need ggplot2:::pos_dodge.) Here's the end of it:

within(df, {
  xmin <- xmin + width / n * (seq_len(n) - 1) - diff * (n - 1) / (2 * n)
  xmax <- xmin + d_width / n
  x <- (xmin + xmax) / 2
})

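n is the number of rows of the data frame. So it looks like it's dodging the individual points by an offset indexed by the row! The first row is shifted by 0 * width/n, the second by 1 * width/n, and so on up to (n - 1) * width/n for the last row, with the same centring term subtracted from each. The points therefore end up spread evenly across the dodge width in row order.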
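To see why this looks like jitter, here is a hypothetical re-creation of the excerpt's arithmetic for n rows that share one x value. The function name old_dodge_x() is made up, and d_width = width, diff = 0 and a starting xmin of x - width / 2 are assumptions, since the excerpt does not show where those values come from.

```r
# Hypothetical re-creation of the offset arithmetic in the pos_dodge() excerpt,
# applied to n rows at the same x. Assumed (not from the ggplot2 source):
# d_width equals width, diff is 0, and every row starts at xmin = x - width / 2.
old_dodge_x <- function(x, n, width) {
  xmin <- rep(x - width / 2, n)
  d_width <- width
  diff <- 0
  # each row is offset by (row index - 1) * width / n
  xmin <- xmin + width / n * (seq_len(n) - 1) - diff * (n - 1) / (2 * n)
  xmax <- xmin + d_width / n
  (xmin + xmax) / 2
}

# Four raw points at x = 1 with width = 0.5 are spread evenly in row order,
# regardless of which treatment each row belongs to.
old_dodge_x(x = 1, n = 4, width = 0.5)
# [1] 0.8125 0.9375 1.0625 1.1875
```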

This is obviously not what you meant, although it is what you asked for. You may be stuck either recreating the dodged boxplot manually or switching to a different visualization, such as faceting:

ggplot(to.analyze,aes(inj.site,relief)) + geom_boxplot() + facet_wrap(~ trt)
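Recreating the dodge by hand might look like the sketch below, which assumes ggplot2 3.3.0 or later (for `fun =`). The xpos column and the 0.2 offset are hypothetical choices: each site/treatment pair gets its own numeric x position, so boxes, raw points and means all land in the same place without any position adjustment.

```r
library(ggplot2)

# Same simulated data as in the question; set.seed() only makes it reproducible.
set.seed(1)
trt <- ifelse(runif(100) < 0.5, "drug", "placebo")
inj.site <- ifelse(runif(100) < 0.5, "ankle", "wrist")
relief <- 20 + 0.5 * (inj.site == "ankle") + 0.5 * (trt == "drug") + rnorm(100)
to.analyze <- data.frame(trt, inj.site, relief)

# Hypothetical helper column: site index shifted left for "drug" and right
# for "placebo" (the 0.2 offset is an arbitrary choice).
site_num <- as.numeric(factor(to.analyze$inj.site))
to.analyze$xpos <- site_num + ifelse(to.analyze$trt == "drug", -0.2, 0.2)

p <- ggplot(to.analyze, aes(xpos, relief, fill = trt)) +
  # one box per numeric position; narrower than the 0.4 gap so boxes don't touch
  geom_boxplot(aes(group = xpos), width = 0.35, outlier.shape = NA) +
  # raw points sit on the same numeric x as their box
  geom_point(aes(colour = trt)) +
  # means per treatment and position, drawn as "+"
  stat_summary(fun = mean, geom = "point", shape = "+", size = 6) +
  # label the numeric axis with the original site names
  scale_x_continuous(name = "inj.site", breaks = 1:2,
                     labels = levels(factor(to.analyze$inj.site)))
print(p)
```

Adding position = position_jitter(width = 0.05, height = 0) to geom_point() would spread overlapping points while keeping them inside their box.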