如何将按组绘图元素叠加到 ggplot2 方面?
我的问题与分面有关。在下面的示例代码中,我查看了一些分面散点图,然后尝试在每个方面覆盖信息(在本例中为平均线)。
tl;dr 版本是我的尝试失败了。要么我添加的平均线计算所有数据(不尊重方面变量),要么我尝试编写一个公式,但 R 抛出错误,然后是对我母亲的尖锐且特别贬低的评论。
library(ggplot2)
# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)
# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)
# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(pal = "Set1")
print(p)
# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)
# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
aes(yintercept = mean(hp)),
data = mtcars
)
print(p.with.means)
# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
aes(yintercept = mean(hp) ~ cyl),
data = mtcars
)
print(p.with.non.lake.wobegon.means)
一定有一些我缺少的简单解决方案。
My question has to do with facetting. In my example code below, I look at some facetted scatterplots, then try to overlay information (in this case, mean lines) on a per-facet basis.
The tl;dr version is that my attempts fail. Either my added mean lines compute across all data (disrespecting the facet variable), or I try to write a formula and R throws an error, followed by incisive and particularly disparaging comments about my mother.
library(ggplot2)
# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)
# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)
# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(pal = "Set1")
print(p)
# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)
# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
aes(yintercept = mean(hp)),
data = mtcars
)
print(p.with.means)
# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
aes(yintercept = mean(hp) ~ cyl),
data = mtcars
)
print(p.with.non.lake.wobegon.means)
There must be some simple solution I'm missing.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的意思是这样的:
也许可以使用 stat_* 在 ggplot 调用中执行此操作,但我必须回去修改一下。但一般来说,如果我将摘要添加到多面图中,我会单独计算摘要,然后将它们添加到自己的
geom
中。编辑
只是对您最初尝试的一些扩展注释。一般来说,最好将
aes
调用放入ggplot
中,该调用将在整个绘图中持续存在,然后在这些geom
中指定不同的数据集或美学与“基本”情节不同。那么您就不需要在每个geom
中不断指定data = ...
。最后,我想出了一种巧妙地使用 geom_smooth 来完成与您的要求类似的操作:
水平线(即常数回归方程)只会扩展到每个方面的数据限制,但它跳过单独的数据汇总步骤。
You mean something like this:
It might be possible to do this within the
ggplot
call usingstat_*
, but I'd have to go back and tinker a bit. But generally if I'm adding summaries to a faceted plot I calculate the summaries separately and then add them with their owngeom
.EDIT
Just a few expanded notes on your original attempt. Generally it's a good idea to put
aes
calls inggplot
that will persist throughout the plot, and then specify different data sets or aesthetics in thosegeom
's that differ from the 'base' plot. Then you don't need to keep specifyingdata = ...
in eachgeom
.Finally, I came up with a kind of clever use of
geom_smooth
to do something similar to what your asking:The horizontal line (i.e. constant regression eqn) will only extend to the limits of the data in each facet, but it skips the separate data summary step.