如何将按组绘图元素叠加到 ggplot2 方面？

发布于 2024-11-19 22:21:29 字数 1953 浏览 1 评论 0原文

我的问题与分面有关。在下面的示例代码中，我查看了一些分面散点图，然后尝试在每个方面覆盖信息（在本例中为平均线）。

tl;dr 版本是我的尝试失败了。要么我添加的平均线计算所有数据（不尊重方面变量），要么我尝试编写一个公式，但 R 抛出错误，然后是对我母亲的尖锐且特别贬低的评论。

library(ggplot2)

# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)

# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)

# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(pal = "Set1")
print(p)

# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)

# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
                      aes(yintercept = mean(hp)),
                      data = mtcars
         )
print(p.with.means)

# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
                                       aes(yintercept = mean(hp) ~ cyl),
                                       data = mtcars
                                     )
print(p.with.non.lake.wobegon.means)

一定有一些我缺少的简单解决方案。

原文

My question has to do with facetting. In my example code below, I look at some facetted scatterplots, then try to overlay information (in this case, mean lines) on a per-facet basis.

The tl;dr version is that my attempts fail. Either my added mean lines compute across all data (disrespecting the facet variable), or I try to write a formula and R throws an error, followed by incisive and particularly disparaging comments about my mother.

library(ggplot2)

# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)

# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)

# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(pal = "Set1")
print(p)

# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)

# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
                      aes(yintercept = mean(hp)),
                      data = mtcars
         )
print(p.with.means)

# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
                                       aes(yintercept = mean(hp) ~ cyl),
                                       data = mtcars
                                     )
print(p.with.non.lake.wobegon.means)

There must be some simple solution I'm missing.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

不乱于心 2024-11-26 22:21:29

您的意思是这样的：

rs <- ddply(mtcars,.(cyl),summarise,mn = mean(hp))

p + geom_hline(data=rs,aes(yintercept=mn))

也许可以使用 stat_* 在 ggplot 调用中执行此操作，但我必须回去修改一下。但一般来说，如果我将摘要添加到多面图中，我会单独计算摘要，然后将它们添加到自己的geom中。

编辑

只是对您最初尝试的一些扩展注释。一般来说，最好将 aes 调用放入 ggplot 中，该调用将在整个绘图中持续存在，然后在这些 geom 中指定不同的数据集或美学与“基本”情节不同。那么您就不需要在每个 geom 中不断指定 data = ...。

最后，我想出了一种巧妙地使用 geom_smooth 来完成与您的要求类似的操作：

p <- ggplot(data = mtcars,aes(x = wt, y = hp, colour = factor(cyl))) + 
    facet_grid(~cyl) + 
    geom_point() + 
    geom_smooth(se=FALSE,method="lm",formula=y~1,colour="black")

水平线（即常数回归方程）只会扩展到每个方面的数据限制，但它跳过单独的数据汇总步骤。

You mean something like this:

rs <- ddply(mtcars,.(cyl),summarise,mn = mean(hp))

p + geom_hline(data=rs,aes(yintercept=mn))

It might be possible to do this within the ggplot call using stat_*, but I'd have to go back and tinker a bit. But generally if I'm adding summaries to a faceted plot I calculate the summaries separately and then add them with their own geom.

EDIT

Just a few expanded notes on your original attempt. Generally it's a good idea to put aes calls in ggplot that will persist throughout the plot, and then specify different data sets or aesthetics in those geom's that differ from the 'base' plot. Then you don't need to keep specifying data = ... in each geom.

Finally, I came up with a kind of clever use of geom_smooth to do something similar to what your asking:

p <- ggplot(data = mtcars,aes(x = wt, y = hp, colour = factor(cyl))) + 
    facet_grid(~cyl) + 
    geom_point() + 
    geom_smooth(se=FALSE,method="lm",formula=y~1,colour="black")

The horizontal line (i.e. constant regression eqn) will only extend to the limits of the data in each facet, but it skips the separate data summary step.

回复收藏 0 原文

~没有更多了~