Reference ggplot's input data and use it in a custom function inside a geom
I'm using ggplot's geom_vline in combination with a custom function to plot certain values on top of a histogram.
For example, the function below returns a vector of three values: the mean, and the values lying multiplier standard deviations below and above the mean. I can pass these values to geom_vline(xintercept = ...) and see them in my graph.
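#example function
library(tidyverse)  # tibble(), pivot_longer(), pull() and %>%

# Returns c(low, mean, high): the mean and the values
# `multiplier` standard deviations below and above it
sds_around_the_mean <- function(x, multiplier = 1) {
  mean <- mean(x, na.rm = TRUE)
  sd <- sd(x, na.rm = TRUE)
  tibble(low = mean - multiplier * sd,
         mean = mean,
         high = mean + multiplier * sd) %>%
    pivot_longer(cols = everything()) %>%
    pull(value)
}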
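The tibble/pivot_longer/pull steps only build a plain length-3 vector. As a minimal sketch, the same helper can be written in base R; the name sds_around_the_mean_base is hypothetical and not part of the original question.

```r
# Hypothetical base-R equivalent of sds_around_the_mean():
# returns c(low, mean, high), unnamed, like the tidyverse version
sds_around_the_mean_base <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)  # centre
  s <- sd(x, na.rm = TRUE)    # spread
  c(m - multiplier * s, m, m + multiplier * s)
}

sds_around_the_mean_base(c(1, 2, 3))  # returns c(1, 2, 3): mean 2, sd 1
```

Dropping the data-frame round trip changes nothing about the result; it only removes the dependency on tibble and tidyr inside the helper.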
Reproducible data
#data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")

df %>%
  ggplot(aes(x = data)) +
  geom_histogram(bins = 100) +
  # red lines: mean +/- 3 SD; black lines: mean +/- 2 SD
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
             linetype = "dashed", color = "red") +
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
             linetype = "dashed")
The problem is that, as you can see, I have to reference df$data in several places.
This becomes error-prone as soon as I change the df that I pipe into ggplot, e.g. by filtering out outliers before plotting: I would have to repeat the same change in every one of those places.
E.g.
df %>% filter(type == "normal")
#also requires
df$data
#to be changed to
df$data[df$type == "normal"]
#in geom_vline to obtain the correct input values for the xintercept.
So instead, how can I replace the df$data argument with the corresponding column of whatever was piped into ggplot() in the first place? I assume something similar to the "." placeholder is needed. I've also tried stat_summary with geom = "vline" to achieve this, but it did not give the desired result.
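One further route, not taken in the original thread and offered only as a sketch: a ggplot2 layer's data argument may be a function, which ggplot2 calls with the plot's own data (whatever was piped into ggplot()) and whose returned data frame becomes the layer data. The wrapper sd_lines() below is hypothetical, and the helper is condensed to base R.

```r
library(tidyverse)

# Helper from the question, condensed: c(mean - k*sd, mean, mean + k*sd)
sds_around_the_mean <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(m - multiplier * s, m, m + multiplier * s)
}

# Data as in the question
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")

# Hypothetical wrapper: a vline layer whose `data` is a function of the
# plot data, so the lines are always computed from the piped-in data
sd_lines <- function(multiplier, ...) {
  force(multiplier)  # evaluate now, not when ggplot2 calls the closure
  geom_vline(
    data = function(d) tibble(xint = sds_around_the_mean(d$data, multiplier)),
    mapping = aes(xintercept = xint),
    linetype = "dashed",
    ...
  )
}

p <- df %>%
  filter(type == "normal") %>%   # change this step freely
  ggplot(aes(x = data)) +
  geom_histogram(bins = 100) +
  sd_lines(3, color = "red") +
  sd_lines(2)
```

Here df$data never appears in the plotting code, so a change to the filter() step carries through to the lines automatically.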
Answer
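You can enclose the ggplot part in curly braces and reference the incoming dataset with the "." symbol, both in the ggplot command and when calculating sds_around_the_mean. Inside braces, %>% no longer inserts the piped data as the first argument automatically, so "." has to be passed to ggplot() explicitly, and can then be reused anywhere in the block. This makes the lines follow whatever data is piped in.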
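A minimal, self-contained sketch of this suggestion, assuming the tidyverse is installed and using the magrittr %>% pipe (the native |> pipe has no "." placeholder that works this way). The helper is condensed to base R; it returns the same three values as the original.

```r
library(tidyverse)

# Helper from the question, condensed: c(mean - k*sd, mean, mean + k*sd)
sds_around_the_mean <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(m - multiplier * s, m, m + multiplier * s)
}

# Data as in the question
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")

# Braces: `.` is the filtered data everywhere inside the block
p <- df %>%
  filter(type == "normal") %>%
  {
    ggplot(., aes(x = data)) +
      geom_histogram(bins = 100) +
      geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
                 linetype = "dashed", color = "red") +
      geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
                 linetype = "dashed")
  }
p
```

Because every reference goes through ".", editing or removing the filter() step updates the histogram and the vertical lines together.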