Refer to ggplot's input data and use it in a custom function within a geom

Posted on 2025-02-11 00:40:44

I'm using ggplot2's geom_vline() in combination with a custom function to draw certain values on top of a histogram.

The example function below returns a vector of three values: the mean, and the values a given number of standard deviations below and above the mean. I can pass these values to geom_vline(xintercept = ...) and see them in my plot.

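# Example function: returns the mean and the values `multiplier` SDs below and above it
library(tidyverse)

sds_around_the_mean <- function(x, multiplier = 1) {
  # Local names m and s avoid shadowing the base functions mean() and sd()
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)

  tibble(low  = m - multiplier * s,
         mean = m,
         high = m + multiplier * s) %>%
    pivot_longer(cols = everything()) %>%
    pull(value)
}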
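To make the return value concrete, here is a small check on a vector whose mean and standard deviation are easy to work out by hand. The function is restated in base R so the snippet runs on its own; `sds_base` is a hypothetical name used only for this illustration and is not part of the original post.

```r
# Base R restatement of sds_around_the_mean(); `sds_base` is a hypothetical name
sds_base <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(m - multiplier * s, m, m + multiplier * s)
}

# c(2, 4, 6) has mean 4 and sample SD 2, so two SDs either side gives 0 and 8
toy <- sds_base(c(2, 4, 6), multiplier = 2)
print(toy)  # 0 4 8
```

The three values come back in the order low, mean, high, which is the order geom_vline() receives them in.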

Reproducible data:

# data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))

df <- bind_rows(lst(normal, outliers), .id = "type")

df %>% 
  ggplot(aes(x = data)) + 
  geom_histogram(bins = 100) + 
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
             linetype = "dashed", color = "red") + 
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
             linetype = "dashed")

The problem is that, as you can see, I have to reference df$data in several places. This becomes error-prone as soon as I change the df that I pipe into ggplot(), e.g. by filtering out outliers before plotting: I would have to repeat the same change in every one of those places.

E.g.
df %>% filter(type == "normal")
#also requires 
df$data 
#to be changed to 
df$data[df$type == "normal"] 
#in geom_vline to obtain the correct input values for the xintercept.

So instead, how could I replace the df$data argument with the respective column of whatever has been piped into ggplot() in the first place? Something similar to the "." operator, I assume. I've also tried stat_summary with geom = "vline" to achieve this, but without the desired effect.

屌丝范 2025-02-18 00:40:44

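You can enclose the ggplot part in curly braces and reference the incoming data set with the . placeholder, both in the ggplot() call and when calculating sds_around_the_mean(). The braces make the whole + chain the right-hand side of %>%, so . is available in every layer and the lines follow whatever data is piped in.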

df %>% 
  {ggplot(data = ., aes(x = data)) + 
  geom_histogram(bins = 100) + 
  geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
             linetype = "dashed", color = "red") + 
  geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
             linetype = "dashed")}
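An alternative sketch that avoids the . placeholder altogether: a layer's data argument in ggplot2 may be a function, which is called with the plot's data and must return a data frame. The helper `sd_lines()` below is a hypothetical name introduced for illustration, and it assumes the values live in a column named `data`, as in the question. The data set is rebuilt with base R only so that the snippet stands alone.

```r
library(ggplot2)

# Hypothetical helper: returns a function that ggplot2 calls with the plot data.
# Assumes the values are in a column named `data`, as in the question.
sd_lines <- function(multiplier = 1) {
  function(d) {
    m <- mean(d$data, na.rm = TRUE)
    s <- sd(d$data, na.rm = TRUE)
    data.frame(xintercept = c(m - multiplier * s, m, m + multiplier * s))
  }
}

# Data similar to the question's, rebuilt with base R
set.seed(123)
df <- data.frame(
  type = rep(c("normal", "outliers"), times = c(1000, 5)),
  data = c(rnorm(1000, mean = 100, sd = 5), runif(5, min = 150, max = 200))
)

# The filter is applied once; both vline layers are computed from the filtered data
p <- ggplot(subset(df, type == "normal"), aes(x = data)) +
  geom_histogram(bins = 100) +
  geom_vline(data = sd_lines(3), aes(xintercept = xintercept),
             linetype = "dashed", color = "red") +
  geom_vline(data = sd_lines(2), aes(xintercept = xintercept),
             linetype = "dashed")
p
```

Because each layer derives its lines from the plot data, this form does not depend on magrittr's placeholder and should behave the same with the native |> pipe or with no pipe at all.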
