Refer to ggplot's input data and use it in a custom function inside a geom

Posted 2025-02-11 00:40:44

I'm using ggplot2's geom_vline() together with a custom function to draw certain values on top of a histogram.

The example function below returns a vector of three values: the mean, and the values multiplier standard deviations below and above it. I can pass these values to geom_vline(xintercept = ...) and see them in my graph.

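library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)

#example function: returns c(mean - multiplier * sd, mean, mean + multiplier * sd)
sds_around_the_mean <- function(x, multiplier = 1) {
  mean <- mean(x, na.rm = TRUE)
  sd <- sd(x, na.rm = TRUE)
  
  tibble(low   = mean - multiplier * sd,
         mean  = mean,
         high  = mean + multiplier * sd) %>% 
    pivot_longer(cols = everything()) %>%  # one row per value: low, mean, high
    pull(value)                            # back to a plain numeric vector
}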
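For comparison, here is a minimal sketch of the same computation without the tibble/pivot_longer() round trip, assuming a named numeric vector is an acceptable return value (geom_vline()'s xintercept only needs numbers). The name sds_around_the_mean_base is hypothetical and not part of the question.

```r
# Hypothetical dependency-free variant of sds_around_the_mean():
# returns c(low, mean, high) as a named numeric vector.
sds_around_the_mean_base <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)  # centre
  s <- sd(x, na.rm = TRUE)    # spread
  c(low = m - multiplier * s, mean = m, high = m + multiplier * s)
}

# mean is 2 and sd is 1 here, so the three values are 1, 2 and 3
sds_around_the_mean_base(c(1, 2, 3))
```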

Reproducible data

#data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))

df <- bind_rows(lst(normal, outliers), .id = "type")

df %>% 
  ggplot(aes(x = data)) + 
  geom_histogram(bins = 100) + 
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
             linetype = "dashed", color = "red") + 
  geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
             linetype = "dashed")

[Figure: example_hist]

The problem is that, as you can see, I have to reference df$data in several places.
This becomes error-prone as soon as I change the df that I pipe into ggplot(), e.g. by filtering out outliers before plotting: I would have to apply the same change again in every geom_vline() call.

E.g.
df %>% filter(type == "normal")
# also requires changing
#   df$data
# to
#   df$data[df$type == "normal"]
# in geom_vline() to obtain the correct input values for xintercept.
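To make the risk concrete, this sketch (base R only, with a hypothetical helper sd_lines() that mirrors sds_around_the_mean()) compares the lines computed from the full df with those computed from the filtered rows. If only the pipeline is filtered, the histogram shows the filtered data while the dashed lines silently keep using the outlier-inflated mean and SD.

```r
set.seed(123)
# Same draws, in the same order, as the question's reproducible data
df <- rbind(
  data.frame(type = "normal",   data = rnorm(1000, mean = 100, sd = 5)),
  data.frame(type = "outliers", data = runif(5, min = 150, max = 200))
)

# Hypothetical helper mirroring sds_around_the_mean(): mean and mean +/- k * sd
sd_lines <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(low = m - multiplier * s, mean = m, high = m + multiplier * s)
}

stale <- sd_lines(df$data, multiplier = 3)                       # forgot to update
fresh <- sd_lines(df$data[df$type == "normal"], multiplier = 3)  # matches the filter
rbind(stale, fresh)
```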

So instead, how can I replace the df$data argument with the corresponding column of whatever was piped into ggplot() in the first place? I assume I need something similar to the magrittr "." placeholder. I also tried stat_summary() with geom = "vline", but it did not give the desired result.

Answer by 屌丝范 (2025-02-18 00:40:44):

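You can wrap the ggplot() call in curly braces and refer to the incoming dataset with the . placeholder, both in ggplot() and when calling sds_around_the_mean(). The vertical lines are then always computed from whatever data was piped in, so a filter() earlier in the pipeline only has to be written once.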

df %>% 
  # inside { }, . is the piped data frame; %>% does not insert it as the first argument
  {ggplot(data = ., aes(x = data)) + 
    geom_histogram(bins = 100) + 
    geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
               linetype = "dashed", color = "red") + 
    geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
               linetype = "dashed")}
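An alternative that avoids the braces (and therefore also works with R's native |> pipe) is sketched below. It is not the answerer's method; it relies on ggplot2's documented option of passing a function as a layer's data argument, which ggplot2 calls with the plot's data and which must return a data frame. The factory sd_lines_df() is a hypothetical helper that repeats the mean +/- k * sd logic of sds_around_the_mean() inline so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: returns a function that turns the plot data into a
# one-column data frame of line positions (mean and mean +/- multiplier * sd).
sd_lines_df <- function(multiplier) {
  function(plot_data) {
    m <- mean(plot_data$data, na.rm = TRUE)
    s <- sd(plot_data$data, na.rm = TRUE)
    data.frame(value = c(m - multiplier * s, m, m + multiplier * s))
  }
}

set.seed(123)
df <- bind_rows(
  normal   = data.frame(data = rnorm(1000, mean = 100, sd = 5)),
  outliers = data.frame(data = runif(5, min = 150, max = 200)),
  .id = "type"
)

p <- df %>%
  filter(type == "normal") %>%   # the data is changed in one place only
  ggplot(aes(x = data)) +
  geom_histogram(bins = 100) +
  # data given as a function: ggplot2 calls it with the plot's own data
  geom_vline(data = sd_lines_df(3), mapping = aes(xintercept = value),
             linetype = "dashed", color = "red") +
  geom_vline(data = sd_lines_df(2), mapping = aes(xintercept = value),
             linetype = "dashed")
p
```

Because each geom_vline() layer derives its lines from the plot data, removing the outliers (or any other change to the pipeline) automatically moves the lines as well.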