Performing a custom summary function by group in R

Posted 2025-02-14 01:58:42

This is my first time posting a question here, so please go easy on me, and let me know if you have tips for making my questions clearer.

I'm trying to write a function that summarizes given columns by group ("c", "e"), with the grouping factor initialized as shown below. However, when I pass the parameters (df, x) into the function, the output seems to ignore the grouping factor. How can I make sure the grouping is respected when applying the custom summary function?

#initialize and relevel factor
dexadf$group <- factor(dexadf$group, levels=c("c", "e"),
                       labels = c("c", "e"))
dexadf$group <- relevel(dexadf$group, ref="c")
attributes(dexadf$group)

My data looks like this; for the sake of simplicity, I've only included one of the columns of interest (fm_bdc3):

> dput(dexadf)
structure(list(participant = c("pt04", "pt75", "pt21", "pt73", 
"pt27", "pt39", "pt43", "pt52", "pt69", "pt49", "pt50", "pt56", 
"pt62", "pt68", "pt22", "pt64", "pt54", "pt79", "pt36", "pt26", 
"pt65", "pt38"), group = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("c", "e"), class = "factor"),  
    fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976, 
    11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436, 
    28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932, 
    19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787, 
    20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608, 
    16.992853675086, 14.138189878472, 17.777235203826)), class = "data.frame", row.names = c(NA, -22L))

→ function:

summbygrp <- function(df, x) {
        group_by(df, group) %>%
            summarise(
              count = n(),
              mean = mean(x, na.rm = TRUE),
              sd = sd(x, na.rm = TRUE)
            ) %>%
            mutate(se = sd / sqrt(11),
                   lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
                   upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
                  )
      }

→ function output:

> summbygrp(dexadf, fm_bdc3) 
# A tibble: 2 × 7
  group count  mean    sd    se lower.ci upper.ci
  <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  20.6  5.48  1.65     16.9     24.3
2 e        11  20.6  5.48  1.65     16.9     24.3

As you can see, the summaries for both groups are identical, which I know is not correct. Can someone identify the error in my code?

Here is the code and its output without a function. It works, but I have many columns, so writing this out for each column would be pretty tedious:

group_by(dexadf, group) %>%
    summarise(
      count = n(),
      mean = mean(fm_bdc3, na.rm = TRUE),
      sd = sd(fm_bdc3, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )

→ correct output:

# A tibble: 2 × 7
  group count  mean    sd    se lower.ci upper.ci
  <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  19.3  5.49  1.66     15.6     23.0
2 e        11  21.9  5.40  1.63     18.2     25.5

跨年 2025-02-21 01:58:42

library(dplyr)
library(rlang)


dexadf <- data.frame(
  stringsAsFactors = FALSE,
  participant = c("pt04","pt75","pt21","pt73",
                  "pt27","pt39","pt43","pt52","pt69","pt49","pt50",
                  "pt56","pt62","pt68","pt22","pt64","pt54","pt79",
                  "pt36","pt26","pt65","pt38"),
  fm_bdc3 = c(18.535199635968,23.52996574649,
              17.276246451976,11.526088555461,23.805048656112,
              23.08597823716,28.691020942436,28.968097858499,
              23.378093165331,22.491725344661,14.609015054932,19.734914019306,
              31.947412973684,25.152298171274,12.007356801787,
              20.836128108938,22.322230884349,14.777652101515,
              21.389572717608,16.992853675086,14.138189878472,17.777235203826),
  group = as.factor(c("c","e",
                      "e","c","c","e","c","e","c","e","e","c",
                      "e","c","c","e","e","c","e","c","e",
                      "c")),
  sex = as.factor(c("f","m",
                    "m","m","m","m","m","f","m","f","f","f",
                    "f","f","f","f","m","f","m","m","f",
                    "m"))
)


summbygrp <- function(df, x) {
  group_by(df, group) %>%
    summarise(
      count = n(),
      mean = mean({{x}}, na.rm = TRUE),
      sd = sd({{x}}, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )
}

summbygrp(dexadf, fm_bdc3)

#> # A tibble: 2 × 7
#>   group count  mean    sd    se lower.ci upper.ci
#>   <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 c        11  19.3  5.49  1.66     15.6     23.0
#> 2 e        11  21.9  5.40  1.63     18.2     25.5

Created on 2022-07-09 by the reprex package (v2.0.1)

Why this works

You need to use {{ }}, pronounced "curly-curly", an operator from the rlang package that dplyr supports directly, to make this function work. When you pass variables (i.e. columns of a dataset) as arguments to a function that uses dplyr or other tidyverse verbs (such as mutate, summarise, group_by, etc.), you need to wrap those arguments in curly-curly, as we did here with x. This is because tidyverse verbs use NSE (non-standard evaluation), specifically data masking. Without {{ }}, x is not a column of df, so summarise() evaluates the argument fm_bdc3 outside the data frame: if an object with that name exists in the calling environment, every group gets the statistics of that whole vector (which is why both rows of your output were identical); otherwise you get an "object not found" error. To learn more, check out the Programming with dplyr vignette, and I would also encourage you to read chapters 17-20 of the book Advanced R.
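Since the question mentions having many columns, here is a minimal sketch of one way to run the same summary over several columns, assuming the column names are available as character strings. The function name summbygrp_chr and the cols vector are hypothetical. Instead of {{ }}, it uses the .data pronoun (.data[[col]]), which is the usual data-masking idiom when a column name is stored in a string. It also computes the standard error from each group's non-missing count rather than the hard-coded 11; that is an assumption about what the fixed value was meant to represent.

```r
library(dplyr)

# Sample data from the question (only the fm_bdc3 column of interest).
dexadf <- data.frame(
  participant = c("pt04", "pt75", "pt21", "pt73", "pt27", "pt39", "pt43",
                  "pt52", "pt69", "pt49", "pt50", "pt56", "pt62", "pt68",
                  "pt22", "pt64", "pt54", "pt79", "pt36", "pt26", "pt65", "pt38"),
  group = factor(c("c", "e", "e", "c", "c", "e", "c", "e", "c", "e", "e",
                   "c", "e", "c", "c", "e", "e", "c", "e", "c", "e", "c"),
                 levels = c("c", "e")),
  fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976,
              11.526088555461, 23.805048656112, 23.08597823716,
              28.691020942436, 28.968097858499, 23.378093165331,
              22.491725344661, 14.609015054932, 19.734914019306,
              31.947412973684, 25.152298171274, 12.007356801787,
              20.836128108938, 22.322230884349, 14.777652101515,
              21.389572717608, 16.992853675086, 14.138189878472,
              17.777235203826)
)

# Hypothetical string-based variant of summbygrp(): `col` is a column name
# given as a character string and looked up through the .data pronoun.
summbygrp_chr <- function(df, col, conf = 0.95) {
  df %>%
    group_by(group) %>%
    summarise(
      count = sum(!is.na(.data[[col]])),      # non-missing values per group
      mean  = mean(.data[[col]], na.rm = TRUE),
      sd    = sd(.data[[col]], na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(
      se       = sd / sqrt(count),             # per-group n instead of a fixed 11
      tcrit    = qt(1 - (1 - conf) / 2, df = count - 1),
      lower.ci = mean - tcrit * se,
      upper.ci = mean + tcrit * se
    ) %>%
    select(-tcrit)
}

# Hypothetical list of columns to summarise; add your other column names here.
cols <- c("fm_bdc3")

# Run the summary once per column and stack the results, tagging each row
# with the name of the column it came from.
results <- bind_rows(
  lapply(setNames(cols, cols), summbygrp_chr, df = dexadf),
  .id = "variable"
)
print(results)
```

bind_rows() with .id = "variable" stacks the per-column tibbles and adds a variable column identifying the source column. With only fm_bdc3 in the sample data and 11 observations per group, the means and intervals should agree with the output above.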
