Execute a custom summary function by group in R
This is my first time posting a question here, so please go easy on me, and let me know if you have tips for making my questions clearer.
I'm trying to write a function that summarizes given columns by group ("c", "e"), which I've initialized as shown below, but the output seems to ignore the grouping factor when I pass the arguments (df, x) into the function. How can I make sure the grouping is respected when applying a custom summary function?
#initialize and relevel factor
dexadf$group <- factor(dexadf$group, levels = c("c", "e"),
                       labels = c("c", "e"))
dexadf$group <- relevel(dexadf$group, ref = "c")
attributes(dexadf$group)
My data looks like this; for simplicity, I've included only one of the columns of interest (fm_bdc3):
> dput(dexadf)
structure(list(participant = c("pt04", "pt75", "pt21", "pt73",
"pt27", "pt39", "pt43", "pt52", "pt69", "pt49", "pt50", "pt56",
"pt62", "pt68", "pt22", "pt64", "pt54", "pt79", "pt36", "pt26",
"pt65", "pt38"), group = structure(c(1L, 2L, 2L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("c", "e"), class = "factor"),
fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976,
11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436,
28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932,
19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787,
20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608,
16.992853675086, 14.138189878472, 17.777235203826)
→ function:
summbygrp <- function(df, x) {
  group_by(df, group) %>%
    summarise(
      count = n(),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )
}
→ function output:
> summbygrp(dexadf, fm_bdc3)
# A tibble: 2 × 7
group count mean sd se lower.ci upper.ci
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 c 11 20.6 5.48 1.65 16.9 24.3
2 e 11 20.6 5.48 1.65 16.9 24.3
As you can see, the summaries for both groups are identical, which I know is not true. Can someone identify the error in my code?
Here is the output when I don't use a function, but I have many columns, so writing this out for each column would be pretty tedious:
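The identical rows are a clue: 20.6 is the mean of all 22 fm_bdc3 values, so inside summarise() x appears to have been evaluated as the whole vector rather than per group. Because x is not a column of dexadf, dplyr's data masking falls back to the function's environment, and forcing the argument evaluates fm_bdc3 outside the data frame. That the call produced numbers at all suggests (an assumption; the question doesn't show it) that a copy of fm_bdc3 existed in the global environment, e.g. via attach() or an earlier assignment; without one, an "object not found" error would be likely. A minimal sketch of the difference, using a hypothetical outside_copy vector and the data rebuilt from the question's dput() (participant column omitted):

```r
library(dplyr)

# fm_bdc3 values and group labels copied from the dput() in the question.
fm_values <- c(18.535199635968, 23.52996574649, 17.276246451976,
               11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436,
               28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932,
               19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787,
               20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608,
               16.992853675086, 14.138189878472, 17.777235203826)
dexadf <- data.frame(
  group = factor(c("c", "e", "e", "c", "c", "e", "c", "e", "c", "e", "e",
                   "c", "e", "c", "c", "e", "e", "c", "e", "c", "e", "c"),
                 levels = c("c", "e")),
  fm_bdc3 = fm_values
)

# Hypothetical stand-in for a copy of fm_bdc3 living outside the data frame.
outside_copy <- fm_values

demo <- dexadf %>%
  group_by(group) %>%
  summarise(
    # A column name: data masking resolves it to the current group's rows.
    mean_column = mean(fm_bdc3),
    # Not a column, so the full 22-value vector from the environment is used
    # for every group (about 20.59 in both rows).
    mean_outside = mean(outside_copy)
  )
demo
```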
group_by(dexadf, group) %>%
  summarise(
    count = n(),
    mean = mean(fm_bdc3, na.rm = TRUE),
    sd = sd(fm_bdc3, na.rm = TRUE)
  ) %>%
  mutate(se = sd / sqrt(11),
         lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
         upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
  )
→ correct output:
# A tibble: 2 × 7
group count mean sd se lower.ci upper.ci
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 c 11 19.3 5.49 1.66 15.6 23.0
2 e 11 21.9 5.40 1.63 18.2 25.5
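For reference, the se and CI columns follow the usual t-based interval: qt(1 - (0.05 / 2), 11 - 1) is the 97.5% quantile of a t distribution with n - 1 = 10 degrees of freedom, and the hard-coded 11 is n, which matches only because both groups happen to have 11 participants. Plugging group c's row from the correct output into the formula reproduces its se and interval up to rounding:

```latex
\mathrm{se} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CI}_{95\%} = \bar{x} \pm t_{0.975,\,n-1}\,\mathrm{se}

% group c: n = 11, \bar{x} = 19.3, s = 5.49, t_{0.975,10} \approx 2.228
\mathrm{se} = \frac{5.49}{\sqrt{11}} \approx 1.66, \qquad
19.3 \pm 2.228 \times 1.66 \approx (15.6,\ 23.0)
```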
Created on 2022-07-09 by the reprex package (v2.0.1)
Why this works
You need to use {{ }}, pronounced "curly-curly", from the rlang package to make this function work. When you pass variables (i.e. columns of a dataset) as arguments to a function that uses dplyr or other tidyverse verbs (like mutate, summarise, group_by, etc.), you need to wrap those arguments in curly-curly, as we did here with x ({{ x }}). Otherwise the function won't work as intended and will most likely throw errors, because tidyverse verbs use NSE (non-standard evaluation).
To learn more, check out Programming with dplyr, and I would also encourage you to read chapters 17-20 of the book Advanced R.
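A minimal sketch of the fix described above, assuming a recent dplyr (1.0 or later): x is wrapped in {{ }} everywhere summbygrp uses it. The data frame is rebuilt from the question's dput() values (participant omitted). Two things here are additions not in the original answer: the hard-coded 11 is replaced by each group's count, and the last lines show one way to handle many columns by passing each column name through the .data pronoun.

```r
library(dplyr)

# Data rebuilt from the dput() in the question (participant column omitted).
dexadf <- data.frame(
  group = factor(c("c", "e", "e", "c", "c", "e", "c", "e", "c", "e", "e",
                   "c", "e", "c", "c", "e", "e", "c", "e", "c", "e", "c"),
                 levels = c("c", "e")),
  fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976,
              11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436,
              28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932,
              19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787,
              20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608,
              16.992853675086, 14.138189878472, 17.777235203826)
)

summbygrp <- function(df, x) {
  df %>%
    group_by(group) %>%
    summarise(
      count = n(),
      # {{ x }} forwards the column the caller named, so data masking
      # evaluates it within each group instead of on a whole vector.
      mean = mean({{ x }}, na.rm = TRUE),
      sd = sd({{ x }}, na.rm = TRUE)
    ) %>%
    mutate(
      # count replaces the hard-coded 11 (assumes x has no missing values,
      # since n() also counts rows where x is NA).
      se = sd / sqrt(count),
      lower.ci = mean - qt(1 - (0.05 / 2), count - 1) * se,
      upper.ci = mean + qt(1 - (0.05 / 2), count - 1) * se
    )
}

res <- summbygrp(dexadf, fm_bdc3)
res

# Many columns: pass each name as .data[[col]], which {{ }} forwards as-is.
cols <- c("fm_bdc3")  # add the names of the other columns of interest here
all_summaries <- bind_rows(
  lapply(setNames(cols, cols), function(col) summbygrp(dexadf, .data[[col]])),
  .id = "variable"
)
all_summaries
```

The per-group means now differ (about 19.3 for c and 21.9 for e), matching the manual pipeline in the question.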