Dynamic variable names in a dplyr function across multiple columns
I am trying to write a function that uses dplyr::summarise to obtain means of multiple columns of a data frame and assign dynamic names to the summarised columns using the new rlang glue syntax and the := operator.
Here's a simple example of my problem using the mtcars dataset.
When summarising over just one column, the glue syntax works (i.e. the summarised column name is mean_mpg):
library(dplyr)

mean_fun <- function(data, group_cols, summary_col) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_col }}" := mean({{ summary_col }}, na.rm = TRUE))
}

mean_fun(mtcars, c(cyl, gear), mpg)
    cyl  gear mean_mpg
  <dbl> <dbl>    <dbl>
1     4     3     21.5
2     4     4     26.9
3     4     5     28.2
4     6     3     19.8
5     6     4     19.8
6     6     5     19.7
7     8     3     15.0
8     8     5     15.4
But the equivalent does not name the cols properly when summarising over multiple columns:
mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_cols }}" := across({{ summary_cols }}, ~ mean(., na.rm = TRUE)))
}

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
    cyl  gear `mean_c(mpg, wt)`$mpg   $wt
  <dbl> <dbl>                 <dbl> <dbl>
1     4     3                  21.5  2.46
2     4     4                  26.9  2.38
3     4     5                  28.2  1.83
4     6     3                  19.8  3.34
5     6     4                  19.8  3.09
6     6     5                  19.7  2.77
7     8     3                  15.0  4.10
8     8     5                  15.4  3.37
How can I get the summarised column names to read mean_mpg and mean_wt? And why does this not work?
I realise that there are likely many other ways to perform this task but I would like to know how to get this method (i.e. using tidy eval, rlang syntax in a bespoke function) to work for teaching purposes and my own understanding!
Thank you
Comments (1)
We could use the .names argument of across to rename the columns.
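The code that accompanied this suggestion did not survive on this page, so the following is only a sketch of what a `.names`-based version might look like. It assumes dplyr 1.0.0 or later; the `"mean_{.col}"` glue spec and the `.groups = "drop"` argument are choices made for this illustration, not necessarily what the answerer wrote.

```r
library(dplyr)

# Sketch: let across() build the output names instead of `:=`.
# "{.col}" is replaced by each selected column's name.
mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      across({{ summary_cols }}, ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"),
      .groups = "drop"  # assumption: return an ungrouped result
    )
}

res <- mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
names(res)
```

With this version the naming is done per column inside across(), so no glue string on the left-hand side of `:=` is needed.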
NOTE: The := operator is mainly used when there is a single column in the tidyverse. If we use the OP's function, we are assigning multiple columns to a single column, and this returns a tibble instead of a normal column. We may need to unpack it.
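To illustrate the note, here is a hedged sketch of how the packed tibble column from the OP's approach could be flattened with tidyr::unpack. The function name `mean_fun_packed` and the final rename_with step are hypothetical additions for this example; they are not taken from the answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical copy of the OP's function: `:=` receives the whole
# data frame returned by across(), so it is stored as one packed column.
mean_fun_packed <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      "mean_{{ summary_cols }}" := across({{ summary_cols }}, ~ mean(.x, na.rm = TRUE)),
      .groups = "drop"
    )
}

packed <- mean_fun_packed(mtcars, c(cyl, gear), c(mpg, wt))
is.data.frame(packed[[3]])  # the third column is itself a data frame

# Flatten the packed column; the inner names (mpg, wt) become top-level names
unpacked <- packed %>% unpack(where(is.data.frame))

# Illustrative follow-up: add the prefix after unpacking
renamed <- unpacked %>% rename_with(~ paste0("mean_", .x), c(mpg, wt))
names(renamed)
```

This shows why the original output had a single oddly named column: the name built from the glue string applies to the packed data frame as a whole, not to the columns inside it.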