Dynamic variable names in dplyr functions across multiple columns

Posted on 2025-01-23 21:11:41

I am trying to write a function that uses dplyr::summarise to obtain means of multiple columns of a data frame and assign dynamic names to the summarised columns using the new rlang glue syntax and := operator.

Here's a simple example of my problem using the mtcars dataset.

When summarising over just one column - the glue syntax works (i.e. the summarised column name is mean_mpg):

library(dplyr)

mean_fun <- function(data, group_cols, summary_col) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_col }}" := mean({{ summary_col }}, na.rm = TRUE))
}
mean_fun(mtcars, c(cyl, gear), mpg)

   cyl  gear mean_mpg
  <dbl> <dbl>    <dbl>
1     4     3     21.5
2     4     4     26.9
3     4     5     28.2
4     6     3     19.8
5     6     4     19.8
6     6     5     19.7
7     8     3     15.0
8     8     5     15.4

But the equivalent does not name the cols properly when summarising over multiple columns:

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_cols }}" := across({{ summary_cols }}, ~ mean(., na.rm = TRUE)))
}
mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))

    cyl  gear `mean_c(mpg, wt)`$mpg   $wt
  <dbl> <dbl>                 <dbl> <dbl>
1     4     3                  21.5  2.46
2     4     4                  26.9  2.38
3     4     5                  28.2  1.83
4     6     3                  19.8  3.34
5     6     4                  19.8  3.09
6     6     5                  19.7  2.77
7     8     3                  15.0  4.10
8     8     5                  15.4  3.37

How can I get the summarised column names to read mean_mpg and mean_wt? And why does this not work?

I realise that there are likely many other ways to perform this task but I would like to know how to get this method (i.e. using tidy eval, rlang syntax in a bespoke function) to work for teaching purposes and my own understanding!

Thank you

少女的英雄梦 2025-01-30 21:11:41

We could use the .names argument of across() to rename the summarised columns:

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{group_cols}})) %>%
     summarise(across({{ summary_cols }},
         ~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
}

Testing:

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
# A tibble: 8 × 4
    cyl  gear mean_mpg mean_wt
  <dbl> <dbl>    <dbl>   <dbl>
1     4     3     21.5    2.46
2     4     4     26.9    2.38
3     4     5     28.2    1.83
4     6     3     19.8    3.34
5     6     4     19.8    3.09
6     6     5     19.7    2.77
7     8     3     15.0    4.10
8     8     5     15.4    3.37
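
The same .names template can also carry a function name when more than one summary is requested. The following is a minimal sketch under assumptions: the function name summary_fun_multicols and the choice of mean and sd are illustrative only and do not come from the original question.

```r
library(dplyr)

# Hypothetical variant: several summaries per column, named "<fn>_<col>".
summary_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      across(
        {{ summary_cols }},
        # The names of this list are what {.fn} expands to
        list(
          mean = ~ mean(.x, na.rm = TRUE),
          sd = ~ sd(.x, na.rm = TRUE)
        ),
        .names = "{.fn}_{.col}"
      ),
      .groups = "drop"
    )
}

res <- summary_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
print(names(res))
```

Here {.col} is replaced by each selected column name and {.fn} by the name given to each function in the list, so the naming is handled entirely by across() rather than by := on the left-hand side.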

NOTE: In the tidyverse, := is mainly used when a single column is being created.
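
As for why the OP's name comes out as mean_c(mpg, wt): the "{{ }}" inside the name string turns the whole captured expression into one label, so it can only ever yield one name. The sketch below approximates that step with rlang::as_label(); the helper label_for is hypothetical and is only meant to illustrate the behaviour, not to reproduce dplyr's internals exactly.

```r
library(rlang)

# Hypothetical helper: builds a name roughly the way "mean_{{ x }}" does,
# by converting the captured expression into a single label.
label_for <- function(x) {
  paste0("mean_", as_label(enquo(x)))
}

single <- label_for(mpg)
multi <- label_for(c(mpg, wt))

print(single)
print(multi)
```

With a single bare column the label is just that column's name, which is why the one-column version works. With c(mpg, wt) the label is the text of the whole call, which matches the odd column name in the OP's output.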

If we use the OP's function, we are assigning multiple columns to a single name, so summarise() returns a tibble (a data frame column) instead of a normal column. We may need to unpack it:

library(tidyr)
library(tibble)  # provides is_tibble()
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ cyl            : num [1:8] 4 4 4 6 6 6 8 8
 $ gear           : num [1:8] 3 4 5 3 4 5 3 5
 $ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
  ..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
 - attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ cyl  : num [1:3] 4 6 8
  ..$ .rows: list<int> [1:3] 
  .. ..$ : int [1:3] 1 2 3
  .. ..$ : int [1:3] 4 5 6
  .. ..$ : int [1:2] 7 8
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% 
        unpack(where(is_tibble))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 8 × 4
# Groups:   cyl [3]
    cyl  gear   mpg    wt
  <dbl> <dbl> <dbl> <dbl>
1     4     3  21.5  2.46
2     4     4  26.9  2.38
3     4     5  28.2  1.83
4     6     3  19.8  3.34
5     6     4  19.8  3.09
6     6     5  19.7  2.77
7     8     3  15.0  4.10
8     8     5  15.4  3.37
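
If the goal is to keep an explicit name on the left-hand side and still end up with mean_mpg and mean_wt, one possible route is to give the packed column a fixed name and let tidyr::unpack() paste it onto the inner names via names_sep. This is a sketch under assumptions: the function name mean_fun_unpack and the packed column name mean are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: pack the summaries into one data frame column
# called "mean", then unpack it with the outer name as a prefix.
mean_fun_unpack <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      mean = across({{ summary_cols }}, ~ mean(.x, na.rm = TRUE)),
      .groups = "drop"
    ) %>%
    # names_sep joins the outer name and each inner name
    unpack(mean, names_sep = "_")
}

res_unpacked <- mean_fun_unpack(mtcars, c(cyl, gear), c(mpg, wt))
print(names(res_unpacked))
```

Compared with the .names approach this takes an extra step, so it is probably most useful for showing what a packed column is and how unpacking relates to the column names.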
