Create combinations and sums by group

Posted on 2025-02-12 13:48:59

I have data of names within an ID number along with a number of associated values. It looks something like this:

df <- structure(list(id = c("a", "a", "b", "b"), name = c("bob", "jane", 
"mark", "brittney"), number = c(1L, 2L, 1L, 2L), value = c(1L, 
2L, 1L, 2L)), class = "data.frame", row.names = c(NA, -4L))

#   id     name number value
# 1  a      bob      1     1
# 2  a     jane      2     2
# 3  b     mark      1     1
# 4  b brittney      2     2

I would like to create all the combinations of name, regardless of how many there are, and paste them together separated with commas, and sum their number and value within each id. The desired output from the example above is then:

structure(list(id = c("a", "a", "a", "b", "b", "b"), name = c("bob", 
"jane", "bob, jane", "mark", "brittney", "mark, brittney"), number = c(1L, 
2L, 3L, 1L, 2L, 3L), value = c(1L, 2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -6L))

#   id           name number value
# 1  a            bob      1     1
# 2  a           jane      2     2
# 3  a      bob, jane      3     3
# 4  b           mark      1     1
# 5  b       brittney      2     2
# 6  b mark, brittney      3     3

Thanks all!

冷弦 2025-02-19 13:48:59

You could use group_modify() + add_row():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify( ~ .x %>%
    summarise(name = toString(name), across(c(number, value), sum)) %>%
    add_row(.x, .)
  ) %>%
  ungroup()

# # A tibble: 6 × 4
#   id    name           number value
#   <chr> <chr>           <int> <int>
# 1 a     bob                 1     1
# 2 a     jane                2     2
# 3 a     bob, jane           3     3
# 4 b     mark                1     1
# 5 b     brittney            2     2
# 6 b     mark, brittney      3     3
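
Note that group_modify() with add_row() appends a single summary row per group, so it matches the desired output only when every group has exactly two rows. As a minimal sketch of the general case, assuming the column names from the question, the base R code below enumerates every non-empty subset of rows within each id. The helper name all_subsets is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): every non-empty
# subset of the rows of one group, collapsed to a single row each.
all_subsets <- function(d) {
  idx <- seq_len(nrow(d))
  # One list element per subset size m = 1, ..., nrow(d)
  rows <- lapply(idx, function(m) {
    combn(idx, m, FUN = function(i) {
      data.frame(
        id     = d$id[1],
        name   = toString(d$name[i]),
        number = sum(d$number[i]),
        value  = sum(d$value[i]),
        stringsAsFactors = FALSE
      )
    }, simplify = FALSE)
  })
  # Flatten one level (list of lists -> list of data frames), then stack
  do.call(rbind, unlist(rows, recursive = FALSE))
}

# Example data from the question
df <- data.frame(
  id     = c("a", "a", "b", "b"),
  name   = c("bob", "jane", "mark", "brittney"),
  number = c(1L, 2L, 1L, 2L),
  value  = c(1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Apply the helper to each id group and stack the results
result <- do.call(rbind, lapply(unname(split(df, df$id)), all_subsets))
rownames(result) <- NULL
result
```

For the two-row groups in the question this should give the same six rows as the desired output. The number of subsets doubles with each extra row in a group, so this is only practical for small groups.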

萌面超妹 2025-02-19 13:48:59

You can create pairwise indices with combn() and use them to expand the data frame with slice(). Then group by these row pairs and summarise. I'm assuming you want pairwise combinations, but this can be adapted for larger sets if needed. Some code to handle groups with fewer than two rows is included; it can be removed if no such groups exist in your data. Here df1 is the example data from the question.

library(dplyr)
library(purrr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number()-1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
  id    name           number value
  <chr> <chr>           <int> <int>
1 a     bob                 1     1
2 a     jane                2     2
3 a     bob, jane           3     3
4 b     mark                1     1
5 b     brittney            2     2
6 b     mark, brittney      3     3
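
To see why the integer division in the pipeline above groups the sliced rows into pairs, here is a small base R sketch for a hypothetical group of three rows. The variable names idx, flat and pair_id are illustrative only.

```r
# Index pairs for a group of 3 rows: a 2 x 3 matrix, one column per pair
idx <- combn(seq(3), 2)

# c() flattens the matrix column by column, so the two members of each
# pair end up next to each other: 1 2 1 3 2 3
flat <- c(idx)

# Integer division of the zero-based position by the pair size gives one
# label per pair: 0 0 1 1 2 2
pair_id <- (seq_along(flat) - 1) %/% 2

# Row indices that belong to each pair
split(flat, pair_id)
```

slice() repeats a row each time its index appears in flat, which is how one row can contribute to several pairs before the grouped summarise().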

Edit:

To cover all possible combinations, you can iterate over the combination sizes up to the largest group size. This uses edited data (df2, given below) with a couple of rows added to the first group:

map_df(seq(max(table(df2$id))), ~
         df2 %>%
         group_by(id) %>%
         slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
         mutate(id2 = (row_number() - 1) %/% .x) %>%
         group_by(id, id2) %>%
         summarise(name = toString(name),
                   across(where(is.numeric), sum), .groups = "drop")
       ) %>%
  select(-id2) %>%
  arrange(id)

# A tibble: 18 × 4
   id    name                      number value
   <chr> <chr>                      <int> <int>
 1 a     bob                            1     1
 2 a     jane                           2     2
 3 a     sophie                         1     1
 4 a     jeremy                         2     2
 5 a     bob, jane                      3     3
 6 a     bob, sophie                    2     2
 7 a     bob, jeremy                    3     3
 8 a     jane, sophie                   3     3
 9 a     jane, jeremy                   4     4
10 a     sophie, jeremy                 3     3
11 a     bob, jane, sophie              4     4
12 a     bob, jane, jeremy              5     5
13 a     bob, sophie, jeremy            4     4
14 a     jane, sophie, jeremy           5     5
15 a     bob, jane, sophie, jeremy      6     6
16 b     mark                           3     5
17 b     brittney                       4     6
18 b     mark, brittney                 7    11

Data for df2:

df2 <- structure(list(
  id = c("a", "a", "a", "a", "b", "b"),
  name = c("bob", "jane", "sophie", "jeremy", "mark", "brittney"),
  number = c(1L, 2L, 1L, 2L, 3L, 4L),
  value = c(1L, 2L, 1L, 2L, 5L, 6L)
), class = "data.frame", row.names = c(NA, -6L))
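
The expression .x * (.x <= n()) in the map_df() version guards against asking combn() for more elements than a group has. A minimal sketch of that guard outside dplyr, assuming a group of two rows and combination sizes 1 to 4 (the names n_rows, sizes, m and n_idx are illustrative):

```r
n_rows <- 2      # rows in the (hypothetical) group
sizes  <- 1:4    # combination sizes tried across all groups

# Sizes larger than the group are turned into 0
m <- sizes * (sizes <= n_rows)

# combn() with m = 0 yields no indices, so slice() would keep no rows
n_idx <- lengths(lapply(m, function(k) c(combn(seq(n_rows), k))))

# Without the guard, combn() stops with an error when m exceeds the group size
unguarded <- try(combn(seq(n_rows), 3), silent = TRUE)
inherits(unguarded, "try-error")
```

Here m is 1, 2, 0, 0 and n_idx is 2, 2, 0, 0, so the oversized combination sizes contribute nothing to the smaller group.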

两相知 2025-02-19 13:48:59

A data.table option

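library(data.table)

# setDT() converts df to a data.table by reference
setDT(df)[
  ,
  lapply(
    .SD,
    function(x) {
      unlist(
        lapply(
          seq_along(x),   # combination sizes m = 1, ..., group size
          combn,
          x = x,
          # applied to each combination: paste names, sum numeric columns
          function(v) {
            if (is.character(v)) toString(v) else sum(v)
          }
        )
      )
    }
  ),
  id
]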

gives

   id           name number value
1:  a            bob      1     1
2:  a           jane      2     2
3:  a      bob, jane      3     3
4:  b           mark      1     1
5:  b       brittney      2     2
6:  b mark, brittney      3     3
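
The data.table answer relies on the FUN argument of combn(), which is applied to each combination instead of returning the raw index matrix. A minimal base R sketch of that mechanism, using an illustrative vector x that is not taken from the thread:

```r
x <- c("bob", "jane", "sophie")

# FUN is applied to every combination of size 2
pairs <- combn(x, 2, FUN = toString)

# Looping m over 1, ..., length(x) gives every non-empty subset
all_names <- unlist(
  lapply(seq_along(x), function(m) combn(x, m, FUN = toString))
)
all_names

# Caveat: a single positive integer is expanded to seq_len(x), so a
# one-row numeric group is not treated as a vector of length one
single <- combn(5L, 1, FUN = sum)
length(single)
```

Because of that last behaviour, calling combn() directly on a column can give unexpected results for a group with a single row and a positive whole-number value; combining row indices and then subsetting the column avoids this.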
