How to summarise and subset a multi-level grouped data frame in dplyr and R

Posted on 2025-01-17 07:03:34

I have the following data in long format:

library(dplyr)  # provides tibble(), group_by(), summarise() and %>%

testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

For each named person (John and Joe), I want to summarise each of their pets in a single sentence.
For some reason I can't seem to handle the repeated events/pets in the "john" data.
If I filter for Joe only (he has just one pet), the code works.

Any help is much appreciated.

testdf %>%
  group_by(name, rep) %>%
  # filter(name == "joe") %>%  # when I filter only for Joe, the code works
  summarise(
    about = paste0(
      "The pet is a: ", .[field == "pet", "value"],
      " and it is ", .[field == "age", "value"]
    )
  )

Comments (2)

爱格式化 2025-01-24 07:03:34
library(dplyr)
library(tidyr)

testdf %>%
  pivot_wider(id_cols = name:rep, names_from = field, values_from = value) %>%
  mutate(about = paste0("The pet is a: ", pet, " and it is ", age))

  name    rep pet   age   about                             
  <chr> <dbl> <chr> <chr> <chr>                             
1 john      1 dog   young The pet is a: dog and it is young 
2 john      2 cat   old   The pet is a: cat and it is old   
3 joe       1 fish  young The pet is a: fish and it is young

This can also be done with data.table, as follows:

library(data.table)

# setDT() converts testdf to a data.table by reference (in place)
setDT(testdf)[
  , .(about = paste0("The pet is a ", .SD[field == "pet", value],
                     " and it is ", .SD[field == "age", value])),
  by = .(name, rep)
]

   name rep                             about
1: john   1  The pet is a dog and it is young
2: john   2    The pet is a cat and it is old
3:  joe   1 The pet is a fish and it is young
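
A possible simplification of the data.table version, sketched under the assumption that every (name, rep) group has exactly one "pet" row and one "age" row: inside `j`, data.table exposes the current group's columns as plain vectors, so `value[field == "pet"]` can stand in for `.SD[field == "pet", value]`. The names `dt` and `res` are illustrative only and do not appear in the original post.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

# Within each (name, rep) group, `field` and `value` are ordinary vectors,
# so they can be subset without going through .SD
res <- dt[
  , .(about = paste0("The pet is a ", value[field == "pet"],
                     " and it is ", value[field == "age"])),
  by = .(name, rep)
]

print(res)
```

This should give the same three rows as the `.SD` version above, in order of first appearance of each group.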
向日葵 2025-01-24 07:03:34

Your data is in long format and not tidy: several fields are stored in a single column. That is why langtang's answer spreads it, i.e. pivots it wider. (data.table may be even better, but I still find .SD difficult to use.)

I prefer doing these things as simply as possible in dplyr.
An alternative that avoids spreading is shown below; it yields essentially the same result.
So, in 3 lines:

testdf %>%
  group_by(name, rep) %>%
  summarise(about = paste("The pet is ", value[field == "pet"], " and it is ", value[field == "age"]))

yields:

  name    rep about
  <chr> <dbl> <chr>
1 joe       1 The pet is  fish  and it is  young
2 john      1 The pet is  dog  and it is  young
3 john      2 The pet is  cat  and it is  old
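
As for why the attempt in the question only behaves for joe, one likely explanation is that with the magrittr pipe, `.` inside `summarise()` refers to the whole data frame coming through the pipe, not to the current group, whereas bare column names such as `field` are evaluated per group. The sketch below is a small check of that reading, not part of the original answers; `check`, `rows_in_dot` and `rows_in_group` are illustrative names.

```r
library(dplyr)

testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

check <- testdf %>%
  group_by(name, rep) %>%
  summarise(
    rows_in_dot   = nrow(.),  # `.` is the full piped input: 6 rows every time
    rows_in_group = n(),      # n() counts the current group's rows: 2
    .groups = "drop"
  )

print(check)
```

With all of the data, the group-level condition `field == "pet"` has length 2 but is used to index the 6-row `.`, so the two no longer line up. After filtering to joe, `.` has only 2 rows and happens to coincide with the single group, which would explain why that case works. Indexing the column directly, as in `value[field == "pet"]`, avoids the mismatch.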