用R汇总和总结数据集信息

发布于 01-18 15:47 字数 660 浏览 4 评论 0 原文

我有一个育种生产力数据集:

df1
    # Nest.box Obs.type individual.number Clutch
    # 1 Nest1 Egg 1 First
    # 2 Nest1 Egg 2 First
    # 3 Nest1 Egg 3 First
    # 4 Nest2 Egg 1 First
    # 5 Nest2 Egg 2 First
    # 6 Nest2 Egg 1 First
    # 7 Nest1 Chick 1 First
    # 8 Nest1 Chick 2 First
    # 9 Nest2 Chick 1 First
    # 10 Nest2 Chick 2 First
    # 11 Nest2 Chick 1 Second
    # 12 Nest2 Chick 2 Second

我想通过 nest.box 离合器来汇总这些数据。

output
        # Nest.box Clutch Eggs Chicks
        # 1 Nest1 First 3 2
        # 2 Nest2 First 2 2
        # 3 Nest2 Second NA 2

I have a breeding productivity dataset:

df1
    # Nest.box Obs.type individual.number Clutch
    # 1 Nest1 Egg 1 First
    # 2 Nest1 Egg 2 First
    # 3 Nest1 Egg 3 First
    # 4 Nest2 Egg 1 First
    # 5 Nest2 Egg 2 First
    # 6 Nest2 Egg 1 First
    # 7 Nest1 Chick 1 First
    # 8 Nest1 Chick 2 First
    # 9 Nest2 Chick 1 First
    # 10 Nest2 Chick 2 First
    # 11 Nest2 Chick 1 Second
    # 12 Nest2 Chick 2 Second

I want to summarise these data by aggregating by Nest.box and Clutch (shows the max number of eggs, the max number of chicks by nest.box, by clutch)

The wanted output would be something like this:

output
        # Nest.box Clutch Eggs Chicks
        # 1 Nest1 First 3 2
        # 2 Nest2 First 2 2
        # 3 Nest2 Second NA 2

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

迷乱花海 2025-01-25 15:47:43

Tidyverse 方法是

df %>% group_by(Nest.box, Clutch, Obs.type) %>% 
  summarise(max = max(individual.number)) %>% 
  pivot_wider(id_cols = 1:2, names_from = Obs.type, values_from = max)

# A tibble: 3 x 4
# Groups:   Nest.box, Clutch [3]
  Nest.box Clutch Chick   Egg
  <chr>    <chr>  <dbl> <dbl>
1 Nest1    First      2     3
2 Nest2    First      2     2
3 Nest2    Second     2    NA

Tidyverse approach would be

df %>% group_by(Nest.box, Clutch, Obs.type) %>% 
  summarise(max = max(individual.number)) %>% 
  pivot_wider(id_cols = 1:2, names_from = Obs.type, values_from = max)

# A tibble: 3 x 4
# Groups:   Nest.box, Clutch [3]
  Nest.box Clutch Chick   Egg
  <chr>    <chr>  <dbl> <dbl>
1 Nest1    First      2     3
2 Nest2    First      2     2
3 Nest2    Second     2    NA
泛泛之交 2025-01-25 15:47:43

这些都可以在 tidyr :: pivot_wider() for data.frames:

library(tidyr)

df1 %>% 
  pivot_wider(
    id_cols = c(Nest.box, Clutch),
    names_from = Obs.type, 
    values_from = individual.number, 
    values_fn = max
  )
#> # A tibble: 3 × 4
#>   Nest.box Clutch   Egg Chick
#>   <chr>    <chr>  <dbl> <dbl>
#> 1 Nest1    First      3     2
#> 2 Nest2    First      2     2
#> 3 Nest2    Second    NA     2

在2022-04-01创建的<="" code=""> “ rel =“ nofollow noreferrer”> reprex软件包(v2.0.1)

数据

df1 <- 
  tibble::tribble(
    ~Nest.box, ~Obs.type, ~individual.number, ~Clutch,
    "Nest1", "Egg", 1, "First",
    "Nest1", "Egg", 2, "First",
    "Nest1", "Egg", 3, "First",
    "Nest2", "Egg", 1, "First",
    "Nest2", "Egg", 2, "First",
    "Nest2", "Egg", 1, "First",
    "Nest1", "Chick", 1, "First",
    "Nest1", "Chick", 2, "First",
    "Nest2", "Chick", 1, "First",
    "Nest2", "Chick", 2, "First",
    "Nest2", "Chick", 1, "Second",
    "Nest2", "Chick", 2, "Second"
  )


This can all be done within tidyr::pivot_wider() for data.frames:

library(tidyr)

df1 %>% 
  pivot_wider(
    id_cols = c(Nest.box, Clutch),
    names_from = Obs.type, 
    values_from = individual.number, 
    values_fn = max
  )
#> # A tibble: 3 × 4
#>   Nest.box Clutch   Egg Chick
#>   <chr>    <chr>  <dbl> <dbl>
#> 1 Nest1    First      3     2
#> 2 Nest2    First      2     2
#> 3 Nest2    Second    NA     2

Created on 2022-04-01 by the reprex package (v2.0.1)

Data

df1 <- 
  tibble::tribble(
    ~Nest.box, ~Obs.type, ~individual.number, ~Clutch,
    "Nest1", "Egg", 1, "First",
    "Nest1", "Egg", 2, "First",
    "Nest1", "Egg", 3, "First",
    "Nest2", "Egg", 1, "First",
    "Nest2", "Egg", 2, "First",
    "Nest2", "Egg", 1, "First",
    "Nest1", "Chick", 1, "First",
    "Nest1", "Chick", 2, "First",
    "Nest2", "Chick", 1, "First",
    "Nest2", "Chick", 2, "First",
    "Nest2", "Chick", 1, "Second",
    "Nest2", "Chick", 2, "Second"
  )


久光 2025-01-25 15:47:43

data.table 解决方案

library(data.table)
setDT(df)[, .(Eggs = uniqueN(individual.number[Obs.type == "Egg"]),
              Chicks = uniqueN(individual.number[Obs.type == "Chick"])), 
          by = .(Nest.box, Clutch)]

#    Nest.box Clutch Eggs Chicks
# 1:    Nest1  First    3      2
# 2:    Nest2  First    2      2
# 3:    Nest2 Second    0      2

编辑

也有可能:转换为宽格式

dcast(setDT(df), Nest.box + Clutch ~ Obs.type, value.var = "individual.number", fun.aggregate = uniqueN)
#    Nest.box Clutch Chick Egg
# 1:    Nest1  First     2   3
# 2:    Nest2  First     2   2
# 3:    Nest2 Second     2   0

data.table solution

library(data.table)
setDT(df)[, .(Eggs = uniqueN(individual.number[Obs.type == "Egg"]),
              Chicks = uniqueN(individual.number[Obs.type == "Chick"])), 
          by = .(Nest.box, Clutch)]

#    Nest.box Clutch Eggs Chicks
# 1:    Nest1  First    3      2
# 2:    Nest2  First    2      2
# 3:    Nest2 Second    0      2

edit

also a possibility: casting to wide format

dcast(setDT(df), Nest.box + Clutch ~ Obs.type, value.var = "individual.number", fun.aggregate = uniqueN)
#    Nest.box Clutch Chick Egg
# 1:    Nest1  First     2   3
# 2:    Nest2  First     2   2
# 3:    Nest2 Second     2   0
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文