计算R中亚组的总数变化

发布于 2025-01-22 07:31:36 字数 3301 浏览 0 评论 0原文

我有以下数据框架:

# A tibble: 8 x 5
   Year Group Unit  Profit Sales
  <dbl> <chr> <chr>  <dbl> <dbl>
1  2021 One   A         20    70
2  2021 One   B         10    40
3  2021 One   C          5    25
4  2021 Two   D         15    50
5  2022 One   A        NaN    50
6  2022 One   B        NaN    55
7  2022 One   E        NaN   150
8  2022 Two   D        NaN    60

我想看到每组销售的增长。问题是组没有相同的单元。因此,为了估算增长 /下降,我想仅计算仍然存在的单位的增长。

因此,所需的结果就是这样:

# A tibble: 8 x 8
   Year Group Unit  Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
  <dbl> <chr> <chr>  <dbl> <dbl>                            <dbl>                  <dbl>            <dbl>
1  2021 One   A         20    70                              110                    NaN          NaN    
2  2021 One   B         10    40                              110                    NaN          NaN    
3  2021 One   C          5    25                              110                    NaN          NaN    
4  2021 Two   D         15    50                               50                    NaN          NaN    
5  2022 One   A        NaN    50                              NaN                    105            0.955
6  2022 One   B        NaN    55                              NaN                    105            0.955
7  2022 One   E        NaN   150                              NaN                    105            0.955
8  2022 Two   D        NaN    60                              NaN                     60            1.2  

为了实现这一目标,我完成了下面的代码。这有效,但这是很多代码。所以我的问题是,有没有更聪明 /更优雅的方法来做到这一点?

test2 <- test %>%
    group_by(
        Unit
    ) %>%
    mutate(
        Years_present = list(min(Year):max(Year))
    ) %>%
    ungroup() %>%
    mutate(
        Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
        Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
        Present_next_year = ifelse(
            Present_next_year == TRUE,
            Sales,
            0
        ),
        Present_last_year = ifelse(
            Present_last_year == TRUE,
            Sales,
            0
        )
    ) %>%
    group_by(
        Group, Year
    ) %>%
    mutate(
        Sales_group_comparison_next_year = sum(Present_next_year),
        Sales_group_comparison_last_year = sum(Present_last_year),
    ) %>%
    ungroup() %>%
    group_by(Unit) %>%
    mutate(
        Sales_group_comparison_lagged =
        dplyr::lag(
            x = Sales_group_comparison_next_year,
            n = 1,
            default = NaN,
            order_by = Year
        ),
    ) %>%
    ungroup() %>%
    rowwise() %>%
    mutate(
        Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
    ) %>%
    ungroup() %>%
    group_by(Group, Year) %>%
    mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
                                      NaN,
                                      max(Group_growth_rate, na.rm = TRUE))
           ) %>%
    ungroup()

谢谢!

I have the following dataframe:

# A tibble: 8 x 5
   Year Group Unit  Profit Sales
  <dbl> <chr> <chr>  <dbl> <dbl>
1  2021 One   A         20    70
2  2021 One   B         10    40
3  2021 One   C          5    25
4  2021 Two   D         15    50
5  2022 One   A        NaN    50
6  2022 One   B        NaN    55
7  2022 One   E        NaN   150
8  2022 Two   D        NaN    60

And I want to see the growth in sales per group. The problem is that the groups do not have the same units. So, in order to get an estimate of the growth / decline I want to calculate the growth on only the units that are still present.

So the desired result is something like this:

# A tibble: 8 x 8
   Year Group Unit  Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
  <dbl> <chr> <chr>  <dbl> <dbl>                            <dbl>                  <dbl>            <dbl>
1  2021 One   A         20    70                              110                    NaN          NaN    
2  2021 One   B         10    40                              110                    NaN          NaN    
3  2021 One   C          5    25                              110                    NaN          NaN    
4  2021 Two   D         15    50                               50                    NaN          NaN    
5  2022 One   A        NaN    50                              NaN                    105            0.955
6  2022 One   B        NaN    55                              NaN                    105            0.955
7  2022 One   E        NaN   150                              NaN                    105            0.955
8  2022 Two   D        NaN    60                              NaN                     60            1.2  

in order to achieve this I did the code below. This works, but it is a lot of code. So my question is, are there smarter / more elegant ways to do this?

test2 <- test %>%
    group_by(
        Unit
    ) %>%
    mutate(
        Years_present = list(min(Year):max(Year))
    ) %>%
    ungroup() %>%
    mutate(
        Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
        Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
        Present_next_year = ifelse(
            Present_next_year == TRUE,
            Sales,
            0
        ),
        Present_last_year = ifelse(
            Present_last_year == TRUE,
            Sales,
            0
        )
    ) %>%
    group_by(
        Group, Year
    ) %>%
    mutate(
        Sales_group_comparison_next_year = sum(Present_next_year),
        Sales_group_comparison_last_year = sum(Present_last_year),
    ) %>%
    ungroup() %>%
    group_by(Unit) %>%
    mutate(
        Sales_group_comparison_lagged =
        dplyr::lag(
            x = Sales_group_comparison_next_year,
            n = 1,
            default = NaN,
            order_by = Year
        ),
    ) %>%
    ungroup() %>%
    rowwise() %>%
    mutate(
        Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
    ) %>%
    ungroup() %>%
    group_by(Group, Year) %>%
    mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
                                      NaN,
                                      max(Group_growth_rate, na.rm = TRUE))
           ) %>%
    ungroup()

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

咿呀咿呀哟 2025-01-29 07:31:36

这是一种使用一些简单的dplyr数学和一个自我加入的方法,可以为每组的总数吸引去年的总数。

library(dplyr)
data_with_2023 %>%
  group_by(Group, Unit) %>%
  mutate(in_next_yr = lead(Year, default = 0) == Year + 1,
         in_last_yr = lag(Year, default = 0) == Year - 1) %>%
  group_by(Year, Group) %>%
  mutate(total_in_next_yr = sum(Sales*in_next_yr),
         total_in_prior_yr = sum(Sales*in_last_yr)) %>%
  ungroup() -> data_totals

left_join(data_totals,
          data_totals %>% 
            distinct(Year = Year +1, 
                     Group, 
                     last_yr_total = total_in_next_yr)) %>%
  mutate(growth = total_in_prior_yr / last_yr_total)

结果

Joining, by = c("Year", "Group")
# A tibble: 10 × 11
    Year Group Unit  Profit Sales in_next_yr in_last_yr total_in_next_yr total_in_prior_yr last_yr_total growth
   <dbl> <chr> <chr> <chr>  <int> <lgl>      <lgl>                 <int>             <int>         <int>  <dbl>
 1  2021 One   A     20        70 TRUE       FALSE                   110                 0            NA NA    
 2  2021 One   B     10        40 TRUE       FALSE                   110                 0            NA NA    
 3  2021 One   C     5         25 FALSE      FALSE                   110                 0            NA NA    
 4  2021 Two   D     15        50 TRUE       FALSE                    50                 0            NA NA    
 5  2022 One   A     NaN       50 FALSE      TRUE                    205               105           110  0.955
 6  2022 One   B     NaN       55 TRUE       TRUE                    205               105           110  0.955
 7  2022 One   E     NaN      150 TRUE       FALSE                   205               105           110  0.955
 8  2022 Two   D     NaN       60 FALSE      TRUE                      0                60            50  1.2  
 9  2023 One   B     NaN       70 FALSE      TRUE                      0               150           205  0.732
10  2023 One   E     NaN       80 FALSE      TRUE                      0               150           205  0.732

源数据

data_with_2023 <- data.frame(
  stringsAsFactors = FALSE,
              Year = c(2021L, 2021L, 2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2023L),
             Group = c("One", "One", "One", "Two", "One", "One", "One", "Two", "One", "One"),
              Unit = c("A", "B", "C", "D", "A", "B", "E", "D", "B", "E"),
            Profit = c("20", "10", "5", "15", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"),
             Sales = c(70L, 40L, 25L, 50L, 50L, 55L, 150L, 60L, 70L, 80L)
)

Here's an approach using some simpler dplyr math and a self-join to pull in last year's totals for each Group.

library(dplyr)
data_with_2023 %>%
  group_by(Group, Unit) %>%
  mutate(in_next_yr = lead(Year, default = 0) == Year + 1,
         in_last_yr = lag(Year, default = 0) == Year - 1) %>%
  group_by(Year, Group) %>%
  mutate(total_in_next_yr = sum(Sales*in_next_yr),
         total_in_prior_yr = sum(Sales*in_last_yr)) %>%
  ungroup() -> data_totals

left_join(data_totals,
          data_totals %>% 
            distinct(Year = Year +1, 
                     Group, 
                     last_yr_total = total_in_next_yr)) %>%
  mutate(growth = total_in_prior_yr / last_yr_total)

Result

Joining, by = c("Year", "Group")
# A tibble: 10 × 11
    Year Group Unit  Profit Sales in_next_yr in_last_yr total_in_next_yr total_in_prior_yr last_yr_total growth
   <dbl> <chr> <chr> <chr>  <int> <lgl>      <lgl>                 <int>             <int>         <int>  <dbl>
 1  2021 One   A     20        70 TRUE       FALSE                   110                 0            NA NA    
 2  2021 One   B     10        40 TRUE       FALSE                   110                 0            NA NA    
 3  2021 One   C     5         25 FALSE      FALSE                   110                 0            NA NA    
 4  2021 Two   D     15        50 TRUE       FALSE                    50                 0            NA NA    
 5  2022 One   A     NaN       50 FALSE      TRUE                    205               105           110  0.955
 6  2022 One   B     NaN       55 TRUE       TRUE                    205               105           110  0.955
 7  2022 One   E     NaN      150 TRUE       FALSE                   205               105           110  0.955
 8  2022 Two   D     NaN       60 FALSE      TRUE                      0                60            50  1.2  
 9  2023 One   B     NaN       70 FALSE      TRUE                      0               150           205  0.732
10  2023 One   E     NaN       80 FALSE      TRUE                      0               150           205  0.732

Source data

data_with_2023 <- data.frame(
  stringsAsFactors = FALSE,
              Year = c(2021L, 2021L, 2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2023L),
             Group = c("One", "One", "One", "Two", "One", "One", "One", "Two", "One", "One"),
              Unit = c("A", "B", "C", "D", "A", "B", "E", "D", "B", "E"),
            Profit = c("20", "10", "5", "15", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"),
             Sales = c(70L, 40L, 25L, 50L, 50L, 55L, 150L, 60L, 70L, 80L)
)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文