计算R中亚组的总数变化
我有以下数据框架:
# A tibble: 8 x 5
Year Group Unit Profit Sales
<dbl> <chr> <chr> <dbl> <dbl>
1 2021 One A 20 70
2 2021 One B 10 40
3 2021 One C 5 25
4 2021 Two D 15 50
5 2022 One A NaN 50
6 2022 One B NaN 55
7 2022 One E NaN 150
8 2022 Two D NaN 60
我想看到每组销售的增长。问题是组没有相同的单元。因此,为了估算增长 /下降,我想仅计算仍然存在的单位的增长。
因此,所需的结果就是这样:
# A tibble: 8 x 8
Year Group Unit Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021 One A 20 70 110 NaN NaN
2 2021 One B 10 40 110 NaN NaN
3 2021 One C 5 25 110 NaN NaN
4 2021 Two D 15 50 50 NaN NaN
5 2022 One A NaN 50 NaN 105 0.955
6 2022 One B NaN 55 NaN 105 0.955
7 2022 One E NaN 150 NaN 105 0.955
8 2022 Two D NaN 60 NaN 60 1.2
为了实现这一目标,我完成了下面的代码。这有效,但这是很多代码。所以我的问题是,有没有更聪明 /更优雅的方法来做到这一点?
test2 <- test %>%
group_by(
Unit
) %>%
mutate(
Years_present = list(min(Year):max(Year))
) %>%
ungroup() %>%
mutate(
Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
Present_next_year = ifelse(
Present_next_year == TRUE,
Sales,
0
),
Present_last_year = ifelse(
Present_last_year == TRUE,
Sales,
0
)
) %>%
group_by(
Group, Year
) %>%
mutate(
Sales_group_comparison_next_year = sum(Present_next_year),
Sales_group_comparison_last_year = sum(Present_last_year),
) %>%
ungroup() %>%
group_by(Unit) %>%
mutate(
Sales_group_comparison_lagged =
dplyr::lag(
x = Sales_group_comparison_next_year,
n = 1,
default = NaN,
order_by = Year
),
) %>%
ungroup() %>%
rowwise() %>%
mutate(
Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
) %>%
ungroup() %>%
group_by(Group, Year) %>%
mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
NaN,
max(Group_growth_rate, na.rm = TRUE))
) %>%
ungroup()
谢谢!
I have the following dataframe:
# A tibble: 8 x 5
Year Group Unit Profit Sales
<dbl> <chr> <chr> <dbl> <dbl>
1 2021 One A 20 70
2 2021 One B 10 40
3 2021 One C 5 25
4 2021 Two D 15 50
5 2022 One A NaN 50
6 2022 One B NaN 55
7 2022 One E NaN 150
8 2022 Two D NaN 60
And I want to see the growth in sales per group. The problem is that the groups do not have the same units. So, in order to get an estimate of the growth / decline I want to calculate the growth on only the units that are still present.
So the desired result is something like this:
# A tibble: 8 x 8
Year Group Unit Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021 One A 20 70 110 NaN NaN
2 2021 One B 10 40 110 NaN NaN
3 2021 One C 5 25 110 NaN NaN
4 2021 Two D 15 50 50 NaN NaN
5 2022 One A NaN 50 NaN 105 0.955
6 2022 One B NaN 55 NaN 105 0.955
7 2022 One E NaN 150 NaN 105 0.955
8 2022 Two D NaN 60 NaN 60 1.2
in order to achieve this I did the code below. This works, but it is a lot of code. So my question is, are there smarter / more elegant ways to do this?
test2 <- test %>%
group_by(
Unit
) %>%
mutate(
Years_present = list(min(Year):max(Year))
) %>%
ungroup() %>%
mutate(
Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
Present_next_year = ifelse(
Present_next_year == TRUE,
Sales,
0
),
Present_last_year = ifelse(
Present_last_year == TRUE,
Sales,
0
)
) %>%
group_by(
Group, Year
) %>%
mutate(
Sales_group_comparison_next_year = sum(Present_next_year),
Sales_group_comparison_last_year = sum(Present_last_year),
) %>%
ungroup() %>%
group_by(Unit) %>%
mutate(
Sales_group_comparison_lagged =
dplyr::lag(
x = Sales_group_comparison_next_year,
n = 1,
default = NaN,
order_by = Year
),
) %>%
ungroup() %>%
rowwise() %>%
mutate(
Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
) %>%
ungroup() %>%
group_by(Group, Year) %>%
mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
NaN,
max(Group_growth_rate, na.rm = TRUE))
) %>%
ungroup()
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这是一种使用一些简单的dplyr数学和一个自我加入的方法,可以为每组的总数吸引去年的总数。
结果
源数据
Here's an approach using some simpler dplyr math and a self-join to pull in last year's totals for each Group.
Result
Source data