在r中使用突变（）和跨（）创建多个新列

发布于 2025-01-20 17:37:33 字数 2352 浏览 0 评论 0原文

我想同时对许多列进行以下计算，而它们则由ID分组：

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04))

数据集需要资产列。喜欢为每一个创建一个流列。

我知道我可以为每一列这样做这样的事情：

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05"= Assets.2018.05 - 
           (Assets.2018.04 * Returns.2018.05))

但是，正如我想对50列以上的计算进行此计算，我希望有一种更优雅的方法。据我所知，在跨（）函数的dplyr时，这应该是可能的，但是我无法弄清楚如何做到这一点。

我希望新列被命名为Flow.yyyy.mm，这使问题进一步复杂化。我认为实现这一目标的最简单方法可能是在创建列后简单地重命名。

我还考虑过将数据框架从宽格式转换为长格式以执行此计算，但是这对我来说似乎更加复杂。

关于实现预期结果的任何建议吗？

请根据要求找到下面的示例数据：

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

所需的结果是：

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274

原文

I would like to perform the following calculation on many columns at the same time while they are grouped by ID:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04))

The data set entails a column for Assets.YYYY.MM and Returns.YYYY.MM for each month from 2018.04 to 2022.02 and I would like to create a Flows column for each of those.

I know that I could do it like this for every column:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05"= Assets.2018.05 - 
           (Assets.2018.04 * Returns.2018.05))

But as I want to do this calculation for 50+ columns I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function but I was not able to figure out how to do this.

I would like the new columns to be named Flows.YYYY.MM which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.

I have also thought about converting the data frame from wide format to long format to perform this calculation, however this seemed even more complicated to me.

Any suggestions on achieving the desired outcome?

Please find below the sample data, as requested:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

The desired outcome is:

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

拥抱没勇气 2025-01-27 17:37:33

怎么样：

  library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets)*Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46

^由

How about this:

  library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets)*Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46

^{Created on 2022-04-10 by the reprex package (v2.0.1)}

回复收藏 0 原文

~没有更多了~