How can I sum values from multiple rows into a new column in R?

Posted 2025-01-29 05:53:25

My dataframe:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

  Observation Topic Gamma
1       Apple     1   0.1
2   Blueberry     2   0.1
3      Cirtus     3   0.2
4       Dates     4   0.2
5    Eggplant     5   0.1

How can I tell R to add the Gamma values of topics 1, 3, and 5, and of topics 2 and 4, and then save the sums in a new column? For example:

Observation  Topic  Gamma  new variable
Apple            1    0.1           0.4
Blueberry        2    0.1           0.3
Cirtus           3    0.2           0.4
Dates            4    0.2           0.3
Eggplant         5    0.1           0.4

Essentially, I'd like each observation to have a new value that sums up the gamma scores of topics 1, 3, and 5, as well as topics 2 and 4.

Update: Clarification:
I am not trying to add even topic numbers or odd topic numbers. Sometimes it will be a mixture of both. See this new table as an example:

Observation  Topic  Gamma  new variable
Apple            1    0.1           0.1
Blueberry        2    0.1           0.7
Cirtus           3    0.2           0.4
Dates            4    0.2           0.4
Eggplant         5    0.1           0.7
Fruits           6    0.5           0.7

In this example, I left topic 1 alone, added topics 2, 5, and 6, and added topics 3 and 4.

Update: Clarification:

Observation  Topic  Gamma  new variable
Apple            1    0.1           0.1
Apple            2    0.1           0.7
Apple            3    0.2           0.4
Apple            4    0.2           0.4
Apple            5    0.1           0.7
Apple            6    0.5           0.7
Blueberry        1    0.2           0.2
Blueberry        2    0.1           0.6
Blueberry        3    0.3           0.8
Blueberry        4    0.5           0.8
Blueberry        5    0.4           0.6
Blueberry        6    0.1           0.6

In this example, each fruit (observation) has its own set of values for each topic, and I summed the same topics as listed above (2, 5, and 6; 3 and 4) separately for each fruit.

Comments (4)

淡看悲欢离合 2025-02-05 05:53:25

Update II on new request:

library(dplyr)

df %>% 
  group_by(Observation,
           grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup() %>% 
  select(-grp)

   Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6
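
If the topic groupings change often, the same idea can be driven by a lookup vector instead of editing the case_when() call each time. This is only a sketch: the name topic_groups and the labels "a", "b", "c" are made up for illustration, and the data frame is rebuilt here from the question's second clarification so the snippet runs on its own.

```r
library(dplyr)

# Example data rebuilt from the question's second clarification
df <- data.frame(
  Observation = rep(c("Apple", "Blueberry"), each = 6),
  Topic = rep(1:6, times = 2),
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5,
            0.2, 0.1, 0.3, 0.5, 0.4, 0.1)
)

# Hypothetical lookup: names are topic numbers, values are arbitrary group labels
topic_groups <- c("1" = "a",
                  "2" = "b", "5" = "b", "6" = "b",
                  "3" = "c", "4" = "c")

result <- df %>%
  # Translate each Topic into its group label via the lookup vector
  mutate(grp = unname(topic_groups[as.character(Topic)])) %>%
  # Sum Gamma within each fruit and topic group
  group_by(Observation, grp) %>%
  mutate(new_variable = sum(Gamma)) %>%
  ungroup() %>%
  select(-grp)
```

A topic that is missing from topic_groups gets NA as its group, so all such topics within a fruit would be summed together; it is worth checking that every topic is listed.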

Update, following the OP's new request. This solution is fully inspired by PaulS's solution (credit to him):

library(dplyr)

df %>% 
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup() %>% 
  select(-grp)

  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7

First answer:
We could sum Gamma separately over the odd and the even rows, using ifelse() to pick the right sum for each row.
In this case, row_number() could be replaced by Topic.

library(dplyr)

df %>% 
  mutate(new_variable = ifelse(row_number() %% 2 == 1, 
                               sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
                               sum(Gamma[row_number() %% 2 == 0])) # even 2,4
         )

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4
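
As a sketch of the variant mentioned above, here is the same computation keyed on Topic rather than on row position. The two versions agree for this data because the rows are already ordered by Topic. If the rows were shuffled, only the Topic-based version would still group by topic number.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"),
  Topic = 1:5,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)
)

result <- df %>%
  mutate(new_variable = ifelse(Topic %% 2 == 1,
                               sum(Gamma[Topic %% 2 == 1]),  # odd topics 1, 3, 5
                               sum(Gamma[Topic %% 2 == 0]))) # even topics 2, 4
```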

data:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

Microbenchmark: AndrewGB's base R is fastest


執念 2025-02-05 05:53:25

This should do it.

dat <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
                                 "Dates", "Eggplant"), 
                 Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)), 
            row.names = c(NA, 5L), class = "data.frame")
library(tidyverse)
dat %>% 
  mutate(even = as.numeric(Topic %% 2 == 0)) %>% 
  group_by(even) %>% 
  mutate(new_variable = sum(Gamma))
#> # A tibble: 5 × 5
#> # Groups:   even [2]
#>   Observation Topic Gamma  even new_variable
#>   <chr>       <int> <dbl> <dbl>        <dbl>
#> 1 Apple           1   0.1     0          0.4
#> 2 Blueberry       2   0.1     1          0.3
#> 3 Cirtus          3   0.2     0          0.4
#> 4 Dates           4   0.2     1          0.3
#> 5 Eggplant        5   0.1     0          0.4

Created on 2022-05-13 by the reprex package (v2.0.1)

想念有你 2025-02-05 05:53:25

Another possible solution:

library(dplyr)

df %>% 
  group_by(grp = if_else(Topic %in% c(1, 3, 5), 1, 2)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

#> # A tibble: 5 × 4
#>   Observation Topic Gamma new_variable
#>   <chr>       <int> <dbl>        <dbl>
#> 1 Apple           1   0.1          0.4
#> 2 Blueberry       2   0.1          0.3
#> 3 Cirtus          3   0.2          0.4
#> 4 Dates           4   0.2          0.3
#> 5 Eggplant        5   0.1          0.4

一桥轻雨一伞开 2025-02-05 05:53:25

Update II (but will work with the first update as well)

With base R, we can first create a new grouping column, grp, by copying the Topic column as a factor. We then change its levels so that the topics to be summed together share one level. Next, we get the sum of the Gamma column within each combination of Observation and grp (columns 1 and 4). Finally, we remove the grp column.

df$grp <- factor(df$Topic)

levels(df$grp) <- list(
  "1" = 1,
  "2" = c(2,5,6),
  "3" = c(3,4)
)

df$new_variable <- ave(df$Gamma, df[,c(1,4)], FUN = sum)

df <- df[,-4]

Output

   Observation Topic Gamma new_variable
1        Apple     1   0.1          0.1
2        Apple     2   0.1          0.7
3        Apple     3   0.2          0.4
4        Apple     4   0.2          0.4
5        Apple     5   0.1          0.7
6        Apple     6   0.5          0.7
7    Blueberry     1   0.2          0.2
8    Blueberry     2   0.1          0.6
9    Blueberry     3   0.3          0.8
10   Blueberry     4   0.5          0.8
11   Blueberry     5   0.4          0.6
12   Blueberry     6   0.1          0.6

Data

df <- structure(list(Observation = c("Apple", "Apple", "Apple", "Apple", 
"Apple", "Apple", "Blueberry", "Blueberry", "Blueberry", "Blueberry", 
"Blueberry", "Blueberry"), Topic = c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L), Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5, 
0.2, 0.1, 0.3, 0.5, 0.4, 0.1)), class = "data.frame", row.names = c(NA, 
-12L))
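
To see what the levels(df$grp) <- list(...) step does on its own, here is a small sketch on a standalone factor (the name topic is only for illustration). Assigning a named list to the levels of a factor merges every old level listed under a name into that single new level.

```r
# A factor with one level per topic, as in factor(df$Topic)
topic <- factor(1:6)

# Each list name becomes a new level; the old levels listed under it are merged into it
levels(topic) <- list(
  "1" = 1,
  "2" = c(2, 5, 6),
  "3" = c(3, 4)
)

merged <- as.character(topic)
print(merged)
```

Topics 2, 5, and 6 now share level "2" and topics 3 and 4 share level "3", which is why ave() can then sum Gamma within those merged groups.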

First Answer

With base R, we can use ave to get the sum for each group. Here, I create the group using a logical since we only have 2 groups.

df$new_variable <- ave(df$Gamma, row.names(df) %in% c(1, 3, 5), FUN=sum)

Output

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

Or we could get the sum for each grouping of rows and assign to a new column by index.

df$new_variable[c(1, 3, 5)] <- sum(df$Gamma[c(1, 3, 5)], na.rm = TRUE)
df$new_variable[c(2, 4)] <- sum(df$Gamma[c(2, 4)], na.rm = TRUE)
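
The index-based assignment can be extended to arbitrary topic groupings by looping over a list of topic sets. The following is a minimal sketch, not part of the original answer: topic_sets is a made-up name, and the data is the six-row example from the question's first clarification, where each topic appears only once.

```r
# Six-row example from the question's first clarification
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant", "Fruits"),
  Topic = 1:6,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5)
)

# Hypothetical list of topic sets; each set is summed together
topic_sets <- list(1, c(2, 5, 6), c(3, 4))

df$new_variable <- NA_real_
for (topics in topic_sets) {
  idx <- df$Topic %in% topics                               # rows belonging to this set
  df$new_variable[idx] <- sum(df$Gamma[idx], na.rm = TRUE)  # same sum for every row in the set
}
```

With several fruits per topic, as in the second clarification, this loop would sum across all fruits, so the per-fruit case still needs a grouping by Observation such as the ave() call above.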