Calculate the proportion of values for every two rows in R

Posted on 2025-02-13 00:12:05

I have this dataset



df <- tibble(id, event, duration)

For each "dive" row, I need to calculate the proportion of time spent at the surface, using the "surface" row that follows it, and insert the result into a new column. All of this should be done separately for each "id".

proportion = surface / (dive + surface)
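The parentheses in the formula matter when it is typed into R, because `/` binds more tightly than `+`. A minimal sketch using the first dive/surface pair from the example data (the variable names `dive` and `surface` are only illustrative):

```r
# First pair in the example data: a 55-unit dive followed by a 56-unit surface.
dive <- 55
surface <- 56

# Intended formula: share of the dive + surface cycle spent at the surface.
correct <- surface / (dive + surface)

# Without parentheses R evaluates (surface / dive) + surface, which is not a proportion.
wrong <- surface / dive + surface

print(round(correct, 3))  # 0.505
print(wrong > 1)          # TRUE
```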

#Output dataframe

# A tibble: 8 x 4
  id    event   duration proportion    
1 A     surface       56 x         
2 A     surface       96 x         
3 A     surface       14 x         
4 A     surface       77 x         
5 B     surface       28 x         
6 B     surface       63 x         
7 B     surface       47 x         
8 B     surface       90 x   

############################################################

Edit:

In my original data, I have some "dive" events without a following "surface", and the suggested code fails with an error:

Error in `dplyr::mutate()`:
! Problem while computing `proportion = DurationMin[What ==
  "Surface"]/sum(DurationMin)`.
✖ `proportion` must be size 2 or 1, not 0.
ℹ The error occurred in group 2803: ptt = "2017111870", grp = 1015.

Within an "id" there can be an odd number of rows, where a "dive" event is not followed by a "surface". So whenever an unpaired event is encountered, I need it to be either ignored or given an NA. Is that possible?

Here is an example dataframe:


library(tibble)

# 17 rows; row 15 is a "dive" with no following "surface"
id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B")

event <- c("dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "dive", "surface")

duration <- c(55, 56, 40, 96, 58, 14, 43, 77, 19, 28, 34, 63, 29, 47, 61, 45, 30)

df <- tibble(id, event, duration)

> df
   id   event duration
1   A    dive       55
2   A surface       56
3   A    dive       40
4   A surface       96
5   A    dive       58
6   A surface       14
7   A    dive       43
8   A surface       77
9   B    dive       19
10  B surface       28
11  B    dive       34
12  B surface       63
13  B    dive       29
14  B surface       47
15  B    dive       61
16  B    dive       45
17  B surface       30
> 

Comments (1)

寂寞清仓 2025-02-20 00:12:05

We can use gl() to create a grouping index for every 2 rows within each 'id', and then create the column 'proportion' by dividing the 'duration' where the event value is 'surface' (event == 'surface') by the sum of 'duration' in that group:

library(dplyr)
df %>%
   group_by(id) %>%
   group_by(grp = as.integer(gl(n(), 2, n())), .add = TRUE) %>% 
   mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>%
   ungroup %>%
   select(-grp)

-output

# A tibble: 16 × 4
   id    event   duration proportion
   <chr> <chr>      <dbl>      <dbl>
 1 A     dive          55      0.505
 2 A     surface       56      0.505
 3 A     dive          40      0.706
 4 A     surface       96      0.706
 5 A     dive          58      0.194
 6 A     surface       14      0.194
 7 A     dive          43      0.642
 8 A     surface       77      0.642
 9 B     dive          19      0.596
10 B     surface       28      0.596
11 B     dive          34      0.649
12 B     surface       63      0.649
13 B     dive          29      0.618
14 B     surface       47      0.618
15 B     dive          61      0.596
16 B     surface       90      0.596
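To see what the gl() index looks like on its own, here is a small base R sketch. The short `event` vector is made up for illustration; it shows why fixed two-row groups stop working once a "dive" has no "surface" after it.

```r
# Pairing index for 8 rows: each level is repeated twice.
n <- 8
idx <- as.integer(gl(n, 2, n))
print(idx)  # 1 1 2 2 3 3 4 4

# Hypothetical sequence with an unpaired "dive" in third position.
event <- c("dive", "surface", "dive", "dive", "surface")
idx_odd <- as.integer(gl(length(event), 2, length(event)))

# Fixed pairs drift out of alignment: group 2 holds two dives and no surface.
print(split(event, idx_odd))
```

A group without any "surface" row is the kind of group that can produce the size-0 error quoted in the question.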

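For the new dataset, where some 'dive' rows have no following 'surface', we can instead start a new group at every 'dive' with cumsum(event == 'dive'). An unpaired 'dive' then forms a group of its own, and duration[event == 'surface'][1] returns NA for it rather than a zero-length vector. We may use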

df %>% 
  group_by(id) %>% 
  group_by(grp = cumsum(event == 'dive'), .add = TRUE) %>% 
  mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>% 
  ungroup %>%
  select(-grp)

-output

# A tibble: 17 × 4
   id    event   duration proportion
   <chr> <chr>      <int>      <dbl>
 1 A     dive          55      0.505
 2 A     surface       56      0.505
 3 A     dive          40      0.706
 4 A     surface       96      0.706
 5 A     dive          58      0.194
 6 A     surface       14      0.194
 7 A     dive          43      0.642
 8 A     surface       77      0.642
 9 B     dive          19      0.596
10 B     surface       28      0.596
11 B     dive          34      0.649
12 B     surface       63      0.649
13 B     dive          29      0.618
14 B     surface       47      0.618
15 B     dive          61     NA    
16 B     dive          45      0.4  
17 B     surface       30      0.4  
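The NA in row 15 follows from two base R behaviours, which can be checked in isolation. This is a minimal sketch on the last five rows of id B; the names `grp` and `surface_dur` are only illustrative.

```r
# Last five rows of id "B" from the example data.
event <- c("dive", "surface", "dive", "dive", "surface")
duration <- c(29L, 47L, 61L, 45L, 30L)

# Every "dive" starts a new group, so the unpaired dive sits alone in group 2.
grp <- cumsum(event == "dive")
print(grp)  # 1 1 2 3 3

# Group 2 has no "surface" row: the subset is empty, and [1] on an empty
# vector yields NA instead of a zero-length result.
surface_dur <- duration[grp == 2 & event == "surface"][1]
print(surface_dur)  # NA
```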

data

df <- structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "B", "B", "B"), event = c("dive", 
"surface", "dive", "surface", "dive", "surface", "dive", "surface", 
"dive", "surface", "dive", "surface", "dive", "surface", "dive", 
"dive", "surface"), duration = c(55L, 56L, 40L, 96L, 58L, 14L, 
43L, 77L, 19L, 28L, 34L, 63L, 29L, 47L, 61L, 45L, 30L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17"))
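
The desired output in the question lists only the "surface" rows. One possible way to get that shape, assuming the cumsum() grouping from the answer, is to add a filter() step at the end. The sketch below rebuilds the same 17-row data with rep() so it runs on its own; `result` is an illustrative name.

```r
library(dplyr)

# Same 17 rows as the data above, written compactly.
df <- data.frame(
  id = rep(c("A", "B"), times = c(8, 9)),
  event = c(rep(c("dive", "surface"), 7), "dive", "dive", "surface"),
  duration = c(55L, 56L, 40L, 96L, 58L, 14L, 43L, 77L, 19L,
               28L, 34L, 63L, 29L, 47L, 61L, 45L, 30L)
)

result <- df %>%
  group_by(id) %>%
  # A new group starts at every "dive"
  group_by(grp = cumsum(event == "dive"), .add = TRUE) %>%
  mutate(proportion = duration[event == "surface"][1] / sum(duration)) %>%
  ungroup() %>%
  # Keep one row per completed dive/surface pair; the unpaired dive is dropped
  filter(event == "surface") %>%
  select(-grp)

print(result)
```

This drops the unpaired dive entirely. Leaving out the filter() step keeps it in the result with NA, as in the output above.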