How to create a sequence by group starting from a specific string in R?

Posted on 2025-01-18 04:33:12

I would like to create a sequence of numbers within a group but starting from a specific string.

In this example, if the string matches UNP, the sequence (the Seq column) should start from the next row.

ColA	Colb	Seq
A	HM	0
A	RES	0
A	UNP	0
A	RES	1
A	RES	2
A	HM	3
B	HM	0
B	RES	0
B	UNP	0
B	RES	1
B	UNP	2
C	UNP	0

Only the first instance of UNP in each group should be considered, not every instance of UNP.

养猫人 2025-01-25 04:33:12

You can first create a column specifying the first occurrence of "UNP", then use cumsum() and lag() to calculate the Seq column.

library(dplyr)

df <- read.table(header = TRUE, text = "
ColA    Colb    Seq
A   HM  0
A   RES 0
A   UNP 0
A   RES 1
A   RES 2
A   HM  3
B   HM  0
B   RES 0
B   UNP 0
B   RES 1
B   UNP 2
C   UNP 0") %>% 
  select(-Seq)

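df %>% 
  # Flag the first "UNP" row within each ColA group
  group_by(ColA, Colb) %>% 
  mutate(seq_count = ifelse(first(Colb) == "UNP" & !duplicated(Colb), 1, 0)) %>% 
  # Running count from the flagged row, shifted down one row by lag()
  group_by(ColA) %>% 
  mutate(Seq = lag(cumsum(cumsum(seq_count)), default = 0), .keep = "unused")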
#> # A tibble: 12 × 3
#> # Groups:   ColA [3]
#>    ColA  Colb    Seq
#>    <chr> <chr> <dbl>
#>  1 A     HM        0
#>  2 A     RES       0
#>  3 A     UNP       0
#>  4 A     RES       1
#>  5 A     RES       2
#>  6 A     HM        3
#>  7 B     HM        0
#>  8 B     RES       0
#>  9 B     UNP       0
#> 10 B     RES       1
#> 11 B     UNP       2
#> 12 C     UNP       0

Created on 2022-03-31 by the reprex package (v2.0.1)
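
For readers who want the same idea without dplyr, here is a minimal base R sketch of the cumulative-sum logic described in the answer above. It is not the answer author's code: the helper name `seq_after_first` is made up for illustration, and the data frame is simply retyped from the question's table.

```r
# Same input as the question, without the expected Seq column.
df <- data.frame(
  ColA = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C"),
  Colb = c("HM", "RES", "UNP", "RES", "RES", "HM",
           "HM", "RES", "UNP", "RES", "UNP", "UNP")
)

# Hypothetical helper: given a 0/1 marker vector for one group,
# count the rows that come after the first marker.
seq_after_first <- function(is_marker) {
  started <- cumsum(is_marker) > 0   # TRUE from the first marker row onward
  counted <- cumsum(started)         # 1 on the marker row, 2 on the next, ...
  pmax(counted - 1L, 0L)             # shift so the marker row itself stays 0
}

# ave() applies the helper within each ColA group and keeps the row order.
df$Seq <- ave(as.integer(df$Colb == "UNP"), df$ColA, FUN = seq_after_first)
print(df)
```

Because `started` only depends on whether any UNP has been seen so far, later UNP rows in the same group do not restart or speed up the count, which matches the "first instance only" requirement in the question.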
