Is there an R function to strsplit table rows according to another column?

Posted on 2025-02-12 06:18:11

I have a data frame, for example:

name  student_id   age  gender
Sam   123_abc_ABC  20   F
John  234_bcd_BCD  18   M
Mark  345_cde_CDE  20   M
Ram   xyz_111_XYZ  19   M
Hari  uvw_444_UVW  23   M

Now I need a new column, student_id_by_govt, in the df. The student_id_by_govt is contained in the student_id, but its position differs between names. For Sam, John and Mark, the student_id_by_govt is the first segment of student_id (i.e., 123, 234, 345), but for Ram and Hari it is the second segment (i.e., 111, 444).

I used strsplit and lapply to get a specific segment from the student_id, but I was not able to apply that command only to specific rows to get the desired output. Please let me know how to get the output below:

name  student_id   age  gender student_id_by_govt
Sam   123_abc_ABC  20   F      123
John  234_bcd_BCD  18   M      234
Mark  345_cde_CDE  20   M      345
Ram   xyz_111_XYZ  19   M      111
Hari  uvw_444_UVW  23   M      444


Comments (5)

失退 2025-02-19 06:18:17

I am not sure I understand what you want.

library(dplyr)
library(stringr)
library(purrr)

df <- tibble(Name = c("Sam", "John", "Mark", "Ram", "Hari"), student_id = c("123_abc_ABC", "234_596_BCD", "345_cde_CDE", "xyz_111_XYZ", "uvw_444_UVW"), Gender = c("F", "M", "M", "M", "M")) %>%
      # mutate(student_id_by_gvt = if_else(Gender == "M", str_split(student_id, "_")[[1]][1], str_split(student_id, "_")[[1]][2]))
    mutate(student_id_by_gvt = map2_chr(Gender, student_id, function(x,y){if_else(x == "M", str_split(y, "_")[[1]][1], str_split(y, "_")[[1]][2])}))

It gives the output:

 Name  student_id  Gender student_id_by_gvt
  <chr> <chr>       <chr>  <chr>            
1 Sam   123_abc_ABC F      abc              
2 John  234_596_BCD M      234              
3 Mark  345_cde_CDE M      345              
4 Ram   xyz_111_XYZ M      xyz              
5 Hari  uvw_444_UVW M      uvw              
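
The question's original rule (second segment for Ram and Hari, first segment for everyone else) can also be sketched in base R with strsplit, which is what the question attempted. This is a minimal sketch, not the answerer's method: the `segment_index` vector is a hypothetical helper that encodes which segment each row should use.

```r
# Example data mirroring the question (column names follow the question).
df <- data.frame(
  name = c("Sam", "John", "Mark", "Ram", "Hari"),
  student_id = c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE",
                 "xyz_111_XYZ", "uvw_444_UVW"),
  stringsAsFactors = FALSE
)

# Hypothetical rule: Ram and Hari use the second segment, everyone else the first.
segment_index <- ifelse(df$name %in% c("Ram", "Hari"), 2L, 1L)

# Split every id once, then pick the requested segment row by row.
parts <- strsplit(df$student_id, "_", fixed = TRUE)
df$student_id_by_govt <- mapply(function(p, i) p[i], parts, segment_index)
```

Because the segment position is held in an ordinary vector, the rule can be swapped (for example, to one based on gender) without touching the splitting step.
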
幸福还没到 2025-02-19 06:18:17

Since you changed the question quite a bit in one of your follow-up comments, i.e. to "Hi there, is there a code for getting student_id_by_govt column based on gender column? I mean for all the 'M' gender, the student_id_by_govt is first segment in the student_id (ie., 234, 345, xyz, uvw) but for 'F' gender I want the second segment of student_id (i.e., abc) as student_id_by_govt? Please note, the segments I want from student_id in my original data is not numeric.",
here is a simple base R solution for the case where the segments are of equal length and their positions are stable. If the strings differ in length but contain characters that identify the segments, you could replace substr with a regex function.

df$student_id_by_govt <- ifelse(df$gender == "M", 
                          substr(df$student_id, 1,3), substr(df$student_id, 5,7))
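
For segments of differing length, the regex variant mentioned above might look like the following. This is a sketch that assumes the segments are always separated by underscores and that every id has at least three segments; the small data frame is illustrative only.

```r
# Illustrative data; column names follow the question.
df <- data.frame(
  student_id = c("123_abc_ABC", "234_bcd_BCD", "xyz_111_XYZ"),
  gender = c("F", "M", "M"),
  stringsAsFactors = FALSE
)

# First segment: drop everything from the first underscore onwards.
first_segment <- sub("_.*$", "", df$student_id)
# Second segment: capture the text between the first and second underscore.
second_segment <- sub("^[^_]*_([^_]*)_.*$", "\\1", df$student_id)

df$student_id_by_govt <- ifelse(df$gender == "M", first_segment, second_segment)
```

Note that sub() returns its input unchanged when the pattern does not match, so an id with fewer than three segments would pass through the second pattern unmodified.
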
冷情妓 2025-02-19 06:18:15

You can use regex via the function str_extract from the library stringr:

library(dplyr)
library(stringr)
library(purrr)

df <- tibble(Name = c("Sam", "John", "Mark", "Ram", "Hari"), student_id = c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE", "xyz_111_XYZ", "uvw_444_UVW")) %>%
      mutate(student_id_by_gvt = map_chr(student_id, function(x){str_extract(x, "(\\d+)")}))

Here is the output:

# A tibble: 5 x 3
  Name  student_id  student_id_by_gvt
  <chr> <chr>       <chr>            
1 Sam   123_abc_ABC 123              
2 John  234_bcd_BCD 234              
3 Mark  345_cde_CDE 345              
4 Ram   xyz_111_XYZ 111              
5 Hari  uvw_444_UVW 444 

I am more comfortable with the tidyverse packages. I hope this solution helps.
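
For readers who prefer to avoid extra packages, roughly the same extraction can be sketched in base R with regexpr() and regmatches(). The extra "no_digits" entry is a made-up value, added only to show what happens when a string contains no digits.

```r
student_id <- c("123_abc_ABC", "234_bcd_BCD", "xyz_111_XYZ",
                "uvw_444_UVW", "no_digits")

# regexpr() locates the first run of digits in each string (-1 if there is none).
m <- regexpr("[0-9]+", student_id)

# Fill in the matches; strings without digits stay NA.
first_number <- rep(NA_character_, length(student_id))
first_number[m != -1] <- regmatches(student_id, m)
```

regmatches() returns only the strings that matched, which is why the result is assigned into a pre-filled NA vector rather than used directly.
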

醉南桥 2025-02-19 06:18:15

Another option is parse_number, which extracts the first number from each string (the result is numeric, not character):

df <- read.table(text="name  student_id   age  gender
Sam   123_abc_ABC  20   F
John  234_bcd_BCD  18   M
Mark  345_cde_CDE  20   M
Ram   xyz_111_XYZ  19   M
Hari  uvw_444_UVW  23   M", header = TRUE)

library(dplyr)
library(purrr)
library(stringr)
df %>%
  mutate(student_id_by_govt = readr::parse_number(as.character(student_id)))
#>   name  student_id age gender student_id_by_govt
#> 1  Sam 123_abc_ABC  20      F                123
#> 2 John 234_bcd_BCD  18      M                234
#> 3 Mark 345_cde_CDE  20      M                345
#> 4  Ram xyz_111_XYZ  19      M                111
#> 5 Hari uvw_444_UVW  23      M                444

Created on 2022-07-01 by the reprex package (v2.0.1)
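
One thing to keep in mind is that a numeric result cannot preserve leading zeros. The sketch below illustrates the difference in base R only; the id "007_abc_ABC" is a hypothetical value that does not appear in the question's data.

```r
ids <- c("007_abc_ABC", "123_abc_ABC")

# Character extraction keeps the digits exactly as written.
as_text <- sub("^[^0-9]*([0-9]+).*$", "\\1", ids)

# Converting to numeric, as a number parser would, drops leading zeros.
as_number <- as.numeric(as_text)
```

If the ids are identifiers rather than quantities, keeping them as character is usually the safer choice.
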

明月松间行 2025-02-19 06:18:14

You only need str_extract:

library(tidyverse)
df %>%
  mutate(student_id_by_govt = str_extract(student_id, "\\d+"))
# A tibble: 5 × 3
  Name  student_id  student_id_by_govt
  <chr> <chr>       <chr>             
1 Sam   123_abc_ABC 123               
2 John  234_bcd_BCD 234               
3 Mark  345_cde_CDE 345               
4 Ram   xyz_111_XYZ 111               
5 Hari  uvw_444_UVW 444 

EDIT:

If the student_id_by_govt is determined by gender, as the OP notes in a comment: "for all the 'M' gender, the student_id_by_govt is first segment in the student_id (ie., 234, 345, xyz, uvw) but for 'F' gender I want the second segment of student_id (i.e., abc)", then this works:

df %>%
  mutate(student_id_by_govt = ifelse(gender == "M", str_extract(student_id, "^[^_]+"),
                                     str_extract(student_id, "(?<=_)[^_]+(?=_)")))
  name  student_id age gender student_id_by_govt
1  Sam 123_abc_ABC  20      F                abc
2 John 234_bcd_BCD  18      M                234
3 Mark 345_cde_CDE  20      F                cde
4  Ram xyz_111_XYZ  19      M                xyz
5 Hari uvw_444_UVW  23      M                uvw

Here, we essentially rely on the negated character class [^_]+, which matches any character except the underscore one or more times, together with a positive lookbehind (?<=_) (meaning "match only if there is an underscore to the left") and a positive lookahead (?=_) (meaning "match only if there is an underscore to the right").
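
The two patterns can be tried in isolation with base R, as in the sketch below. It assumes underscore-separated ids like those in the question; in base R, lookarounds are only available with perl = TRUE.

```r
student_id <- c("123_abc_ABC", "xyz_111_XYZ")

# Negated class anchored at the start: matches the first segment.
first <- regmatches(student_id, regexpr("^[^_]+", student_id))

# The lookbehind and lookahead require an underscore on each side without
# including it in the match, so this picks out the middle segment.
second <- regmatches(student_id,
                     regexpr("(?<=_)[^_]+(?=_)", student_id, perl = TRUE))
```

Because the lookarounds do not consume the underscores, the match contains only the segment itself and needs no further trimming.
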

Data for solution in EDIT:

df <- read.table(text="name  student_id   age  gender
Sam   123_abc_ABC  20   F
John  234_bcd_BCD  18   M
Mark  345_cde_CDE  20   F
Ram   xyz_111_XYZ  19   M
Hari  uvw_444_UVW  23   M", header = TRUE)