Is there an R function to strsplit the rows of a table according to another column?
I have a data frame, for example:
name student_id age gender
Sam 123_abc_ABC 20 F
John 234_bcd_BCD 18 M
Mark 345_cde_CDE 20 M
Ram xyz_111_XYZ 19 M
Hari uvw_444_UVW 23 M
Now I need a new column, student_id_by_govt, in the df. The student_id_by_govt value is contained in student_id, but its position differs by name. For Sam, John and Mark, student_id_by_govt is the first segment of student_id (i.e., 123, 234, 345), but for Ram and Hari it is the second segment of student_id (i.e., 111, 444).
I used strsplit and lapply to get a specific segment from student_id, but I was not able to apply them to specific rows only to get the desired output. Please let me know how to get the output below:
name student_id age gender student_id_by_govt
Sam 123_abc_ABC 20 F 123
John 234_bcd_BCD 18 M 234
Mark 345_cde_CDE 20 M 345
Ram xyz_111_XYZ 19 M 111
Hari uvw_444_UVW 23 M 444
Comments (5)
I am not sure I understand what you want.
It gives the output:
Since you changed the question quite a bit in one of your follow-up comments, i.e. to "Hi there, is there a code for getting student_id_by_govt column based on gender column? I mean for all the 'M' gender, the student_id_by_govt is first segment in the student_id (ie., 234, 345, xyz, uvw) but for 'F' gender I want the second segment of student_id (i.e., abc) as student_id_by_govt? Please note, the segments I want from student_id in my original data is not numeric."
Here is a simple base R solution for the case where the segments are of equal length and their positions are stable. If the strings have differing lengths but contain certain characters by which you can identify the segments, you could replace substr with some regex function.
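A minimal sketch of what such a base R solution might look like. The data frame is rebuilt from the question's example, and the character positions (1 to 3 for the first segment, 5 to 7 for the second) are assumptions that only hold while every segment is exactly three characters long.

```r
# Example data rebuilt from the question (assumed to be plain character columns)
df <- data.frame(
  name = c("Sam", "John", "Mark", "Ram", "Hari"),
  student_id = c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE",
                 "xyz_111_XYZ", "uvw_444_UVW"),
  age = c(20, 18, 20, 19, 23),
  gender = c("F", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Assumption: segments are 3 characters each and separated by "_",
# so segment 1 is characters 1-3 and segment 2 is characters 5-7.
df$student_id_by_govt <- ifelse(
  df$gender == "M",
  substr(df$student_id, 1, 3),  # first segment for "M"
  substr(df$student_id, 5, 7)   # second segment for "F"
)

print(df)
```

Because ifelse is vectorised, the row-specific choice is made by the condition on gender rather than by looping over rows.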
You can use regex via the function str_extract from the library stringr:
Here is the output:
I am more comfortable with the tidyverse packages. I hope this solution helps you.
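A sketch of the str_extract approach, assuming the stringr package is installed and that the government ID is always the only run of digits in student_id (true for the question's example, not guaranteed for other data). The pattern "[0-9]+" is an illustrative choice, not necessarily the one used in the original answer.

```r
library(stringr)

# Example data rebuilt from the question
df <- data.frame(
  name = c("Sam", "John", "Mark", "Ram", "Hari"),
  student_id = c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE",
                 "xyz_111_XYZ", "uvw_444_UVW"),
  age = c(20, 18, 20, 19, 23),
  gender = c("F", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# str_extract returns the first match of the pattern in each string;
# "[0-9]+" matches one or more consecutive digits.
df$student_id_by_govt <- str_extract(df$student_id, "[0-9]+")

print(df)
```

The new column is a character vector, so leading zeros in an ID, if any, would be preserved.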
Another option, using parse_number to extract the first number from each string:
Created on 2022-07-01 by the reprex package (v2.0.1)
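A sketch of the parse_number option, assuming the readr package is installed. The vector below is taken from the question's example. Note that parse_number returns a numeric vector and keeps only the first number it finds in each string.

```r
library(readr)

# student_id values from the question's example
student_id <- c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE",
                "xyz_111_XYZ", "uvw_444_UVW")

# parse_number drops non-numeric characters around the first number
# in each string and returns it as a double.
student_id_by_govt <- parse_number(student_id)

print(student_id_by_govt)
```

Since the result is numeric, this suits IDs that are pure digits; it would not work for the non-numeric segments mentioned in the follow-up comment.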
You only need str_extract:
EDIT:
If student_id_by_govt is determined by gender, as OP notes in a comment: "for all the 'M' gender, the student_id_by_govt is first segment in the student_id (ie., 234, 345, xyz, uvw) but for 'F' gender I want the second segment of student_id (i.e., abc)", then this works:
Here, we essentially rely on the negated character class [^_]+, which matches any character except the underscore one or more times, as well as the positive lookbehind (?<=_) (meaning "match only if there is an underscore to the left") and the positive lookahead (?=_) (meaning "match only if there is an underscore to the right").
Data for solution in EDIT:
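A minimal sketch of the gender-based extraction described in the EDIT, assuming stringr is installed. The question's example table is used as stand-in data, and the two patterns are one plausible way to combine [^_]+ with the lookarounds; the exact code of the original answer is not known.

```r
library(stringr)

# Stand-in data: the example table from the question (an assumption)
df <- data.frame(
  name = c("Sam", "John", "Mark", "Ram", "Hari"),
  student_id = c("123_abc_ABC", "234_bcd_BCD", "345_cde_CDE",
                 "xyz_111_XYZ", "uvw_444_UVW"),
  age = c(20, 18, 20, 19, 23),
  gender = c("F", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

df$student_id_by_govt <- ifelse(
  df$gender == "M",
  # first segment: non-underscore characters followed by an underscore
  str_extract(df$student_id, "[^_]+(?=_)"),
  # second segment: non-underscore characters with an underscore on both sides
  str_extract(df$student_id, "(?<=_)[^_]+(?=_)")
)

print(df)
```

Unlike a substr-based version, this does not depend on the segments having a fixed length, only on the underscore separators.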