Reformatting data using a pattern
I have a large dataset with a mix of numbers and letters. Here is a small example:
sex <- c("M", "F", "F", "M", "M")
ind <- c("I1", "I2", "I3", "I4", "C")
M1 <- c("ab", "bb", "ac", "ad", "dd")
M2 <- c(12, 22, 23, 24, 25)
M3 <- c("AT", "AG", "AC", "GG", "TC")
M4 <- c(22, 23, 24, 14, 24)
mydf <- data.frame(sex, ind, M1, M2, M3, M4)
mydf
  sex ind M1 M2 M3 M4
1   M  I1 ab 12 AT 22
2   F  I2 bb 22 AG 23
3   F  I3 ac 23 AC 24
4   M  I4 ad 24 GG 14
5   M   C dd 25 TC 24
I want to insert a "/" between the two characters of each value in columns M1...Mn (the trailing columns of the file), so that the resulting data frame looks like:
  sex ind  M1  M2  M3  M4
1   M  I1 a/b 1/2 A/T 2/2
2   F  I2 b/b 2/2 A/G 2/3
3   F  I3 a/c 2/3 A/C 2/4
4   M  I4 a/d 2/4 G/G 1/4
5   M   C d/d 2/5 T/C 2/4
Sorry, I am clueless about how to proceed. Your help is appreciated.
Comments (3)
One liner:
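The commenter's code is not shown here. One way to do this in a single line, offered only as a sketch and not as the commenter's actual answer, is a regular-expression substitution with `sub`. It assumes every value in columns 3:6 is exactly two characters long, as in the example data; `mydf` is rebuilt here so the block runs on its own.

```r
# Example data from the question (stringsAsFactors = FALSE keeps text as character).
mydf <- data.frame(
  sex = c("M", "F", "F", "M", "M"),
  ind = c("I1", "I2", "I3", "I4", "C"),
  M1  = c("ab", "bb", "ac", "ad", "dd"),
  M2  = c(12, 22, 23, 24, 25),
  M3  = c("AT", "AG", "AC", "GG", "TC"),
  M4  = c(22, 23, 24, 14, 24),
  stringsAsFactors = FALSE
)

# Assumed one-liner: capture the two characters and put "/" between them.
# sub() coerces the numeric columns to character before matching.
mydf[3:6] <- lapply(mydf[3:6], function(x) sub("^(.)(.)$", "\\1/\\2", x))
mydf
```

Values that are not exactly two characters long do not match the anchored pattern and are left unchanged (apart from numbers being converted to text).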
All of the cryptic power of R gives you this:
splitInsert splits a column at each letter using strsplit and recombines it with paste. This is wrapped in sapply to vectorise the function.
lapply to apply splitInsert over columns 3:6 of your data.frame, and data.frame to combine it with the two columns that you don't want modified.
splitInsert is fully general: it will work for text strings of any length, and you can use any new character of choice to recombine the split elements.
The code:
The results:
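The code and results referred to above do not appear here. The following is a minimal sketch that follows the description (strsplit, paste, sapply, lapply, data.frame); it is a reconstruction under assumptions, not the commenter's original code. The argument name `sep` and the result name `newdf` are hypothetical.

```r
# Example data from the question.
mydf <- data.frame(
  sex = c("M", "F", "F", "M", "M"),
  ind = c("I1", "I2", "I3", "I4", "C"),
  M1  = c("ab", "bb", "ac", "ad", "dd"),
  M2  = c(12, 22, 23, 24, 25),
  M3  = c("AT", "AG", "AC", "GG", "TC"),
  M4  = c(22, 23, 24, 14, 24),
  stringsAsFactors = FALSE
)

# Assumed shape of splitInsert: split every string into single characters
# with strsplit(), then rejoin the pieces with `sep` using paste().
# sapply() runs this over each element of the column.
splitInsert <- function(x, sep = "/") {
  sapply(strsplit(as.character(x), ""), paste, collapse = sep)
}

# Apply splitInsert to columns 3:6 and recombine with the two
# unmodified leading columns.
newdf <- data.frame(mydf[1:2], lapply(mydf[3:6], splitInsert),
                    stringsAsFactors = FALSE)
newdf
```

Because the split is on every character, a longer string such as "abc" would become "a/b/c", and passing a different `sep` changes the separator.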
This seems to work.
Output