Removing characters/words from a column of a large data frame in R
I am currently struggling to remove words from a column of a large data frame in R.
This is the df:
The first column (GeneID) contains a so-called "Ensembl gene ID", e.g. ENSG00000223972.5 in the first row, followed by a "|". After that, the real gene name is listed. So I now want to remove the "Ensembl gene ID" including the "|", to keep only the real gene name in this column. Is there a smart way to do this? For example with the stringr package?
Cheers!
Edit:
> dput(head(data3))
structure(list(GeneID = c("ENSG00000223972.5|DDX11L1", "ENSG00000227232.5|WASH7P",
"ENSG00000278267.1|MIR6859-1", "ENSG00000243485.5|MIR1302-2HG",
"ENSG00000284332.1|MIR1302-2", "ENSG00000237613.2|FAM138A"),
`DC2-CD5pos-d1` = c(2, 47, 0, 0, 0, 0), `DC2-CD5pos-d2` = c(0,
41, 0, 0, 0, 0), `DC2-CD5pos-d3` = c(2, 31, 0, 0, 0, 0),
`DC2-CD5pos-d4` = c(0, 29, 0, 0, 0, 0), `DC3-d1` = c(1, 36,
0, 0, 0, 0), `DC3-d2` = c(0, 33, 0, 0, 0, 0), `DC3-d3` = c(0,
49, 0, 0, 0, 3), `DC3-d4` = c(0, 27, 0, 0, 0, 0), `DC2-BTLA-S-d1` = c(2,
4, 0, 1, 0, 0), `DC2-BTLA-S-d3` = c(6, 6, 1, 0, 0, 0), `DC2-BTLA-S-d4` = c(2,
1, 0, 0, 0, 0), `DC3-CD163-S-d1` = c(2, 8, 2, 0, 0, 0), `DC3-CD163-S-d3` = c(5,
9, 0, 0, 0, 0), `DC3-CD163-S-d4` = c(0, 5, 0, 0, 0, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
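
One possible approach, sketched here rather than given as a definitive answer: since each GeneID has the form "ID|name", a regular expression can strip everything up to and including the first "|". The example below uses base R's sub() on a cut-down stand-in for data3 (only three rows and one count column from the dput() output above). The stringr line in the comment is an equivalent that assumes the stringr package is installed.

```r
# Cut-down stand-in for data3, built from the first rows of the dput() output
data3 <- data.frame(
  GeneID = c("ENSG00000223972.5|DDX11L1",
             "ENSG00000227232.5|WASH7P",
             "ENSG00000278267.1|MIR6859-1"),
  `DC2-CD5pos-d1` = c(2, 47, 0),
  check.names = FALSE,       # keep the hyphenated column name as-is
  stringsAsFactors = FALSE
)

# "^[^|]*\\|" matches from the start of the string, through any run of
# non-"|" characters, up to and including the first "|"; sub() removes it.
data3$GeneID <- sub("^[^|]*\\|", "", data3$GeneID)

# Equivalent with stringr (assumes the package is installed):
# data3$GeneID <- stringr::str_remove(data3$GeneID, "^[^|]*\\|")

print(data3$GeneID)
# [1] "DDX11L1"   "WASH7P"    "MIR6859-1"
```

Because sub() is vectorised, the same line should work unchanged on the full data frame. Rows without a "|" would be left untouched, as the pattern simply would not match.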