Removing characters/words from a column of a large data frame in R

Posted on 2025-01-20 23:39:51

I am currently struggling to remove words from a large data frame in R.
This is the df (its first rows are given as dput output in the edit below).

The first column (GeneID) contains a so-called "Ensembl gene ID", e.g. ENSG00000223972.5, followed by a "|". After that, the real gene name is listed. I now want to remove the "Ensembl gene ID", including the "|", so that only the real gene name remains in this column. Is there a smart way to do this, for example with the stringr package?

Cheers!

Edit:

 > dput(head(data3))
structure(list(GeneID = c("ENSG00000223972.5|DDX11L1", "ENSG00000227232.5|WASH7P", 
"ENSG00000278267.1|MIR6859-1", "ENSG00000243485.5|MIR1302-2HG", 
"ENSG00000284332.1|MIR1302-2", "ENSG00000237613.2|FAM138A"), 
    `DC2-CD5pos-d1` = c(2, 47, 0, 0, 0, 0), `DC2-CD5pos-d2` = c(0, 
    41, 0, 0, 0, 0), `DC2-CD5pos-d3` = c(2, 31, 0, 0, 0, 0), 
    `DC2-CD5pos-d4` = c(0, 29, 0, 0, 0, 0), `DC3-d1` = c(1, 36, 
    0, 0, 0, 0), `DC3-d2` = c(0, 33, 0, 0, 0, 0), `DC3-d3` = c(0, 
    49, 0, 0, 0, 3), `DC3-d4` = c(0, 27, 0, 0, 0, 0), `DC2-BTLA-S-d1` = c(2, 
    4, 0, 1, 0, 0), `DC2-BTLA-S-d3` = c(6, 6, 1, 0, 0, 0), `DC2-BTLA-S-d4` = c(2, 
    1, 0, 0, 0, 0), `DC3-CD163-S-d1` = c(2, 8, 2, 0, 0, 0), `DC3-CD163-S-d3` = c(5, 
    9, 0, 0, 0, 0), `DC3-CD163-S-d4` = c(0, 5, 0, 0, 0, 0)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
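
To make the goal concrete, here is a minimal sketch of one possible approach using base R's sub(). It assumes every GeneID value has the form "<Ensembl ID>|<gene name>" with exactly one "|" separator, as in the dput output above; the three-row data3 built here is only a stand-in for the real data frame.

```r
# Small stand-in for the GeneID column from the dput output (assumption:
# all values follow the "<Ensembl ID>|<gene name>" pattern)
data3 <- data.frame(
  GeneID = c("ENSG00000223972.5|DDX11L1",
             "ENSG00000227232.5|WASH7P",
             "ENSG00000278267.1|MIR6859-1")
)

# "^[^|]*\\|" matches everything from the start of the string up to and
# including the first "|"; sub() replaces that match with an empty string
data3$GeneID <- sub("^[^|]*\\|", "", data3$GeneID)

print(data3$GeneID)
```

With the stringr package, stringr::str_remove(data3$GeneID, "^[^|]*\\|") should give the same result. Both functions are vectorised over the column, so no loop is needed even for a large data frame.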
