Removing part of the variable names in an R column

Posted on 2025-01-09 09:46:09

I want to clean up a column of an R data frame so that it contains only the species names. In other words, I would like to remove everything from the 2nd "_" onward.

This is my table:

col1	Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20	4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1	5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP	5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M	3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134	5

I would like to have:

col1	Col2
Pelagodinium_beii	4
Acanthoeca_10tr	5
Rhodosorus_marinus	5
Vannella_sp.	3
Florenciella_parvula	5

I'm not really used to R and I haven't found the right method.

魂牵梦绕锁你心扉 2025-01-16 09:46:09

df$col1 <- sub("^([^_]+_[^_]+)_.*", "\\1", df$col1, perl = TRUE)
df

                  col1 Col2
1    Pelagodinium_beii    4
2      Acanthoeca_10tr    5
3   Rhodosorus_marinus    5
4         Vannella_sp.    3
5 Florenciella_parvula    5

With df as follows:

df <- read.table(
  text =
'col1   Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20  4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP   5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M    3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134  5
',
  header = TRUE
)
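
To make the pattern above easier to read, here is a minimal sketch that applies the same regular expression to a standalone character vector and annotates each part. The vector `ids` and the entry "Genus_only" are hypothetical and only there to show what happens when a name has no second underscore (the pattern does not match, so the string is returned unchanged). As far as I can tell, `perl = TRUE` is not required for this pattern, so it is left out here.

```r
# Hypothetical sample vector; "Genus_only" has no second underscore.
ids <- c(
  "Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20",
  "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
  "Genus_only"
)

# ^               anchor at the start of the string
# ([^_]+_[^_]+)   capture group 1: non-underscores, one "_", non-underscores
# _.*             the second "_" and everything after it
# "\\1"           replace the whole match with capture group 1
species <- sub("^([^_]+_[^_]+)_.*", "\\1", ids)

print(species)
```

Because `sub()` leaves non-matching strings untouched, names that are already in "Genus_species" form pass through unchanged.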

-黛色若梦 2025-01-16 09:46:09

An option with strsplit:

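# Split each name on "_", keep the first two pieces, and glue them back together.
# USE.NAMES = FALSE stops sapply() from naming the result after the input strings.
df$col1 <- sapply(df$col1, function(i) paste0(strsplit(i, "_")[[1]][1:2], collapse = "_"), USE.NAMES = FALSE)

#                   col1 Col2
# 1    Pelagodinium_beii    4
# 2      Acanthoeca_10tr    5
# 3   Rhodosorus_marinus    5
# 4         Vannella_sp.    3
# 5 Florenciella_parvula    5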

Another way would be to use word from stringr package:

library(stringr)
df$col1 <- word(df$col1, 1, 2, sep = "_")
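
One caveat with the `[1:2]` indexing in the strsplit option: if a name contains no underscore, the missing second piece is `NA`, and pasting turns it into the literal text "NA" (for example "Genus_NA"). The following base R sketch avoids that by taking at most the first `n` pieces with `head()`. The helper name `keep_first_parts` and its parameters are hypothetical, not part of any package.

```r
# Hypothetical helper: keep at most the first n "_"-separated pieces of each string.
keep_first_parts <- function(x, n = 2, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(
    parts,
    # head() returns fewer than n pieces when fewer exist, so no NA is introduced
    function(p) paste(head(p, n), collapse = sep),
    character(1)
  )
}

cleaned <- keep_first_parts(c("Acanthoeca_10tr_SRR1294413_MMETSP0105", "Genus"))
print(cleaned)
```

With the data frame from the question, this would be used as `df$col1 <- keep_first_parts(df$col1)`.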
