Remove part of the variable names in an R column

Posted on 2025-01-09 09:46:09

I want to clean up a column in an R data frame so that it contains only the species names. That is, I would like to remove everything from the second "_" onward.

This is my table:

col1                                                  Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20    4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1   5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP     5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M  3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134    5

I would like to have:

col1                  Col2
Pelagodinium_beii     4
Acanthoeca_10tr       5
Rhodosorus_marinus    5
Vannella_sp.          3
Florenciella_parvula  5

I'm not really used to R and I didn't find the right method.

Comments (2)

魂牵梦绕锁你心扉 2025-01-16 09:46:09
df$col1 <- sub("^([^_]+_[^_]+)_.*", "\\1", df$col1, perl = TRUE)
df
                  col1 Col2
1    Pelagodinium_beii    4
2      Acanthoeca_10tr    5
3   Rhodosorus_marinus    5
4         Vannella_sp.    3
5 Florenciella_parvula    5

With df as follows:

df <- read.table(
  text =
'col1   Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20  4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP   5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M    3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134  5
',
  header = TRUE
)
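
To see why this pattern keeps only the first two fields, it may help to wrap it in a small helper and try it on a few strings. This is a minimal sketch, not part of the original answer: the name keep_first_two and the third test value are made up for illustration.

```r
# Hypothetical helper (not from the original answer):
# keep only the text before the second "_".
keep_first_two <- function(x) {
  # ^([^_]+_[^_]+)  captures "field1_field2"; each field is one or more
  #                 characters that are not underscores
  # _.*             matches the second underscore and everything after it
  # "\\1"           replaces the whole match with the captured group
  sub("^([^_]+_[^_]+)_.*", "\\1", x)
}

ids <- c(
  "Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20",
  "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
  "Genus_species"  # made-up value with no third field
)

keep_first_two(ids)
# "Pelagodinium_beii" "Vannella_sp." "Genus_species"
```

When a string has no second underscore, the pattern does not match and sub() returns the string unchanged, which is usually the behaviour you want for names that are already clean.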
-黛色若梦 2025-01-16 09:46:09

An option with strsplit:

df$col1 <- sapply(df$col1, function(i) paste0(strsplit(i, "_")[[1]][1:2], collapse = '_'))

#                   col1 Col2
# 1    Pelagodinium_beii    4
# 2      Acanthoeca_10tr    5
# 3   Rhodosorus_marinus    5
# 4         Vannella_sp.    3
# 5 Florenciella_parvula    5

Another way would be to use word() from the stringr package:

library(stringr)
df$col1 <- word(df$col1, 1, 2, sep = "_")
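
One thing to watch with the strsplit approach is an identifier that has fewer than two fields. The base-R sketch below shows what happens and one possible way around it. The input "Genus" and the helper name first_two_fields are assumptions made up for illustration; they are not part of the original answer.

```r
# Made-up identifier with a single field, for illustration only.
x <- "Genus"

# Indexing [1:2] past the end of the split result yields NA,
# and paste() turns that NA into the literal text "NA".
paste(strsplit(x, "_")[[1]][1:2], collapse = "_")
# "Genus_NA"

# Hypothetical helper: head() never reads past the end of the vector,
# so inputs with fewer than two fields are returned as they are.
first_two_fields <- function(v) {
  v <- as.character(v)  # also accepts a factor column
  vapply(
    strsplit(v, "_", fixed = TRUE),
    function(parts) paste(head(parts, 2), collapse = "_"),
    character(1)
  )
}

first_two_fields(c("Genus", "Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134"))
# "Genus" "Florenciella_parvula"
```

It could then be applied as df$col1 <- first_two_fields(df$col1). For the sample data in the question every row has at least two fields, so the results match the output shown above.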