Get the third word of each element in a character vector
I have the following character vector called strains:
head(strains, 10)
[1] "Lactobacillus gasseri APC678" "Lactobacillus gasseri DSM 20243"
[3] "Bifidobacterium angulatum B677" "Bifidobacterium breve Reuter S1"
[5] "Lactobacillus reuteri F275" "Lactobacillus acidophilus L917"
[7] "Lactobacillus acidophilus 4357" "Bifidobacterium pseudocatenulatum B1279"
[9] "Bifidobacterium longum subsp. infantis JCM 1210" "Clostridium difficile 43594"
What I want is a vector containing just the third word of each element of strains. For example, for the element "Lactobacillus gasseri APC678", I would like to keep only "APC678".
What I did is the following:
library(tidyverse)
lapply(strains %>% str_split(" "), '[', 3) %>% unlist
This does what I want, as you can see in the output of my code:
[1] "APC678" "DSM" "B677" "Reuter" "F275" "L917" "4357" "B1279" "subsp." "43594" "subsp." "F275" "1SL4" "JCM"
[15] "JCM" "AM63" "DSM" "L917" "61D" "Bb14" "AM63" "VPI"
However, I'm looking for a more elegant or concise way to do the same thing, maybe using regex or something similar.
Here is the dput of my data:
strains <- c("Lactobacillus gasseri APC678", "Lactobacillus gasseri DSM 20243",
"Bifidobacterium angulatum B677", "Bifidobacterium breve Reuter S1",
"Lactobacillus reuteri F275", "Lactobacillus acidophilus L917",
"Lactobacillus acidophilus 4357", "Bifidobacterium pseudocatenulatum B1279",
"Bifidobacterium longum subsp. infantis JCM 1210", "Clostridium difficile 43594"
)
Comments (4)
There's a very simple word function in the stringr package for this, with no need for regex.
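A minimal sketch of this approach, assuming stringr is installed and using a three-element subset of the question's data (not necessarily the answerer's exact code):

```r
library(stringr)

# Subset of the question's data, for illustration only
strains <- c("Lactobacillus gasseri APC678",
             "Lactobacillus gasseri DSM 20243",
             "Bifidobacterium longum subsp. infantis JCM 1210")

# word() splits on single spaces by default and returns the word
# at the requested position for each element
third_words <- word(strains, 3)
print(third_words)
# [1] "APC678" "DSM"    "subsp."
```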
You can use the stringr package:
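One way this could look, as a sketch rather than the answerer's exact code: split each string into a matrix of words and take the third column. The choice of str_split() with simplify = TRUE is an assumption made here.

```r
library(stringr)

# Subset of the question's data, for illustration only
strains <- c("Lactobacillus gasseri APC678",
             "Lactobacillus gasseri DSM 20243",
             "Bifidobacterium longum subsp. infantis JCM 1210")

# simplify = TRUE returns a character matrix (one row per string,
# one column per word, shorter rows padded with ""), so the third
# word is simply the third column
third_words <- str_split(strains, " ", simplify = TRUE)[, 3]
print(third_words)
# [1] "APC678" "DSM"    "subsp."
```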
With base R and regex:
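A possible base R version, assuming words are separated by whitespace. The pattern is an illustration chosen here, not necessarily the one from the answer:

```r
# Subset of the question's data, for illustration only
strains <- c("Lactobacillus gasseri APC678",
             "Lactobacillus gasseri DSM 20243",
             "Bifidobacterium longum subsp. infantis JCM 1210")

# Skip the first two words, capture the third, and discard the rest.
# Note: a string with fewer than three words does not match and is
# returned unchanged by sub().
third_words <- sub("^\\S+\\s+\\S+\\s+(\\S+).*", "\\1", strains)
print(third_words)
# [1] "APC678" "DSM"    "subsp."
```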
With data.table:
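A sketch of a data.table variant, assuming the package is installed. Using tstrsplit() with its keep argument is an assumption made here and may differ from the answer's code:

```r
library(data.table)

# Subset of the question's data, for illustration only
strains <- c("Lactobacillus gasseri APC678",
             "Lactobacillus gasseri DSM 20243",
             "Bifidobacterium longum subsp. infantis JCM 1210")

# tstrsplit() splits each string and transposes the result into a
# list with one element per word position; keep = 3L retains only
# the third position, so [[1]] is the vector of third words
third_words <- tstrsplit(strains, " ", fixed = TRUE, keep = 3L)[[1]]
print(third_words)
# [1] "APC678" "DSM"    "subsp."
```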
Another possible solution, based on stringr::str_match and capture groups:
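A minimal sketch of the capture-group idea, assuming stringr is installed. The pattern below is illustrative and may not match the answerer's:

```r
library(stringr)

# Subset of the question's data, for illustration only
strains <- c("Lactobacillus gasseri APC678",
             "Lactobacillus gasseri DSM 20243",
             "Bifidobacterium longum subsp. infantis JCM 1210")

# str_match() returns a matrix: column 1 holds the full match and
# column 2 holds the first capture group, here the third word
third_words <- str_match(strains, "^\\S+\\s+\\S+\\s+(\\S+)")[, 2]
print(third_words)
# [1] "APC678" "DSM"    "subsp."
```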