How to extract specific text from a vector of strings separated by multiple commas in R

Posted on 2025-02-06 19:09:10

This is my first post and I am relatively new to R, so I hope my question is posted appropriately. I searched for this but could not come up with anything efficient.

I have a column with the following structure:

df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport", "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter")

I would like to create a new column built from certain elements of col1.

If the first cell has 2 commas, I would like to extract the element between the first and second comma and write it to the first cell of the new column. If the next cell has 3 commas, I would like to extract the element between the second and third comma and write it to the second cell of the new column, and so on.

As the example of col1 shows, the cells are not ordered by number of commas, so a cell with three commas might occur again further down. I need to account for that too.

Could you please help me with this?

Thank you in advance for your help!

Comments (3)

雨后咖啡店 2025-02-13 19:09:13

What about the following?

library(tidyverse)

df %>%
  mutate(col2 = str_split(col1, "\\s*,\\s*") %>%
    map_chr(~ if (length(.x) %in% 1:2) {
      .x[length(.x)]        # one or two items: take the last one
    } else {
      .x[length(.x) - 1]    # three or more items: take the second-to-last
    }))

#>                              col1   col2
#> 1         book, pencil,eraser,pen eraser
#> 2                        book,pen    pen
#> 3                 music,art,sport    art
#> 4 apple, banana, kiwi, watermelon   kiwi
#> 5            Earth, Mars, Jupiter   Mars

往昔成烟 2025-02-13 19:09:13

Here's a straightforward regex solution to extract the penultimate word into a new column:

library(tidyverse)

df %>%
  mutate(col2 = str_extract(col1, "\\w+(?=,[^,]+$)"))
                             col1   col2
1         book, pencil,eraser,pen eraser
2                        book,pen   book
3                 music,art,sport    art
4 apple, banana, kiwi, watermelon   kiwi
5            Earth, Mars, Jupiter   Mars

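Data (the column from the question, which matches the output above):

df <- data.frame(col1 = c("book, pencil,eraser,pen", "book,pen", "music,art,sport", "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter"))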
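
For readers without the tidyverse, the same lookahead can be tried in base R. This is only a sketch: the vector `x` is copied from the question, and `m` and `penult` are names chosen here for illustration. `perl = TRUE` is needed because R's default regex engine does not support lookahead.

```r
x <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
       "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter")

# \\w+         one or more word characters (the item we want)
# (?=,[^,]+$)  lookahead: a comma, then only non-comma characters up to
#              the end, i.e. exactly one more item follows
m <- regexpr("\\w+(?=,[^,]+$)", x, perl = TRUE)

# regmatches() drops elements that did not match, so fill an NA vector
# at the matched positions to keep the result aligned with x
penult <- rep(NA_character_, length(x))
penult[m != -1] <- regmatches(x, m)
penult
```

Note that `\\w+` captures a single word only, so a multi-word item such as "ice cream" would come back as "cream".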

怪我入戏太深 2025-02-13 19:09:13

You could use strsplit. In this case n is 3.

df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport")
strsplit(df$col1, ',')[[1]][3]

[1] "eraser"

EDIT
If I understand your question correctly, you could do something like this:

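df <- data.frame(col1 = c("book,pencil,eraser,pen", "book,pen", "music,art,sport"), stringsAsFactors = FALSE)
# A cell with n commas yields its n-th element (the one before the last).
# vapply() returns a character vector; lapply() would make col2 a list column.
df$col2 <- vapply(df$col1, function(x) strsplit(x, ",")[[1]][stringr::str_count(x, ",")], character(1), USE.NAMES = FALSE)
df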
                    col1   col2
1 book,pencil,eraser,pen eraser
2               book,pen   book
3        music,art,sport    art
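
The same idea can be wrapped in a small base R helper that also trims the spaces found in the question's data. This is a sketch under assumptions: `penultimate_item` is a hypothetical name, and returning `NA` for a cell without any comma is a choice made here, not something the thread specifies. Which item a one-comma cell should yield is also not settled in the thread; this sketch follows the "item before the last comma" reading.

```r
# Hypothetical helper: return the item that sits before the last comma.
# A cell with n commas has n + 1 items, so this is item number n.
penultimate_item <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)  # drop the spaces left after commas
    if (length(p) < 2) {
      NA_character_  # no comma: nothing to extract
    } else {
      p[length(p) - 1]
    }
  }, character(1))
}

df <- data.frame(
  col1 = c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
           "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter"),
  stringsAsFactors = FALSE
)
df$col2 <- penultimate_item(df$col1)
df
```

Because the helper works on the split pieces rather than on a regex, multi-word items are returned whole.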