r - Remove substrings from a string column based on a pattern and a condition

Posted on 2025-01-26 01:44:15


I have a column of strings in a data frame where I would like to replace the values to include only the substring before the first " (", i.e., before the first space/open bracket pair. Not all of the strings contain brackets, and I want those to be left as they are.

Example data:

col1 <- c(1, 2, 3, 4)
col2 <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
df <- data.frame(col1, col2)
df

Output:

  col1       col2
1    1 a b (ABC DE)
2    2        bcd
3    3  cd ef (CE)
4    4        bcd

The output I'm looking for would be something like this:

col1 <- c(1, 2, 3, 4)
col2 <- c("a b", "bcd", "cd ef", "bcd")
df <- data.frame(col1, col2)
df

Output:

  col1 col2
1    1  a b
2    2  bcd
3    3 cd ef
4    4  bcd

The actual data frame is 40000+ rows with the strings taking many possible values, so it can't be done manually like in the example. I'm not confident at all working with regex/patterns, but accept this may be the most straightforward way to do this.
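If you would rather avoid regular expressions altogether, here is a minimal base R sketch of the literal requirement: keep the text before the first " (" and leave every other string unchanged. It assumes the separator is always exactly one space followed by an opening bracket. With fixed = TRUE, regexpr() treats " (" as plain text rather than as a pattern.

```r
# Example data from the question; stringsAsFactors = FALSE only matters on R < 4.0
col1 <- c(1, 2, 3, 4)
col2 <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Start position of the first literal " (" in each string, or -1 if absent.
# as.vector() drops the extra attributes (match.length etc.) regexpr() attaches.
pos <- as.vector(regexpr(" (", df$col2, fixed = TRUE))

# Keep the characters before " (" where it was found; otherwise keep the string
df$col2 <- ifelse(pos > 0, substr(df$col2, 1, pos - 1), df$col2)
df
```

regexpr(), substr() and ifelse() are all vectorised, so this runs over a 40000+ row column without an explicit loop.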


今天小雨转甜 2025-02-02 01:44:15

A possible solution, based on stringr:

library(tidyverse)

df %>% 
  mutate(col2 = str_remove_all(col2, "\\s*\\(.*\\)\\s*"))

#>   col1  col2
#> 1    1   a b
#> 2    2   bcd
#> 3    3 cd ef
#> 4    4   bcd
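This pattern has one caveat if the real data can contain text after the closing bracket. The example data doesn't show this, so treat it as an assumption. The match also consumes the whitespace on both sides of the brackets, so the words before and after it run together. Anchoring the pattern to the end of the string with $ keeps only what precedes the first bracket. The string "cd ef (CE) gh" below is made up for illustration.

```r
library(stringr)

# Hypothetical input: the third string has text after the closing bracket
x <- c("a b (ABC DE)", "bcd", "cd ef (CE) gh")

# Pattern from the answer: removes " (CE) " including both spaces,
# so "cd ef" and "gh" are joined into "cd efgh"
str_remove_all(x, "\\s*\\(.*\\)\\s*")

# Anchored variant: drop everything from the optional whitespace before the
# first "(" to the end of the string; a single str_remove() is enough
str_remove(x, "\\s*\\(.*$")
```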


似最初 2025-02-02 01:44:15

Here's a dplyr method:

library(dplyr)
library(stringr)

df %>% 
  mutate(col2 = str_replace_all(col2, "\\(.+?\\)", ""))

This returns the following data frame:

  col1   col2
1    1   a b 
2    2    bcd
3    3 cd ef 
4    4    bcd
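This output keeps the space that preceded the bracket ("a b " rather than "a b"). That matters if the column is later compared or joined on. Either of two small variations removes it, sketched here on the example data: trim afterwards with str_trim(), or let the pattern consume the whitespace with \\s*.

```r
library(dplyr)
library(stringr)

df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd"))

# Option 1: keep the answer's pattern, then strip leading/trailing whitespace
out1 <- df %>%
  mutate(col2 = str_trim(str_replace_all(col2, "\\(.+?\\)", "")))

# Option 2: include any whitespace before "(" in the match itself
out2 <- df %>%
  mutate(col2 = str_replace_all(col2, "\\s*\\(.+?\\)", ""))

out1$col2
out2$col2
```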


朦胧时间 2025-02-02 01:44:15

Using base R gsub:

> df$col2 <- gsub("\\s*\\(.*\\)", "", df$col2)
> df
  col1  col2
1    1   a b
2    2   bcd
3    3 cd ef
4    4   bcd


爱要勇敢去追 2025-02-02 01:44:15

I'd rather use a regex than substrings.

transform(df, col2=gsub('\\s+\\(.*', '', col2))
#   col1  col2
# 1    1   a b
# 2    2   bcd
# 3    3 cd ef
# 4    4   bcd
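The two base R patterns in this thread behave differently when "(" is not preceded by whitespace. \\s+ requires at least one whitespace character before "(", which matches the "space/open bracket pair" in the question. \\s*, used in the gsub answer above, also matches a bare "(". Here is a quick check, using "xy(z)" as a made-up input:

```r
# Example strings; "xy(z)" is hypothetical, not from the question
x <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "xy(z)")

# \\s* allows zero spaces, so "xy(z)" is cut down to "xy"
gsub("\\s*\\(.*\\)", "", x)

# \\s+ needs at least one whitespace character before "(",
# so "xy(z)" is returned unchanged
gsub("\\s+\\(.*", "", x)
```

Which behaviour is right depends on whether strings like "xy(z)" occur in the real data.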


About the author: 人生戏
