R - Remove substring in a string column based on pattern and condition
I have a column of strings in a data frame where I would like to replace the values to include only the substring before the first " (", i.e., before the first space/open bracket pair. Not all of the strings contain brackets, and I want those to be left as they are.
Example data:
col1 <- c(1, 2, 3, 4)
col2 <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
df <- data.frame(col1, col2)
df
Output:
col1 col2
1 1 a b (ABC DE)
2 2 bcd
3 3 cd ef (CE)
4 4 bcd
The output I'm looking for would be something like this:
col1 <- c(1, 2, 3, 4)
col2 <- c("a b", "bcd", "cd ef", "bcd")
df <- data.frame(col1, col2)
df
Output:
col1 col2
1 1 a b
2 2 bcd
3 3 cd ef
4 4 bcd
The actual data frame is 40000+ rows with the strings taking many possible values, so it can't be done manually like in the example. I'm not confident at all working with regex/patterns, but accept this may be the most straightforward way to do this.
Comments (4)
A possible solution, based on stringr:
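A minimal sketch of what such a stringr approach could look like, assuming the `df` from the question. The pattern `" \\(.*$"` is an assumption chosen to match the first space/open bracket pair and everything after it; `str_remove()` leaves strings without a match unchanged.

```r
library(stringr)

# Example data from the question
df <- data.frame(
  col1 = c(1, 2, 3, 4),
  col2 = c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
)

# Drop everything from the first " (" to the end of the string.
# Rows with no " (" do not match the pattern and are kept as they are.
df$col2 <- str_remove(df$col2, " \\(.*$")
df$col2
# "a b"   "bcd"   "cd ef" "bcd"
```

Because `str_remove()` is vectorised over the whole column, this should scale to the 40000+ rows mentioned in the question without any loop.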
Here's a dplyr method, which returns the df:
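A hedged sketch of a dplyr version, assuming the same example `df`. It uses `mutate()` with base R `sub()` and the same assumed pattern; the exact code of the original answer may have differed.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  col1 = c(1, 2, 3, 4),
  col2 = c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
)

# Overwrite col2 inside mutate(); sub() removes the first match only,
# which here runs from the first " (" to the end of the string.
df <- df %>%
  mutate(col2 = sub(" \\(.*$", "", col2))

df$col2
# "a b"   "bcd"   "cd ef" "bcd"
```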
Using base R gsub:
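A sketch of the base R variant with no extra packages, assuming the question's `df`. The pattern is an assumption: a literal space, an escaped opening bracket, then `.*` for the rest of the string.

```r
# Example data from the question
df <- data.frame(
  col1 = c(1, 2, 3, 4),
  col2 = c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
)

# " \\(" matches the space/open bracket pair; ".*" consumes the remainder.
# Strings that contain no " (" are returned unchanged.
df$col2 <- gsub(" \\(.*", "", df$col2)
df$col2
# "a b"   "bcd"   "cd ef" "bcd"
```

The bracket has to be escaped as `\\(` because an unescaped `(` opens a group in a regular expression.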
I'd rather use a regex than substrings.
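One way to do this with a single regex rather than locating positions and taking substrings is a capture group. The sketch below is an assumption about what such an answer might look like, not the original code: it keeps the text before the first " (" via `\\1`, using `perl = TRUE` so the lazy quantifier `.*?` is supported.

```r
# Example strings from the question
x <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")

# ^(.*?)  captures as little as possible from the start of the string,
#  \\(    up to the first space/open bracket pair,
# .*$     and the rest is discarded by replacing the match with group 1.
# Strings without " (" do not match and pass through unchanged.
res <- sub("^(.*?) \\(.*$", "\\1", x, perl = TRUE)
res
# "a b"   "bcd"   "cd ef" "bcd"
```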