Using quantifiers in lookarounds (R/stringr)

Posted 2025-01-22 20:53:42

I'd like to extract the name John Doe from the following string:

str <- 'Name: |             |John Doe     |'

I can do:

library(stringr)
str_extract(str,'(?<=Name: \\|             \\|).*(?=     \\|)')
[1] "John Doe"

But that involves typing a lot of spaces, and it doesn't work well when the number of spaces is not fixed. But when I try to use a quantifier (+), I get an error:

str_extract(str,'(?<=Name: \\| +\\|).*(?= +\\|)')
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT, context=`(?<=Name: \| +\|).*(?= +\|)`)

The same goes for other variants:

str_extract(str,'(?<=Name: \\|\\s+\\|).*(?=\\s+\\|)') 
str_extract(str,'(?<=Name: \\|\\s{1,}\\|).*(?=\\s{1,}\\|)')

Is there a solution to this?


Comments (3)

愁杀 2025-01-29 20:53:42

How about this:
First we remove "Name",
then we replace every non-alphanumeric character with a space,
and finally we collapse the whitespace with str_squish.

library(stringr)

str_squish(str_replace_all( str_remove(str, "Name"), "[^[:alnum:]]", " "))
[1] "John Doe"
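
The three steps above can be unrolled to show each intermediate value. This is only a sketch of the same approach; the names step1, step2 and step3 are illustrative and not part of the original answer.

```r
library(stringr)

str <- 'Name: |             |John Doe     |'

# Step 1: drop the first occurrence of the literal text "Name"
step1 <- str_remove(str, "Name")

# Step 2: turn every character that is not a letter or digit into a space
step2 <- str_replace_all(step1, "[^[:alnum:]]", " ")

# Step 3: trim both ends and collapse runs of whitespace to a single space
step3 <- str_squish(step2)
step3
# [1] "John Doe"
```

Because step 2 replaces all punctuation, a name such as "Jean-Luc" would come out as "Jean Luc", so this variant fits best when names contain only letters, digits and spaces.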
弱骨蛰伏 2025-01-29 20:53:42

Another solution using base R:

sub("Name: \\|\\s+\\|(.*\\S)\\s+\\|", "\\1", str)
# [1] "John Doe"
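
One thing to keep in mind is that sub() returns the input unchanged when the pattern does not match. If a missing value is preferable in that case, the same pattern can be used with regexec() and regmatches(). The helper name extract_name below is hypothetical, and the sketch assumes the input always follows the 'Name: | |value |' layout.

```r
# Hypothetical helper: returns the captured name, or NA when there is no match
extract_name <- function(x) {
  pattern <- "Name: \\|\\s+\\|(.*\\S)\\s+\\|"
  matches <- regmatches(x, regexec(pattern, x))
  # Each list element holds the full match followed by the capture group,
  # or is empty when the pattern did not match
  vapply(
    matches,
    function(g) if (length(g) >= 2) g[2] else NA_character_,
    character(1)
  )
}

extract_name(c('Name: |   |John Doe  |', 'no name here'))
# [1] "John Doe" NA
```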
篱下浅笙歌 2025-01-29 20:53:42

You can also use \K to keep what has been matched so far out of the reported match.

Name: \|\h+\|\K.*?(?=\h+\|)

Explanation

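  • Name: \| matches the literal text Name: |
  • \h+\| matches one or more horizontal whitespace characters followed by |
  • \K discards everything matched so far from the reported match
  • .*? matches as few characters as possible
  • (?=\h+\|) is a positive lookahead asserting that one or more horizontal whitespace characters followed by | come next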


Example

str <- 'Name: |             |John Doe     |'    
regmatches(str, regexpr("Name: \\|\\h+\\|\\K.*?(?=\\h+\\|)", str, perl = TRUE))

Output

[1] "John Doe"
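
When the same pattern is applied to a vector, regmatches() silently drops the elements that do not match, so the result can be shorter than the input. A minimal sketch that keeps the positions aligned by filling in NA; the sample inputs and the variable names x, hit and result are made up for illustration.

```r
pattern <- "Name: \\|\\h+\\|\\K.*?(?=\\h+\\|)"

# Illustrative inputs: two matching rows and one that does not match
x <- c('Name: |             |John Doe     |',
       'Name: | |Jane Roe |',
       'no match here')

m <- regexpr(pattern, x, perl = TRUE)

# regexpr() reports -1 for elements without a match
hit <- as.integer(m) != -1L

# Start with NA everywhere, then fill in the matched substrings
result <- rep(NA_character_, length(x))
result[hit] <- regmatches(x, m)
result
# [1] "John Doe" "Jane Roe" NA
```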