Regex lookarounds with unrelated text in between

Posted on 2025-01-16 10:36:25

My text should contain tip and then top. Additionally, if tap occurs between tip and top (in that order, i.e. tip...tap...top), then no other top may occur between tip and tap (i.e. tip...top...tap...top is forbidden).

Some examples

1. "tip tip top tip tip" TRUE
2. "top tip tup tip tap top" TRUE
3. "tip top tap tap top" FALSE
4. "tip tup top tap tap top" FALSE
5. "tip top tap tap tip" TRUE

I have tried using lookarounds, e.g.

condition = "(tip.*top) & (tip(?!.*top).*tap.*top)"
str_detect(mytext, condition)

but it doesn't work.

Here is a reproducible example:

library(stringr)  # provides str_detect()
mytext = c("tip tip top tip tip" , "top tip tup tip tap top" ,
           "tip top tap tap top" , "tip tup top tap tap top" , "tip top tap tap tip" )
condition = "(tip.*top) & (tip(?!.*top).*tap.*top)"
str_detect(mytext, condition)

which gives

[1] FALSE FALSE FALSE FALSE FALSE

rather than the expected TRUE TRUE FALSE FALSE TRUE.

Comments (2)

年少掌心 2025-01-23 10:36:25

What if we do this:

mytext = c("tip tip top tip tip" , "top tip tup tip tap top" ,
 "tip top tap tap top" , "tip tup top tap tap top" , "tip top tap tap tip" )
str_detect(mytext, "tip.*top") & !str_detect(mytext, "tip.*top.*tap.*top")

TRUE
TRUE
FALSE
FALSE
TRUE
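
The two-call approach above can also be sketched in base R, for readers who do not have stringr installed. This is only an illustration: the helper name `tip_top_ok` is hypothetical, and `grepl()` is swapped in for `str_detect()` on the assumption that the two behave the same for these simple patterns.

```r
# Example strings from the question.
mytext <- c("tip tip top tip tip", "top tip tup tip tap top",
            "tip top tap tap top", "tip tup top tap tap top",
            "tip top tap tap tip")

# Hypothetical helper: TRUE when a string contains tip...top
# but does not contain the forbidden sequence tip...top...tap...top.
tip_top_ok <- function(x) {
  grepl("tip.*top", x) & !grepl("tip.*top.*tap.*top", x)
}

result <- tip_top_ok(mytext)
print(result)
```

Splitting the rule into one required pattern and one forbidden pattern keeps each regex easy to read and to check on its own.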

傾城如夢未必闌珊 2025-01-23 10:36:25

@KevinDialdestoro gave the solution I would use, but if you really want it all in one regexp, here's his solution translated into regex language:

str_detect(mytext, "(?=.*tip.*top)(?!.*tip.*top.*tap.*top)")

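The (?=...) part is a positive lookahead: it checks that the pattern matches at the current position without consuming any characters. The (?!...) part is a negative lookahead: it requires that the pattern does not match there.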

EDITED TO ADD: My first posting got it wrong. I think it's fixed now, but that's why Kevin's solution is better: it's obviously correct.
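
One caveat with the single-regex version, offered as a hedged sketch rather than a correction from the original poster: because the pattern is not anchored, the regex engine may retry both lookaheads at later positions in the string, where the forbidden sequence is no longer fully visible. Adding `^` forces both checks to run from the start of the string. The example below uses base R `grepl(..., perl = TRUE)` instead of `str_detect()`, and the variable names and the extra test string `tricky` are illustrative assumptions, not part of the question.

```r
# Example strings from the question.
mytext <- c("tip tip top tip tip", "top tip tup tip tap top",
            "tip top tap tap top", "tip tup top tap tap top",
            "tip top tap tap tip")

unanchored <- "(?=.*tip.*top)(?!.*tip.*top.*tap.*top)"
anchored   <- paste0("^", unanchored)  # evaluate both lookaheads at position 0 only

# Hypothetical extra string: contains tip...top...tap...top, then another tip...top.
tricky <- "tip top tap top tip top"

res_examples   <- grepl(anchored, mytext, perl = TRUE)
res_unanchored <- grepl(unanchored, tricky, perl = TRUE)
res_anchored   <- grepl(anchored, tricky, perl = TRUE)

print(res_examples)    # the five examples from the question
print(res_unanchored)  # TRUE: a later start position hides the forbidden prefix
print(res_anchored)    # FALSE: the forbidden sequence is seen from the start
```

On the five strings from the question, the anchored and unanchored patterns agree; they differ only on inputs like `tricky`, where a second tip...top follows the forbidden sequence.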
