
Regex: alternative to a .* expression for defining a sequence of words

Posted 2025-02-10 18:22:31

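I am currently trying to match specific sentences based on the words they contain and the order of those words. I mainly do this with lookahead assertions based on this structure: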

[^>.\]]*(?=desired words)[^<.\]]*

For example, I am looking for sentences that talk about a vacation in the Maledives.

My pattern for this is:

[^>.\]]*(?=([Vv]acation.*[Mm]aledives))[^<.\]]*

Using .* causes problems, because the word "Maledives" can also appear in a later sentence (example: wrong match).
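A minimal Python sketch of that problem. The pattern is a reconstruction of the question's pattern from a garbled copy, so treat it as an assumption, and the sample text is made up. Because .* inside the lookahead is free to run past the sentence-ending period, a sentence that only mentions the vacation still matches:

```python
import re

# Assumed reconstruction of the question's pattern: the .* inside the
# lookahead can cross the "." and find "Maledives" in the NEXT sentence.
QUESTION_RE = re.compile(r"[^>.\]]*(?=([Vv]acation.*[Mm]aledives))[^<.\]]*")

text = "A vacation in Spain. The Maledives were sunny."
m = QUESTION_RE.search(text)
print(m.group(0))  # A vacation in Spain  (a false positive)
```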

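My solution was to replace .* with the expression ([,'`´() \\]*\w+){0,x}\s*, which allows at most x words between the two words within the same sentence. In this structure the pattern becomes: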

[^>.\]]*(?=([Vv]acation([,'`´() \\]*\w+){0,x}\s*[Mm]aledives))[^<.\]]*

(example: correct match)

Unfortunately, if the {0,x} range is set high, the expression becomes computationally expensive and leads to catastrophic backtracking.

Do you have any other suggestions for finding sentences that contain specific words?


世界和平 2025-02-17 18:22:31


You can try something like this.

(?:^|\.)\s*([^.]*[Vv]acation[^.]*[Mm]aledives[^.]*(?:\.|$))

See demo.

https://regex101.com/r/97UvCH/1

There is no need for the lookahead; it will slow things down.
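A minimal Python sketch of this pattern on made-up sample text. Since [^.]* cannot cross a period, both words have to occur in the same sentence. The LOOKBEHIND_RE variant is not part of the answer; it is an added tweak showing one side effect of the original: each match consumes its closing period, so when two matching sentences are adjacent, the second one has no period left to anchor (?:^|\.) and is skipped. A lookbehind checks for the period without consuming it.

```python
import re

# Pattern from the answer: [^.]* keeps both words inside one sentence.
ANSWER_RE = re.compile(r"(?:^|\.)\s*([^.]*[Vv]acation[^.]*[Mm]aledives[^.]*(?:\.|$))")

# Variant (assumption, not from the answer): (?<=\.) only looks at the
# preceding period instead of consuming it, so adjacent sentences both match.
LOOKBEHIND_RE = re.compile(r"(?:^|(?<=\.))\s*([^.]*[Vv]acation[^.]*[Mm]aledives[^.]*(?:\.|$))")

text = ("We booked a flight. Our vacation in the Maledives was great. "
        "Next year we want a vacation in Spain. The Maledives were sunny.")
print(ANSWER_RE.findall(text))
# ['Our vacation in the Maledives was great.']

adjacent = "A vacation in the Maledives. Another vacation in the Maledives."
print(ANSWER_RE.findall(adjacent))
# ['A vacation in the Maledives.']
print(LOOKBEHIND_RE.findall(adjacent))
# ['A vacation in the Maledives.', 'Another vacation in the Maledives.']
```

Note that "a vacation in Spain. The Maledives" is not reported, which is exactly the cross-sentence match the question wants to avoid.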


离旧人 2025-02-17 18:22:31


Your pattern is prone to catastrophic backtracking as there are nested quantifiers in the {0,3} repeating part and there are also leading optional quantifiers at the start of the pattern.
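To get a rough feel for the nested-quantifier problem, here is a small counting sketch (an illustration, not a model of any specific engine). In the question's (?:[,'`´() \\]*\w+){0,x}, the separator class may match nothing, so a single run of letters can be cut into several consecutive \w+ pieces in many ways, and a backtracking engine may try each of them when the overall match fails. The function name ways_to_split is hypothetical.

```python
from math import comb

# Counts how many ways a run of n_letters word characters can be cut into
# 1..max_chunks consecutive \w+ pieces when the separator between pieces may
# be empty, as in the question's (?:[,'`´() \\]*\w+){0,x}.
def ways_to_split(n_letters: int, max_chunks: int) -> int:
    # Choosing k pieces means choosing k-1 cut points among the n-1 gaps.
    return sum(comb(n_letters - 1, k - 1)
               for k in range(1, min(n_letters, max_chunks) + 1))

for limit in (3, 200):
    print(limit, ways_to_split(20, limit))
# 3 191
# 200 524288
```

With [,'`´() \\]+ (one or more separators), as in the pattern below, every piece must start with a separator, so a word can only be split one way.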

Before Python 3.11, the re module did not support possessive quantifiers or atomic groups, but you can mimic an atomic group by putting a capture group inside a lookahead assertion and then matching a backreference to that group once the assertion succeeds. This reduces the backtracking for the first part a bit.

But the second part, with the quantifier {0,200}, should not be atomic, because you want to allow backtracking to fit a variable number of words before matching Maledives.

So the higher the upper bound of the quantifier, the more paths there are to explore.

(?<!\S)(?=([^<>.\]]*[Vv]acation\b))\1(?:[,'`´() \\]+\w+){0,200}\s*[Mm]aledives\b[^<.\]]*

The pattern matches:

- (?<!\S) Assert a whitespace boundary to the left
- (?= Positive lookahead assertion, assert that what is to the right is:
  - ( Capture group 1
    - [^<>.\]]*[Vv]acation\b Match optional chars other than the listed ones, then match vacation followed by a word boundary
  - ) Close group 1
- ) Close the lookahead
- \1 Match a backreference to group 1 (the text matched inside the lookahead)
- (?:[,'`´() \\]+\w+){0,200} Repeat 0 to 200 times: one or more chars from the character class followed by 1+ word characters
- \s*[Mm]aledives\b Match optional whitespace chars, then Maledives followed by a word boundary
- [^<.\]]* Optionally match any characters other than those listed in the character class

See a regex demo.
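A minimal Python sketch of this pattern on made-up sample text. It relies on lookaheads being atomic in Python's re: once (?=(...)) succeeds, the engine does not re-enter it, so \1 consumes a fixed string. The helper name matching_sentences is hypothetical. Because the pattern contains a capture group, findall would return only group 1, so the sketch uses finditer and group(0).

```python
import re

# Pattern from the answer; (?=(...))\1 emulates an atomic group.
PATTERN = re.compile(
    r"(?<!\S)(?=([^<>.\]]*[Vv]acation\b))\1"
    r"(?:[,'`´() \\]+\w+){0,200}\s*[Mm]aledives\b[^<.\]]*"
)

def matching_sentences(text):
    # group(0) is the whole match; findall would give only capture group 1.
    return [m.group(0) for m in PATTERN.finditer(text)]

print(matching_sentences("We booked a flight. Our vacation in the Maledives was great."))
# ['Our vacation in the Maledives was great']
print(matching_sentences("A vacation in Spain. The Maledives were sunny."))
# []
```

The trailing [^<.\]]* stops before the period, so the match does not include it.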

Another option is to use the PyPI regex module, which supports atomic groups (?>...), for the first part:

(?<!\S)(?>[^<>.\]]*[Vv]acation\b)(?:[,'`´() \\]+\w+){0,3}\s*[Mm]aledives[^<.\]]*

See a Python demo
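A minimal sketch of the atomic-group version, assuming Python 3.11 or later, where the standard re module also accepts (?>...); on older versions it falls back to the PyPI regex module the answer mentions (pip install regex). The helper name find_sentences and the sample sentences are made up. With {0,3}, at most three words may sit between vacation and Maledives.

```python
import re
import sys

# Pattern from the answer: (?>...) makes the first part atomic.
ATOMIC = r"(?<!\S)(?>[^<>.\]]*[Vv]acation\b)(?:[,'`´() \\]+\w+){0,3}\s*[Mm]aledives[^<.\]]*"

def find_sentences(text: str) -> list:
    if sys.version_info >= (3, 11):
        engine = re  # stdlib re supports atomic groups since 3.11
    else:
        import regex as engine  # third-party module, assumed installed
    return [m.group(0) for m in engine.finditer(ATOMIC, text)]

print(find_sentences("Our vacation in the Maledives was great."))
# ['Our vacation in the Maledives was great']
print(find_sentences("My vacation on a tiny island in the Maledives."))
# []  (six words in between, more than {0,3} allows)
```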


About the author

谁人与我共长歌
