How to include overlapping multi-word regex matches in Python

Posted on 2025-02-08 17:47:33

I currently have a text as follows:

for more cynicism and polarization in our politics now there're no quick fixes to this long-term trend i agree our trade should be fair and not just free but the next wave of economic dislocations won't come from overseas it will come from the relentless pace of automation that makes a lot of good middle class jobs obsolete and so we're going to have to forge a new social compact to guarantee all our kids the

I would like to find matches that include the 3 words before and the 3 words after each hit of the searched regex.

The regex that I would like to search for is \b(com(es?|ing)|came)\b

So the matches that I am looking for are

economic dislocations won't come from overseas it

and

overseas it will come from the relentless

So I've devised a regex that implements these rules:
\w+'?\w*\s\w+'?\w*\s\w+'?\w*\s\b(com(es?|ing)|came)\b\s\w+'?\w*\s\w+'?\w*\s\w+'?\w*

But the two results overlap with each other, so I end up with just 1 match, which is the first one.
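
The behaviour can be reproduced with a minimal sketch like the one below. It assumes Python's standard `re` module and uses a shortened excerpt of the text; the variable names are only illustrative.

```python
import re

# Shortened excerpt of the text from the question.
text = ("the next wave of economic dislocations won't come from overseas "
        "it will come from the relentless pace of automation")

# The pattern from the question: three words, the keyword, three words.
pattern = (r"\w+'?\w*\s\w+'?\w*\s\w+'?\w*\s"
           r"\b(com(es?|ing)|came)\b"
           r"\s\w+'?\w*\s\w+'?\w*\s\w+'?\w*")

# Each match consumes its text, so the search resumes after "it".
# From there only one word ("will") precedes the second "come",
# and the pattern can no longer find three leading words.
matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)
```

Only the first snippet is returned, because the words "overseas it" that the second snippet needs as leading context were already consumed by the first match.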

How do I need to change my regex to include overlapping results?

Comments (1)

财迷小姐 2025-02-15 17:47:33

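You need to make sure that

  • you use a word boundary before the first \w, so that a match cannot start in the middle of a word, and
  • you use a capturing group inside a positive lookahead, so that the regex engine consumes no text and later matches can reuse the same words.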

You can also shorten the pattern by using a non-capturing group quantified three times.
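
To see what the lookahead and the word boundary each contribute, here is a small sketch on toy strings (the strings and variable names are made up for illustration, they are not part of the original text):

```python
import re

s = "abcd"

# A consuming pattern: each match uses up its characters, so "bc" is skipped.
consuming = re.findall(r"\w\w", s)

# A capturing group inside a lookahead: the match itself is empty,
# so the engine retries at every position and overlaps are reported.
overlapping = re.findall(r"(?=(\w\w))", s)

# Without \b the lookahead also succeeds in the middle of a word.
mid_word = re.findall(r"(?=(\w+ b))", "aaa b")

# With \b the capture can only start at the beginning of a word.
whole_word = re.findall(r"(?=\b(\w+ b))", "aaa b")

print(consuming, overlapping, mid_word, whole_word)
```

The same two ingredients are what the full pattern relies on.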

See the regex demo:

(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))

See the Python demo:

import re
text = r'''for more cynicism and polarization in our politics now there're no quick fixes to this long-term trend i agree our trade should be fair and not just free but the next wave of economic dislocations won't come from overseas it will come from the relentless pace of automation that makes a lot of good middle class jobs obsolete and so we're going to have to forge a new social compact to guarantee all our kids the'''
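# The whole snippet is captured inside a zero-width lookahead, so matches may overlap.
pattern = r"(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))"
# With a single capturing group, findall returns the captured snippets.
print(re.findall(pattern, text))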

Output:

["economic dislocations won't come from overseas it", 'overseas it will come from the relentless']
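
If the number of context words needs to vary, the same idea could be wrapped in a small helper. This is only a sketch: `find_with_context` and its parameters are hypothetical names, not something from the demo above, and it reuses the same word definition and `\b` anchor as the pattern shown.

```python
import re

def find_with_context(text, keyword_pattern, n=3):
    """Return (start, snippet) pairs: each keyword hit with n words of
    context on both sides. Overlapping snippets are all reported."""
    word = r"\w+(?:'\w*)*"  # a word, optionally containing apostrophes
    pattern = re.compile(
        rf"(?=\b((?:{word}\s){{{n}}}(?:{keyword_pattern})(?:\s{word}){{{n}}}))"
    )
    # The lookahead matches the empty string; group 1 holds the snippet.
    return [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

sample = ("the next wave of economic dislocations won't come from overseas "
          "it will come from the relentless pace of automation")
hits = find_with_context(sample, r"com(?:es?|ing)|came")
for start, snippet in hits:
    print(start, snippet)
```

Using `finditer` instead of `findall` also gives access to the position of each snippet through `m.start(1)`, which can help when the hits have to be mapped back onto the original text.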