How to include two overlapping multi-word regex matches in Python

Posted 2025-02-08 17:47:33

My current text is as follows:

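for more cynicism and polarization in our politics now there're no quick fixes to this long-term trend i agree our trade should be fair and not just free but the next wave of economic dislocations won't come from overseas it will come from the relentless pace of automation that makes a lot of good middle class jobs obsolete and so we're going to have to forge a new social compact to guarantee all our kids the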

I want to find matches that consist of three words before and three words after the search regex.

The search regex is \b(com(es?|ing)|came)\b
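As a quick check of what the keyword part alone accepts, here is a small sketch; the sample string is made up for illustration. Because this regex contains capturing groups, re.findall would return group tuples rather than whole words, so the sketch uses re.finditer and the full match instead.

```python
import re

# The keyword regex from the question (it has two capturing groups).
KEYWORD = re.compile(r"\b(com(es?|ing)|came)\b")

# Made-up sample: "become" and "comet" contain "com" but fail the \b checks.
sample = "come comes coming came become comet"

# re.findall would return (group1, group2) tuples here, so take group(0) instead.
words = [m.group(0) for m in KEYWORD.finditer(sample)]
print(words)  # ['come', 'comes', 'coming', 'came']
```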

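So the matches I am looking for would be

economic dislocations won't come from overseas it

and

overseas it will come from the relentless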

So I designed the following regex, which includes that specific rule: \w+'?\w*\s\w+'?\w*\s\w+'?\w*\s\b(com(es?|ing)|came)\b\s\w+'?\w*\s\w+'?\w*\s\w+'?\w*

But the two results overlap each other, so I end up with only one result: the first match.

How do I need to change my regex so that it also returns the overlapping result?
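To see why only one result comes back, here is a sketch that runs the regex from the question on a shortened excerpt of the text (the excerpt is an assumption made for brevity). Each match consumes the characters it covers and the search resumes right after "it", so the second snippet, which needs "overseas it" again, can never be found.

```python
import re

# Shortened excerpt of the question's text (contains both "come" hits).
text = ("the next wave of economic dislocations won't come from overseas "
        "it will come from the relentless pace of automation")

# The question's regex: three words, the keyword, three words, all consumed.
consuming = (r"\w+'?\w*\s\w+'?\w*\s\w+'?\w*\s"
             r"\b(com(es?|ing)|came)\b"
             r"\s\w+'?\w*\s\w+'?\w*\s\w+'?\w*")

# finditer + group(0) shows the full match text (findall would return group tuples).
found = [m.group(0) for m in re.finditer(consuming, text)]
print(found)  # ["economic dislocations won't come from overseas it"]
```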


财迷小姐 2025-02-15 17:47:33


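You need to make sure that you

use a word boundary before the first \w, so that a match cannot start in the middle of a word, and
use a capturing group inside a positive lookahead. The lookahead consumes no characters, so the regex engine retries at the very next position and overlapping snippets can be found; the captured group is what re.findall returns.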

Besides, you can shorten the pattern by using a non-capturing group quantified three times ({3}) instead of spelling out the word subpattern three times on each side.
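The lookahead-plus-capture idea is easiest to see on toy strings (made up for illustration). A consuming pattern never reuses characters it has already matched, while the same pattern wrapped in (?=(...)) is tried at every position. The second pair shows why the word boundary matters: without \b the lookahead also succeeds in the middle of a word and yields truncated snippets.

```python
import re

# Consuming match: "bc" is never tried because "b" was already used by "ab".
consumed = re.findall(r"\w\w", "abcd")
# Zero-width lookahead + capture: tried at every position, so results overlap.
overlapping = re.findall(r"(?=(\w\w))", "abcd")

# Without \b the lookahead also fires mid-word ("b cd", "d ef")...
mid_word = re.findall(r"(?=(\w+ \w+))", "ab cd ef")
# ...with \b it only starts at the beginning of a word.
word_start = re.findall(r"(?=\b(\w+ \w+))", "ab cd ef")

print(consumed)     # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
print(mid_word)     # ['ab cd', 'b cd', 'cd ef', 'd ef']
print(word_start)   # ['ab cd', 'cd ef']
```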

The regex:

(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))
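To show how the pattern is put together, here is the same regex restated with re.VERBOSE so that each piece can carry a comment. This is an equivalent rewrite for illustration, not a change to the answer's pattern; verbose mode ignores whitespace, which is safe here because the pattern contains no literal spaces. The text excerpt is shortened for brevity.

```python
import re

# The answer's pattern, one piece per line (re.VERBOSE ignores whitespace and # comments).
VERBOSE = re.compile(r"""
    (?=                            # lookahead: zero-width, consumes nothing
      \b                           # begin only at a word boundary
      (                            # group 1: the snippet re.findall returns
        (?:\w+(?:'\w*)*\s){3}      # three words (apostrophes allowed), each + whitespace
        (?:com(?:es?|ing)|came)    # keyword: come, comes, coming or came
        (?:\s\w+(?:'\w*)*){3}      # whitespace + word, three times
      )
    )
""", re.VERBOSE)

COMPACT = re.compile(r"(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))")

text = ("the next wave of economic dislocations won't come from overseas "
        "it will come from the relentless pace of automation")

print(VERBOSE.findall(text) == COMPACT.findall(text))  # True
print(VERBOSE.findall(text))
```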

Python code:

import re
# Sample text from the question
text = r'''for more cynicism and polarization in our politics now there're no quick fixes to this long-term trend i agree our trade should be fair and not just free but the next wave of economic dislocations won't come from overseas it will come from the relentless pace of automation that makes a lot of good middle class jobs obsolete and so we're going to have to forge a new social compact to guarantee all our kids the'''
# Zero-width lookahead with one capturing group, so overlapping snippets are all found
pattern = r"(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))"
# With exactly one group, re.findall returns the text captured by that group
print(re.findall(pattern, text))

Output:

["economic dislocations won't come from overseas it", 'overseas it will come from the relentless']
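If you also need to know where each snippet starts (for example, to highlight it in the original text), here is a sketch with re.finditer and the same pattern, on a shortened excerpt of the text. Because the lookahead itself matches the empty string, m.group(0) is always empty; the snippet is in m.group(1), and m.start() gives its offset in the text.

```python
import re

# Same pattern as in the answer.
pattern = re.compile(r"(?=\b((?:\w+(?:'\w*)*\s){3}(?:com(?:es?|ing)|came)(?:\s\w+(?:'\w*)*){3}))")

# Shortened excerpt of the question's text.
text = ("the next wave of economic dislocations won't come from overseas "
        "it will come from the relentless pace of automation")

hits = []
for m in pattern.finditer(text):
    # The lookahead matches the empty string, so m.group(0) == "";
    # the snippet is group 1, and m.start() is its offset in text.
    hits.append((m.start(), m.group(1)))

for offset, snippet in hits:
    print(offset, snippet)
```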

About the author: 转身以后