Why does whitespace matched alongside a lookahead assertion appear in the final matched string in a Python regular expression?
To explore this question, I wrote this Python regular expression to match any egg substring followed by a digit, as long as it is not part of a URL starting with http://:
>>> import re
>>> r = re.compile(r'(?:\s(?!http://\S*))egg\d')
Then I applied it to the following string:
>>> a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"
The result is:
>>> r.findall(a)
[' egg1', ' egg3', ' egg5']
The regular expression is wrong in a lot of other ways, but one thing bugged me more: why does the whitespace appear in the result? Since I used a lookahead assertion like (?:\s...), shouldn't it be left out of the resulting strings?
Comments (2)

(?:...) isn't a lookahead assertion, it's simply a non-capturing pair of parens (i.e. what is matched by the sub-regex inside doesn't go into its own group, it only exists for precedence). (?=...) is a lookahead assertion.

(?: is not a lookahead, but a non-capturing group. As such, it doesn't create its own capture, but it is part of the full match.
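
To make the difference concrete, here is a minimal sketch using the string from the question. The variable names (consuming, captured, lookbehind, lookahead) and the two alternative patterns are my own illustration and not something the asker or the answers proposed. The (?!http://\S*) part is dropped in the alternatives because the text right after the whitespace has to be egg anyway, so that check can never fail there.

```python
import re

a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"

# Pattern from the question: (?:...) only groups, it still consumes the \s,
# so the whitespace is part of every full match.
consuming = re.compile(r'(?:\s(?!http://\S*))egg\d')
print(consuming.findall(a))   # [' egg1', ' egg3', ' egg5']

# Alternative 1 (illustrative): match the whitespace, but capture only the
# part you want. With exactly one group, findall returns that group.
captured = re.compile(r'\s(egg\d)')
print(captured.findall(a))    # ['egg1', 'egg3', 'egg5']

# Alternative 2 (illustrative): a lookbehind (?<=\s) is zero-width, so the
# whitespace is required but never becomes part of the match.
lookbehind = re.compile(r'(?<=\s)egg\d')
print(lookbehind.findall(a))  # ['egg1', 'egg3', 'egg5']

# For comparison, a real lookahead (?=...) is zero-width as well: the digit
# must follow, but it is not included in the match.
lookahead = re.compile(r'egg(?=\d)')
print(lookahead.findall(a))   # ['egg', 'egg', 'egg', 'egg', 'egg']
```

Note that neither alternative matches an egg at the very start of the string, since both require a whitespace character before it. That is the same behaviour as the pattern in the question.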