Two very similar regexes with lookahead assertions in Python - why does re.split() behave differently?
I was trying to answer this question where the OP has the following string:
"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
and wants to split it to obtain the following list:
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
I tried to solve it by using a simple lookahead assertion in a regex, (?=path:). Well, it did not work:
>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']
However, in this answer, the answerer got it working by putting a space in front of the lookahead assertion:
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
Why did the regex work with the whitespace? Why did it not work without the whitespace?
Comments (1)
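Python's re.split() has a documented limitation: it can't split on zero-length matches. A bare lookahead such as (?=path:) only asserts what follows and consumes no characters, so every match it produces is zero-length and the string comes back unsplit. With the added space, the pattern consumes one character per match, which is why the split only worked in that form. (This limitation applies to Python versions before 3.7; from 3.7 on, re.split() also splits on empty matches.)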
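To make the zero-length point concrete, here is a minimal sketch that inspects the match spans of both patterns and then splits in a way that does not depend on how re.split() treats empty matches. The helper name split_before is hypothetical and not part of the re module; it is only one possible workaround, not necessarily what the linked answer used.

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

# The bare lookahead does match, but every match is empty (start == end).
empty_spans = [m.span() for m in re.finditer(r'(?=path:)', s)]

# With a leading space, each match consumes exactly one character.
space_spans = [m.span() for m in re.finditer(r' (?=path:)', s)]

# Splitting on the consuming pattern gives the list the OP wanted.
parts = re.split(r' (?=path:)', s)


def split_before(pattern, text):
    """Cut text in front of every match of pattern (hypothetical helper)."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first match
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# Slicing at the match positions avoids re.split() entirely;
# strip() removes the separating space left at the end of a piece.
pieces = [p.strip() for p in split_before(r'path:', s)]
```

Both parts and pieces should hold the two 'path:...' entries. The slicing variant may be handy when the separator is not a single known character.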