Python 3.7.1 findall() does not behave as expected
First of all, I know that this is not the current version of Python and that the behavior of findall() has changed since 3.6. I don't believe either of those is the issue I'm experiencing, and I haven't been able to find anything about findall() that has changed since 3.7.
I have already devised a fix using sub() instead of findall(), but I'm curious why I had to in the first place.
I have a function that is supposed to check for the presence of a pattern. If found, it's supposed to verify that the pattern has been previously defined. It looks like this at present (with the fix and some debug code):
def _verifyargs(i, end, args):
    '''verify text replacement args'''
    def _findallfix(m):
        formals.append( m.group().upper() )
        return '-xxx- '
    # put any formal arguments into a more convenient form for checking
    checkargs = args.keys()
    print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )
    # if there aren't any formal arguments we're still checking for
    # their improper use within the definition body
    while i < end:
        i, text = SRC.fetch( i+1 )
        SRC.setmaster( i )
        formals = []
        text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
        # formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
        print( f'line= {i}, formals= {formals}' )
        for formal in formals:
            # formal = formal.upper()
            if not formal in checkargs:
                UM.undefined( formal )
    SRC.setmaster(end)
The pattern looks like this:
SYM.macLabel = '[?][_A-Z]([.]?[_A-Z0-9])*' # straight text replacement
When run against this piece of test code:
It produces this output:
Which is fine. It's what I want. But if I comment out the fix:
def _verifyargs(i, end, args):
    '''verify text replacement args'''
    def _findallfix(m):
        formals.append( m.group().upper() )
        return '-xxx- '
    # put any formal arguments into a more convenient form for checking
    checkargs = args.keys()
    print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )
    # if there aren't any formal arguments we're still checking for
    # their improper use within the definition body
    while i < end:
        i, text = SRC.fetch( i+1 )
        SRC.setmaster( i )
        # formals = []
        # text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
        formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
        print( f'line= {i}, formals= {formals}' )
        for formal in formals:
            formal = formal.upper()
            if not formal in checkargs:
                UM.undefined( formal )
    SRC.setmaster(end)
...then the test produces this:
So findall() seems to be making an unexpected match, even though my understanding is that sub() and findall() should have exactly the same matching behavior.
Perhaps I'm abusing sub(). In this instance I don't care at all about the result of the substitution (I save it here only because I might want to look at it), but only that it finds the patterns I expect. Is there something I'm overlooking about the way findall() works?
TL;DR
Use (?: ... ) instead of ( ... ) because re.findall is giving you the capturing group instead of the whole matches.
Details
This question puzzled me for a bit, but I found the problem.
The documentation for re.findall says that if one or more groups are present in the pattern, it returns a list of groups. Since you have one set of parentheses, you have one capturing group, and that's what re.findall is returning. It matches what you expect, it just doesn't return what you thought it would.
By using non-capturing parentheses, (?: ... ), you will get the results you want: the whole matches. I.e.:
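As a minimal sketch of the difference: the pattern is the one from the question, but the sample line `text` and the variable names are made up for illustration and are not the asker's test data. It also shows re.finditer as one possible alternative if you would rather leave the pattern untouched.

```python
import re

# Pattern from the question: one capturing group, repeated by the trailing *.
MAC_LABEL_CAPTURING = r'[?][_A-Z]([.]?[_A-Z0-9])*'
# Same pattern with the group made non-capturing; what it matches is unchanged.
MAC_LABEL_NONCAPTURING = r'[?][_A-Z](?:[.]?[_A-Z0-9])*'

# Hypothetical input line, invented for this sketch.
text = 'mov ?dst.lo, ?src'

# With a capturing group, findall() returns the group instead of the match.
# A repeated group only keeps its last repetition, hence single characters.
group_only = re.findall(MAC_LABEL_CAPTURING, text, flags=re.IGNORECASE)
print(group_only)    # ['o', 'c']

# With a non-capturing group, findall() returns the whole matches.
whole = re.findall(MAC_LABEL_NONCAPTURING, text, flags=re.IGNORECASE)
print(whole)         # ['?dst.lo', '?src']

# Alternative: keep the original pattern and ask each match object for
# group 0 (the whole match), which is what the sub() callback was doing.
via_finditer = [
    m.group().upper()
    for m in re.finditer(MAC_LABEL_CAPTURING, text, flags=re.IGNORECASE)
]
print(via_finditer)  # ['?DST.LO', '?SRC']
```

So sub() and findall() do find the same matches; sub() hands the callback a match object, and m.group() is the whole match, while findall() reports only the contents of the parentheses.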