Python 3.7.1 findall() not behaving as expected

Posted on 2025-01-13 06:13:35

First of all, I know that this is not the current version of Python and that the behavior of findall() was changed from 3.6. I don't believe either of those is the issue I'm experiencing, and I haven't been able to find anything about findall() that has changed since 3.7.

I have already devised a fix using sub() instead of findall(), but I'm curious why I had to in the first place.

I have a function that is supposed to check for the presence of a pattern. If found, it's supposed to verify that the pattern has been previously defined. It looks like this at present (with the fix and some debug code):

    def _verifyargs(i, end, args):
        '''verify text replacement args'''

        def _findallfix(m):
            formals.append( m.group().upper() )
            return '-xxx- '
            
        # put any formal arguments into a more convenient form for checking

        checkargs = args.keys()
        print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )

        # if there aren't any formal arguments we're still checking for
        # their improper use within the definition body

        while i < end:
            i, text = SRC.fetch( i+1 )
            SRC.setmaster( i )
            formals = []
            text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
#           formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
            print( f'line= {i}, formals= {formals}' )
            for formal in formals:
#               formal = formal.upper()
                if not formal in checkargs:
                    UM.undefined( formal )

        SRC.setmaster(end)

The pattern looks like this:

    SYM.macLabel = '[?][_A-Z]([.]?[_A-Z0-9])*'              # straight text replacement

When run against this piece of test code:

[image: part of test 100]

It produces this output:

[image: desired output (working)]

Which is fine. It's what I want. But if I comment out the fix:

    def _verifyargs(i, end, args):
        '''verify text replacement args'''

        def _findallfix(m):
            formals.append( m.group().upper() )
            return '-xxx- '
            
        # put any formal arguments into a more convenient form for checking

        checkargs = args.keys()
        print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )

        # if there aren't any formal arguments we're still checking for
        # their improper use within the definition body

        while i < end:
            i, text = SRC.fetch( i+1 )
            SRC.setmaster( i )
#           formals = []
#           text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
            formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
            print( f'line= {i}, formals= {formals}' )
            for formal in formals:
                formal = formal.upper()
                if not formal in checkargs:
                    UM.undefined( formal )

        SRC.setmaster(end)

...then the test produces this:

[image: no description provided]

So findall() seems to be making an unexpected match, even though my understanding is that sub() and findall() should have exactly the same matching behavior.

Perhaps I'm abusing sub(). In this instance I don't care at all about the result of the substitution (I save it here only because I might want to look at it), but only that it finds the patterns I expect. Is there something I'm overlooking about the way findall() works?

Comments (1)

不甘平庸 2025-01-20 06:13:35

TL;DR

Use (?: ... ) instead of ( ... ) because re.findall is giving you the capturing group instead of the whole matches.

Details

This question puzzled me for a bit, but I found the problem.

The documentation for re.findall says:

The result depends on the number of capturing groups in the pattern.
If there are no groups, return a list of strings matching the whole
pattern. If there is exactly one group, return a list of strings
matching that group. If multiple groups are present, return a list of
tuples of strings matching the groups. Non-capturing groups do not
affect the form of the result.
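
To make the quoted rule concrete, here is a small sketch of the result shapes. The sample text and patterns are made up for illustration and are not taken from the question.

```python
import re

text = "a1 b2 c3"  # hypothetical sample input

# No capturing groups: each item is the whole match.
no_groups = re.findall(r"[a-z]\d", text)

# Exactly one capturing group: each item is only that group's text.
one_group = re.findall(r"[a-z](\d)", text)

# Several capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([a-z])(\d)", text)

# A non-capturing group leaves the result shape unchanged (whole matches).
non_capturing = re.findall(r"[a-z](?:\d)", text)

print(no_groups)      # ['a1', 'b2', 'c3']
print(one_group)      # ['1', '2', '3']
print(two_groups)     # [('a', '1'), ('b', '2'), ('c', '3')]
print(non_capturing)  # ['a1', 'b2', 'c3']
```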

Since you have one set of parentheses, you have one capturing group, and that's what re.findall is returning. It matches what you expect; it just doesn't return what you thought it would.
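
As a rough illustration with the pattern from the question: `MAC_LABEL` below stands in for `SYM.macLabel`, and the sample line is invented, since the real test input was only posted as an image. Because the group is repeated with `*`, findall() reports only the text of its last repetition (or an empty string if the group never matched), while a sub() callback that calls `m.group()` sees the whole match.

```python
import re

# Stand-in for SYM.macLabel from the question (one capturing group).
MAC_LABEL = '[?][_A-Z]([.]?[_A-Z0-9])*'

# Hypothetical source line; not the asker's actual test data.
text = "MOVE ?src.1, ?dst, ?x"

# findall() returns the single capturing group, not the whole match.
from_findall = re.findall(MAC_LABEL, text, flags=re.IGNORECASE)

# sub() with a callback: m.group() is group 0, i.e. the whole match.
from_sub = []

def collect(m):
    from_sub.append(m.group())
    return '-xxx- '

re.sub(MAC_LABEL, collect, text, flags=re.IGNORECASE)

print(from_findall)  # ['.1', 't', '']
print(from_sub)      # ['?src.1', '?dst', '?x']
```

Both calls find the same three matches; they differ only in what they hand back for each one.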

By using non-capturing parentheses, (?: ... ), you will get the results you want: the whole matches.

I.e.:

    SYM.macLabel = '[?][_A-Z](?:[.]?[_A-Z0-9])*'
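
A minimal sketch of the corrected pattern in use, again with an invented sample line and with `FIXED` and `ORIGINAL` as hypothetical names for the two versions of `SYM.macLabel`. It also shows one alternative that is not part of the answer above: keeping the capturing group and reading group 0 from `re.finditer()`.

```python
import re

FIXED = '[?][_A-Z](?:[.]?[_A-Z0-9])*'    # non-capturing group
ORIGINAL = '[?][_A-Z]([.]?[_A-Z0-9])*'   # capturing group, as in the question

text = "MOVE ?src.1, ?dst"  # hypothetical source line

# With no capturing groups, findall() returns whole matches.
formals = [f.upper() for f in re.findall(FIXED, text, flags=re.IGNORECASE)]

# Alternative: leave the pattern alone and ask each match object for group 0.
via_finditer = [
    m.group(0).upper()
    for m in re.finditer(ORIGINAL, text, flags=re.IGNORECASE)
]

print(formals)       # ['?SRC.1', '?DST']
print(via_finditer)  # ['?SRC.1', '?DST']
```

The non-capturing version is the smaller change if nothing else reads that group; finditer() may be preferable when the pattern is shared with code that does rely on it.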