Python 3.7.1 findall() does not behave as expected

Posted on 2025-01-13 06:13:35

First of all, I know that this is not the current version of Python and that the behavior of findall() was changed from 3.6. I don't believe either of those is the issue I'm experiencing, and I haven't been able to find anything about findall() that has changed since 3.7.

I have already devised a fix using sub() instead of findall(), but I'm curious why I had to in the first place.

I have a function that is supposed to check for the presence of a pattern. If found, it's supposed to verify that the pattern has been previously defined. It looks like this at present (with the fix and some debug code):

    def _verifyargs(i, end, args):
        '''verify text replacement args'''

        def _findallfix(m):
            formals.append( m.group().upper() )
            return '-xxx- '
            
        # put any formal arguments into a more convenient form for checking

        checkargs = args.keys()
        print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )

        # if there aren't any formal arguments we're still checking for
        # their improper use within the definition body

        while i < end:
            i, text = SRC.fetch( i+1 )
            SRC.setmaster( i )
            formals = []
            text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
#           formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
            print( f'line= {i}, formals= {formals}' )
            for formal in formals:
#               formal = formal.upper()
                if formal not in checkargs:
                    UM.undefined( formal )

        SRC.setmaster(end)

The pattern looks like this:

    SYM.macLabel = '[?][_A-Z]([.]?[_A-Z0-9])*'              # straight text replacement

When run against this piece of test code:

It produces this output:

Which is fine. It's what I want. But if I comment out the fix:

    def _verifyargs(i, end, args):
        '''verify text replacement args'''

        def _findallfix(m):
            formals.append( m.group().upper() )
            return '-xxx- '
            
        # put any formal arguments into a more convenient form for checking

        checkargs = args.keys()
        print( f'checkargs: start={i}, end= {end}, args= {checkargs}' )

        # if there aren't any formal arguments we're still checking for
        # their improper use within the definition body

        while i < end:
            i, text = SRC.fetch( i+1 )
            SRC.setmaster( i )
#           formals = []
#           text = re.sub( SYM.macLabel, _findallfix, text, flags=re.IGNORECASE )
            formals = re.findall( SYM.macLabel, text, flags=re.IGNORECASE )
            print( f'line= {i}, formals= {formals}' )
            for formal in formals:
                formal = formal.upper()
                if formal not in checkargs:
                    UM.undefined( formal )

        SRC.setmaster(end)

...then the test produces this:

So findall() seems to be making an unexpected match, even though my understanding is that sub() and findall() should have exactly the same matching behavior.

Perhaps I'm abusing sub(). In this instance I don't care at all about the result of the substitution (I save it here only because I might want to look at it), but only that it finds the patterns I expect. Is there something I'm overlooking about the way findall() works?


不甘平庸 2025-01-20 06:13:35

TL;DR

Use (?: ... ) instead of ( ... ), because re.findall is returning the contents of the capturing group instead of the whole match.

Details

This question puzzled me for a bit, but I found the problem.

The documentation for re.findall says:

The result depends on the number of capturing groups in the pattern.
If there are no groups, return a list of strings matching the whole
pattern. If there is exactly one group, return a list of strings
matching that group. If multiple groups are present, return a list of
tuples of strings matching the groups. Non-capturing groups do not
affect the form of the result.
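
A small illustration of the cases in that passage may help. The patterns and the input string below are made up for the example and are not taken from the question:

```python
import re

text = "ab12 cd34"  # hypothetical input

# No groups: each item is the whole match.
no_group = re.findall(r"[a-z]+\d+", text)

# Exactly one capturing group: each item is only that group's text.
one_group = re.findall(r"[a-z]+(\d+)", text)

# Several capturing groups: each item is a tuple of the groups.
two_groups = re.findall(r"([a-z]+)(\d+)", text)

# Non-capturing group: the result has the same form as with no groups.
non_capturing = re.findall(r"[a-z]+(?:\d+)", text)

print(no_group)       # ['ab12', 'cd34']
print(one_group)      # ['12', '34']
print(two_groups)     # [('ab', '12'), ('cd', '34')]
print(non_capturing)  # ['ab12', 'cd34']
```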

Since you have one set of parentheses, you have one capturing group, and that's what re.findall is returning. The pattern matches what you expect; findall just doesn't return what you thought it would.
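
To make the difference concrete, here is a minimal sketch that runs the pattern from the question through both calls. The sample line and the names MAC_LABEL and collect are assumptions for illustration, since the question's test input is not shown:

```python
import re

# Pattern copied from the question; the sample line is hypothetical.
MAC_LABEL = '[?][_A-Z]([.]?[_A-Z0-9])*'
line = 'mov ?arg1, ?x.y, ?a'

# findall(): with one capturing group, each item is that group's last
# repetition, or '' when the group repeated zero times.
found = re.findall(MAC_LABEL, line, flags=re.IGNORECASE)

# sub() with a callback: m.group() is group 0, i.e. the whole match.
whole = []

def collect(m):
    whole.append(m.group())
    return m.group()  # leave the text unchanged

re.sub(MAC_LABEL, collect, line, flags=re.IGNORECASE)

print(found)  # ['1', '.y', '']
print(whole)  # ['?arg1', '?x.y', '?a']
```

Both calls find the same three matches at the same positions; only what each one hands back differs.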

By using non-capturing parentheses, (?: ... ), you will get the results you want: the whole matches.

I.e.:

    SYM.macLabel = '[?][_A-Z](?:[.]?[_A-Z0-9])*'
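
As a quick check, the sketch below applies the non-capturing pattern to a hypothetical sample line. It also shows one possible alternative, re.finditer, which keeps the original pattern and reads group 0 from each match object. The names FIXED, ORIGINAL and line are assumptions for the example:

```python
import re

line = 'mov ?arg1, ?x.y, ?a'  # hypothetical sample line

# Option 1: the non-capturing pattern, so findall() returns whole matches.
FIXED = '[?][_A-Z](?:[.]?[_A-Z0-9])*'
via_findall = re.findall(FIXED, line, flags=re.IGNORECASE)

# Option 2: keep the capturing group and ask each match object for group 0.
ORIGINAL = '[?][_A-Z]([.]?[_A-Z0-9])*'
via_finditer = [
    m.group(0).upper()
    for m in re.finditer(ORIGINAL, line, flags=re.IGNORECASE)
]

print(via_findall)   # ['?arg1', '?x.y', '?a']
print(via_finditer)  # ['?ARG1', '?X.Y', '?A']
```

The finditer form is useful when the group has to stay capturing for some other reason.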
