Getting multiple matches from a regular expression pattern in Python

Posted on 2024-11-17 05:26:25

I'm writing a regular expression to parse arguments in a fashion similar to shell arguments, with spaces and quoted strings as the delimiters, as well as backslash escaping. This seems to work on RegexPal:

(?:(["'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)

Here is a more readable version of that:

(?:(["'])(?:        # Match a double or single quote followed by
     \\(?:\\\\)?\1  #   an odd number of backslashes, then the same quote
    |\\\\           #   or two backslashes
    |.              #   or anything else  
    )*?\1           # any number of times (lazily) followed by the same quote,
|(?:                # OR
     \\(?:\\\\)?\s  #   an odd number of backslashes, then whitespace
    |\\\\           #   or two backslashes
    |\S             #   or any non-whitespace
 )+                 # any number of times.
)

I've tried putting this into Python with re.findall, but the output is nonsense:

>>> re.findall(
... r"(?:([\"'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)",
... r'the quick brown\ fox jumps "over the" lazy\\ dog')
['', '', '', '', '"', '', '']

RegexPal, on the other hand, shows the correct result:

[the] [quick] [brown\ fox] [jumps] ["over the"] [lazy\\] [dog]

Am I forgetting to format the pattern a certain way for Python? Or does Python interpret regex differently in some way? I have no idea why the only non-empty match would be a double-quote, and I've confirmed that the pattern itself works the way it should.

Comments (1)

浅忆 2024-11-24 05:26:25

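It looks like everything is inside a non-capturing group, and the only capturing group is the one for the opening quote. When a pattern contains a capturing group, re.findall returns the contents of that group rather than the whole match. So you do get matches, just not the matched text: the result is an empty string for every unquoted token and a double quote for the quoted one.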
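
One way around this, as a minimal sketch rather than the only fix, is to keep the pattern as it is and read the whole match from each match object via re.finditer and group(0). The names PATTERN, TEXT, quotes_only, and tokens are just illustrative; the pattern and input string are the ones from the question.

```python
import re

# Pattern and input copied from the question, unchanged.
PATTERN = re.compile(
    r"(?:([\"'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)"
)
TEXT = r'the quick brown\ fox jumps "over the" lazy\\ dog'

# findall returns the contents of group 1 (the opening quote) for each match,
# because the pattern contains exactly one capturing group.
quotes_only = PATTERN.findall(TEXT)
print(quotes_only)  # ['', '', '', '', '"', '', '']

# finditer yields match objects; group(0) is always the entire matched text.
tokens = [m.group(0) for m in PATTERN.finditer(TEXT)]
print(tokens)
# ['the', 'quick', 'brown\\ fox', 'jumps', '"over the"', 'lazy\\\\', 'dog']
```

The backreference \1 still needs the quote group to be capturing, which is why this sketch leaves the pattern alone and changes only how the results are read.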
