Getting multiple matches from a regex pattern in Python
I'm writing a regular expression to parse arguments in a fashion similar to shell arguments, with spaces and quoted strings as the delimiters, as well as backslash escaping. This seems to work on RegexPal:
(?:(["'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)
Here is a more readable version of that:
(?:(["'])(?: # Match a double or single quote followed by
\\(?:\\\\)?\1 # an odd number of backslashes, then the same quote
|\\\\ # or two backslashes
|. # or anything else
)*?\1 # any number of times (lazily) followed by the same quote,
|(?: # OR
\\(?:\\\\)?\s # an odd number of backslashes, then whitespace
|\\\\ # or two backslashes
|\S # or any non-whitespace
)+ # any number of times.
)
I've tried putting this into Python with re.findall, but the output is nonsense:
>>> import re
>>> re.findall(
... r"(?:([\"'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)",
... r'the quick brown\ fox jumps "over the" lazy\\ dog')
['', '', '', '', '"', '', '']
RegexPal, on the other hand, shows the correct result:
[the] [quick] [brown\ fox] [jumps] ["over the"] [lazy\\] [dog]
Am I forgetting to format the pattern a certain way for Python? Or does Python interpret regex differently in some way? I have no idea why the only non-empty match would be a double-quote, and I've confirmed that the pattern itself works the way it should.
Comments (1)
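It looks like everything is inside a non-capturing group, and the only capturing group in the pattern is the one for the opening quote, (["']). When a pattern contains exactly one capturing group, re.findall returns that group's text rather than the whole match. So you do get all seven matches, just not their content: you see the quote character for the quoted argument and an empty string wherever the group did not take part in the match. To get the full text, use re.finditer and read group(0) from each match, or make the outer group capturing and pick that group out of each result.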
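A minimal sketch of that fix, reusing the pattern and test string from the question unchanged. The variable names (`pattern`, `text`, `groups_only`, `tokens`) are only illustrative and do not come from the original post.

```python
import re

# Same pattern as in the question; group 1 is the opening quote character.
pattern = re.compile(
    r"(?:([\"'])(?:\\(?:\\\\)?\1|\\\\|.)*?\1|(?:\\(?:\\\\)?\s|\\\\|\S)+)"
)
text = r'the quick brown\ fox jumps "over the" lazy\\ dog'

# findall() reports only group 1, which is empty for unquoted arguments.
groups_only = pattern.findall(text)

# finditer() yields match objects; group(0) is the entire matched text.
tokens = [m.group(0) for m in pattern.finditer(text)]

for token in tokens:
    print(token)
```

Going through `finditer` leaves the pattern itself untouched, which seems the least invasive option here. Wrapping the whole expression in a capturing group would also work, but it renumbers the groups, so the `\1` backreferences would have to become `\2`.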