Matching a multi-line make-line variable assignment with a Python regex
I am trying to extract the multi-line value from a multi-line make-line variable assignment. The following test case fails to find a match in the input string, and I have to confess that I fail to see why. Help with making this sample code print "a \ b" on stdout would be most welcome.
#!/usr/bin/env python
def test():
    # Raw string, so the backslash before the line break is kept literally.
    s = r"""
FOO=a \
b
"""
    import re
    print(type(s), s)
    regex = re.compile(r'^FOO=(.+)(?<!\\)$', re.M)
    m = regex.search(s)
    # No match is found, so m is None and this line raises AttributeError.
    print(m.group(1))

if __name__ == '__main__':
    test()
re.M means re.MULTILINE, but it does not change what the dot matches; it only changes what ^ and $ match.
You need to specify re.DOTALL to make the dot match '\n' as well.
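A minimal sketch of that change, reusing the string and pattern from the question and written in Python 3 syntax. Because `.+` is greedy and `re.DOTALL` lets it cross line breaks, the captured group runs to the end of the string and includes the trailing newline.

```python
import re

# Same input as in the question: a raw string, so the backslash is literal.
s = r"""
FOO=a \
b
"""

# re.M keeps ^ and $ line-aware; re.DOTALL additionally lets '.' match '\n'.
regex = re.compile(r'^FOO=(.+)(?<!\\)$', re.M | re.DOTALL)
m = regex.search(s)
value = m.group(1)
print(repr(value))  # 'a \\\nb\n'
```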
Your problem is that . does not match a newline character by default. If you enable the dotall modifier, it will work. You do so using re.S. With that flag set, your pattern matches the value including the line breaks.
I am not sure what you want to achieve with the multi-line modifier re.M. It makes ^ and $ match the start and end of each line. I assume you can remove it. I am also not sure what you want to achieve with your negative lookbehind (?<!\\). I think you should clarify your expected output. (Do you want to remove the newlines in a \ b?)
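Whether re.M can be dropped depends on the input. The sketch below assumes the exact string from the question, which begins with a newline, and compares `re.S` alone against `re.M | re.S`. The variable names are illustrative only.

```python
import re

# The question's string, written with explicit escapes.
s = "\nFOO=a \\\nb\n"

pattern = r'^FOO=(.+)(?<!\\)$'

# Without re.M, ^ matches only at the very start of the string,
# which here is a newline, so the search finds nothing.
only_dotall = re.search(pattern, s, re.S)

# With re.M, ^ also matches immediately after each newline.
both_flags = re.search(pattern, s, re.M | re.S)

print(only_dotall)  # None
print(repr(both_flags.group(1)))
```

For an input that starts directly with `FOO=`, the first search would succeed as well.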
I came up with this one:
It assumes there is no whitespace after the backslash.
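One pattern that fits this description is sketched below. It is a hypothetical illustration under the same assumption (the backslash is immediately followed by the newline), not necessarily the pattern this answer originally posted.

```python
import re

s = "\nFOO=a \\\nb\n"

# Hypothetical pattern: zero or more lines that end in a backslash
# immediately followed by a newline, then one final line.
regex = re.compile(r'^FOO=((?:.*\\\n)*.*)$', re.M)

value = regex.search(s).group(1)
print(repr(value))  # 'a \\\nb'

# If a space follows the backslash, the continuation is not recognised
# and only the first line is captured.
spaced = regex.search("FOO=a \\ \nb\n").group(1)
print(repr(spaced))  # 'a \\ '
```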
Your sample text has a whole lot of space characters in it, including after the backslash. I assume that's not what you intended, since the point of the backslash is to escape the linefeed that would normally mark the end of the entry.
But backslashes can be used to escape other characters as well, including backslashes. If a value happens to end with a backslash, it will show up as two backslashes in the makefile. The lookbehind in your regex will "see" the second one and incorrectly treat it as part of a line continuation.
If you're thinking of adding another lookbehind to see if the backslash is escaped, let me stop you now. This has been hashed out many times, and the lookbehind approach can't be made to work. What you want is something like this:
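The sketch below is a reconstruction from the description in this answer (normal characters, then a loop of "backslash plus any character" followed by more normal characters), not a verbatim copy of the answer's code. The variable name FOO comes from the question; the two test strings are assumptions chosen for illustration.

```python
import re

# Reconstructed pattern: a run of "normal" characters, then any number of
# (backslash + any character + more normal characters) groups.
regex = re.compile(r'^FOO=([^\n\\]*(?:\\.[^\n\\]*)*)$', re.M | re.S)

continued = "\nFOO=a \\\nb\n"   # value continues onto a second line
escaped = "FOO=a\\\\\nBAR=c\n"  # value ends with an escaped backslash

first = regex.search(continued).group(1)
second = regex.search(escaped).group(1)

print(repr(first))   # 'a \\\nb'
print(repr(second))  # 'a\\\\'
```

In the second case the two backslashes are consumed as one escaped pair, so the following newline ends the value and the BAR line is left out.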
See it in action on ideone
The first [^\n\\]* consumes as many non-linefeed, non-backslash characters as it can, then hands control to the next part. If the end of the string hasn't been reached, it tries to match a backslash followed by any character (including linefeeds, thanks to the re.S modifier), followed by some more "normal" characters. It continues like that in a loop until (assuming the input is valid) it runs into an unescaped linefeed or the end of the input.
Although it's the re.S modifier that lets the dot match newlines, the re.M modifier is needed too; it's what lets ^ match the beginning of a line and $ match the end of a line, as @stema explained.