Matching whitespace at the start of a line with pyparsing

Posted 2024-10-03 03:21:09

I'm trying to parse a unified diff file using pyparsing as an exercise, and I can't get one thing right. Here is the part of my diff file that is causing trouble:

(... some stuff over...)
 banana
+apple
 orange

The first line starts with " ", followed by "banana". I have the following expression for parsing a line:

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

This works when parsing a single line, but when I try to parse the whole file, leaveWhitespace makes the parser resume at the end of the previous line. In my example, after parsing " banana", the next character is "\n" (leaveWhitespace stops it from being skipped), so the parser tries to match " ", "+", or "-" against it and throws an error.
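
The behaviour can be reproduced in a few lines. This is only a minimal sketch: the SAMPLE string and the ZeroOrMore wrapper are assumptions made for illustration, since the full grammar is not shown here. Without parseAll the parser stops quietly after the first line; with parseAll=True the stray "\n" turns into a ParseException.

```python
from pyparsing import Literal, ParseException, ZeroOrMore, restOfLine

# Hypothetical stand-in for the body of the diff file.
SAMPLE = " banana\n+apple\n orange\n"

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

# restOfLine stops just before "\n"; nothing consumes the newline,
# so the second attempt to match `line` fails and the loop ends.
result = ZeroOrMore(line).parseString(SAMPLE).asList()
print(result)

# Requiring the whole input to match exposes the failure as an exception.
try:
    ZeroOrMore(line).parseString(SAMPLE, parseAll=True)
    raised = False
except ParseException as err:
    raised = True
    print("parse failed:", err)
```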

How can I handle this correctly?

浮云落日 2024-10-10 03:21:09

You can read and parse one line at a time. The following code works for me.

from pyparsing import Literal, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

# Parse one line at a time so the grammar never has to cross a newline.
with open("/tmp/test.diff") as f:
    for l in f:
        fields = line.parseString(l)
        print(fields)

And the output is

[' ', 'banana']
['+', 'apple']
[' ', 'orange']
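
If the text is already in memory, or contains lines that are not part of a hunk body, the same per-line approach might look like the sketch below. The SAMPLE string and the choice to skip non-matching lines are assumptions for illustration. Note that real file headers such as "--- a/file" also start with "-" and would need separate handling.

```python
from pyparsing import Literal, ParseException, restOfLine

# Hypothetical sample: a hunk header followed by three body lines.
SAMPLE = "@@ -1,2 +1,3 @@\n banana\n+apple\n orange\n"

linestart = (Literal(" ") | Literal("+") | Literal("-")).leaveWhitespace()
line = linestart + restOfLine

parsed = []
for raw in SAMPLE.splitlines():
    try:
        parsed.append(line.parseString(raw).asList())
    except ParseException:
        # Not a context/added/removed line (here, the "@@" header); skip it.
        continue

for marker, text in parsed:
    print(repr(marker), repr(text))
```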

Or, if you have to parse several lines in one pass, you can match the end of each line explicitly with LineEnd, which consumes the "\n" that leaveWhitespace no longer skips:

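from pyparsing import Literal, LineEnd, ZeroOrMore, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
# LineEnd() consumes the newline, so the next match starts on a fresh line.
line = linestart.leaveWhitespace() + restOfLine + LineEnd()
lines = ZeroOrMore(line)
with open("/tmp/test.diff") as f:
    result = lines.parseString(f.read())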
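
One side effect of the multi-line version is that LineEnd() adds a "\n" token after every line and all tokens end up in one flat list. A possible refinement, sketched here under the assumption that one [marker, text] pair per line is the desired result, is to suppress the newline and wrap each line in Group. The SAMPLE string is a hypothetical stand-in for the file contents.

```python
from pyparsing import Group, LineEnd, Literal, ZeroOrMore, restOfLine

# Hypothetical stand-in for the contents of the diff file.
SAMPLE = " banana\n+apple\n orange\n"

# leaveWhitespace() keeps the leading " " marker from being skipped.
linestart = (Literal(" ") | Literal("+") | Literal("-")).leaveWhitespace()

# Consume the newline but drop its token; Group keeps each line together.
line = Group(linestart + restOfLine + LineEnd().suppress())
lines = ZeroOrMore(line)

# parseAll=True raises ParseException if any input is left unparsed.
parsed = lines.parseString(SAMPLE, parseAll=True).asList()
for marker, text in parsed:
    print(repr(marker), repr(text))
```

Passing parseAll=True is a deliberate choice here: it turns a silently truncated parse into an error, which makes problems like the one in the question easier to spot.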
