Trouble with simple parsing in pyparsing
I'm having some basic problems using pyparsing. Below are the test program and the output of the run.
aaron-mac:sql aaron$ more s.py
from pyparsing import *
n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")
aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
  File "s.py", line 15, in <module>
    print p.parseString("first,second.")
  File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$
How do I modify the grammar in the test program to parse a list of comma separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.
The '|' operator creates a MatchFirst expression, in which the alternatives are evaluated in order until the first one matches.
Pyparsing works purely left-to-right, applying parser expressions to the input string as it can. The only lookahead that pyparsing does is whatever you write into the parser.
In this expression:
    a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
Let's say n is just a literal "X". If this parser was given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser.
If you turn the expression around to:
    a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n )
then to parse "X" it would first try to match the list, which will fail, and then match the lone n. To parse "X,X,X", the first alternative will be the list expression, which will match.
If you want the longest alternative to match, use the '^' operator, which gives an Or expression. Or will evaluate all the given alternatives, and then select the longest match.
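To make the ordering effect concrete, here is a minimal sketch comparing the three forms on the input from the question. It assumes Python 3 and a pyparsing release that still accepts the camelCase names used in this thread; the variable names (name_list, first_wins, list_first, longest) are made up for illustration.

```python
from pyparsing import (Group, OneOrMore, ParseException, Suppress, Word,
                       alphanums)

n = Word(alphanums)
name_list = Group(n + OneOrMore(Suppress(",") + n))

first_wins = n | name_list   # MatchFirst: the lone n is tried first
list_first = name_list | n   # MatchFirst: the list is tried first
longest = n ^ name_list      # Or: every alternative is tried, longest wins

text = "first,second"
r1 = first_wins.parseString(text).asList()   # stops after "first"
r2 = list_first.parseString(text).asList()   # consumes the whole list
r3 = longest.parseString(text).asList()      # also consumes the whole list
lone = list_first.parseString("first").asList()  # list fails, falls back to n

# With a trailing period, the n-first form reproduces the error from the
# question: n consumes "first", then "." is expected where "," is found.
try:
    (first_wins + Suppress(".")).parseString("first,second.")
    failed = False
except ParseException:
    failed = True

print(r1, r2, r3, lone, failed)
```

Note that parseString does not require the whole input to be consumed unless asked to, which is why the n-first form quietly returns just the first name instead of raising.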
You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
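A short sketch of delimitedList applied to the names from the question, assuming Python 3 and a pyparsing version where delimitedList is still available (newer releases also offer a DelimitedList class). The variable names are illustrative only.

```python
from pyparsing import Word, alphanums, delimitedList

n = Word(alphanums)
names = delimitedList(n)  # comma is the default delimiter and is suppressed

single = names.parseString("first").asList()
several = names.parseString("first,second,third").asList()

print(single)
print(several)
```

The same expression handles both the one-name and the many-name case, so no alternation (and no ordering concern) is needed.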
If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:
With this you get the following results:
The other examples in your question fail because they don't end with a period.
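One way to write such a grammar, as a sketch rather than this answer's exact code: combine delimitedList with a suppressed period. It assumes Python 3 and the camelCase pyparsing names used elsewhere in this thread; p and n follow the question's naming, and missing_period_ok is a made-up flag.

```python
from pyparsing import (ParseException, Suppress, Word, alphanums,
                       delimitedList)

n = Word(alphanums)
p = delimitedList(n) + Suppress(".")  # names separated by commas, then "."

one = p.parseString("first.").asList()
two = p.parseString("first,second.").asList()

# Input without the terminating period is rejected.
try:
    p.parseString("first,second")
    missing_period_ok = True
except ParseException:
    missing_period_ok = False

print(one, two, missing_period_ok)
```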