Problem with simple parsing in pyparsing

Posted on 2024-12-17 13:11:48

I'm having a basic problem using pyparsing. Below are the test program and the output of the run.

aaron-mac:sql aaron$ more s.py

from pyparsing import *

n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")


aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
 File "s.py", line 15, in <module>
   print p.parseString("first,second.")
 File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
   raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$ 

How do I modify the grammar in the test program to parse a list of comma separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.

Comments (2)

感情废物 2024-12-24 13:11:48

The '|' operator creates a MatchFirst expression, in which the alternatives are evaluated in order until the first one matches.

Pyparsing works purely left-to-right, applying parser expressions to the input string as far as it can. The only lookahead that pyparsing does is whatever you write into the parser.

In this expression:

a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))

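Let's say n is just a literal "X". If this parser was given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser. This is why p fails on "first,second.": a consumes only "first", and the next expression then expects "." at char 5 but finds ",".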
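A minimal sketch of that behaviour in Python 3 syntax, reusing the grammar from the question. The variable names first_only and whole_input_ok are only illustrative, and passing parseAll=True is an addition that is not in the original program; it is used here to make the partial match visible as an error.

```python
from pyparsing import Group, OneOrMore, ParseException, Suppress, Word, alphanums

n = Word(alphanums)
# Same grammar as in the question: the lone n is the first alternative.
a = Group(n | Group(n + OneOrMore(Suppress(",") + n)))

# MatchFirst stops at the first alternative that matches, so only
# "first" is consumed and ",second" is left unparsed.
first_only = a.parseString("first,second").asList()
print(first_only)

# Requiring the whole input to be consumed turns the silent partial
# match into a ParseException.
try:
    a.parseString("first,second", parseAll=True)
    whole_input_ok = True
except ParseException:
    whole_input_ok = False
print(whole_input_ok)
```

The partial match succeeds quietly by default, which is why the second print in the question shows [['first']] rather than an error.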

If you turn the expression around to:

a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n)

then to parse "X" it would first try to match the list, which fails because there is no comma, and then fall back to the lone n. To parse "X,X,X", the first alternative is the list expression, which matches.
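The reordered grammar can be tried out as follows. This is a sketch in Python 3 syntax; name_list, single, several and terminated are names introduced here for readability and are not part of the original code.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# Hypothetical helper name for the list alternative.
name_list = Group(n + OneOrMore(Suppress(",") + n))

# The list alternative now comes first, so MatchFirst tries it before
# falling back to the lone n.
a = Group(name_list | n)
p = Group(a + Suppress("."))

single = a.parseString("first").asList()
several = a.parseString("first,second").asList()
terminated = p.parseString("first,second.").asList()

print(single)      # [['first']]
print(several)     # [[['first', 'second']]]
print(terminated)  # [[[['first', 'second']]]]
```

Note that each Group adds one level of nesting, so the list case comes back one level deeper than the single-name case.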

If you want the longest alternative to match regardless of order, use the '^' operator, which gives an Or expression. Or evaluates all the given alternatives and then selects the longest match.

a = Group( n ^ Group( n + OneOrMore( Suppress(",") + n )))
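A short sketch of the Or variant in Python 3 syntax, keeping the alternatives in the question's original order. The result variable names are illustrative only.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# '^' builds an Or expression: every alternative is tried and the one
# that consumes the most input wins, whatever order they are listed in.
a = Group(n ^ Group(n + OneOrMore(Suppress(",") + n)))

single = a.parseString("first").asList()
several = a.parseString("first,second").asList()

print(single)   # [['first']]
print(several)  # [[['first', 'second']]]
```

Because Or tries every alternative, it does more work than MatchFirst; for a grammar this small the difference is unlikely to matter.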

You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
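Applied to the question's period-terminated list, delimitedList might be used as sketched below (Python 3 syntax). The names variable and the sample inputs are assumptions for illustration. Recent pyparsing releases also provide a DelimitedList class and treat the camelCase helper as a legacy name, so check the documentation for the version you have installed.

```python
from pyparsing import Suppress, Word, alphanums, delimitedList

# delimitedList matches one or more names separated by commas and
# drops the commas from the results by default.
names = delimitedList(Word(alphanums)) + Suppress(".")

one = names.parseString("first.").asList()
two = names.parseString("first,second.").asList()
# Whitespace between tokens is skipped by default.
three = names.parseString("a, b, c.").asList()

print(one)    # ['first']
print(two)    # ['first', 'second']
print(three)  # ['a', 'b', 'c']
```

Since a single name is already a valid one-element list, no alternation between n and the list is needed at all.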

随心而道 2024-12-24 13:11:48

If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:

from pyparsing import *
p = Word(alphanums)+ZeroOrMore(Suppress(",")+Word(alphanums))+Suppress(".")

With this you get the following results:

>>> print p.parseString("first.")
['first']
>>> print p.parseString("first,second.")
['first', 'second']

With this grammar, the other inputs in your question ("first" and "first,second") fail because they don't end with a period.
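The same grammar in Python 3 syntax, as a sketch that also shows the failure on input without a trailing period. The names name, results and missing_period_ok are introduced here for illustration.

```python
from pyparsing import ParseException, Suppress, Word, ZeroOrMore, alphanums

name = Word(alphanums)
# One name, then any number of ",name" pairs, then a mandatory period.
p = name + ZeroOrMore(Suppress(",") + name) + Suppress(".")

results = [p.parseString(s).asList() for s in ("first.", "first,second.")]
print(results)  # [['first'], ['first', 'second']]

# Without the trailing period the final Suppress(".") cannot match.
try:
    p.parseString("first,second")
    missing_period_ok = True
except ParseException:
    missing_period_ok = False
print(missing_period_ok)  # False
```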
