PyParsing simple language expressions
I'm trying to write something that will parse some code. I'm able to successfully parse `foo(spam)` and `spam+eggs`, but `foo(spam+eggs)` (recursive descent? my terminology from compilers is a bit rusty) fails.

I have the following code:
```python
from pyparsing_py3 import *
myVal = Word(alphas+nums+'_')
myFunction = myVal + '(' + delimitedList( myVal ) + ')'
myExpr = Forward()
mySubExpr = ( \
    myVal \
    | (Suppress('(') + Group(myExpr) + Suppress(')')) \
    | myFunction \
    )
myExpr << Group( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )
# SHOULD return: [blah, [foo, +, bar]]
# but actually returns: [blah]
print(myExpr.parseString('blah(foo+bar)'))
```
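The `[blah]` result in the question comes back without any error because `parseString` by default only needs to match a prefix of the input and ignores whatever follows. A minimal sketch, assuming a current pyparsing release installed as the `pyparsing` module (`ident` is an illustrative name): passing `parseAll=True` makes unconsumed input raise `ParseException`, which makes this kind of silent truncation easier to spot while developing a grammar.

```python
from pyparsing import ParseException, Word, alphas

ident = Word(alphas)

# By default parseString succeeds as soon as a prefix of the input matches;
# the remaining "(foo+bar)" is silently ignored.
print(ident.parseString("blah(foo+bar)"))  # ['blah']

# parseAll=True requires the whole input to be consumed, so the leftover
# text raises ParseException instead of being dropped.
try:
    ident.parseString("blah(foo+bar)", parseAll=True)
except ParseException as err:
    print("not fully parsed:", err)
```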
Answers (2)
Several issues: `delimitedList` is looking for a comma-delimited list of `myVal`, i.e. identifiers, as the only acceptable form of argument list, so of course it can't match `'foo+bar'` (not a comma-delimited list of `myVal`!). Fixing that reveals another: `myVal` and `myFunction` start the same way, so their order in `mySubExpr` matters. Fixing that reveals yet another: TWO levels of nesting instead of one. With all three fixed, the grammar seems OK and emits

['blah', ['foo', '+', 'bar']]

as desired. I also removed the redundant backslashes, since logical line continuation occurs anyway within parentheses; they were innocuous but did hamper readability.
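A minimal sketch of a grammar with the three fixes described above, assuming a current pyparsing release installed as the `pyparsing` module (not the old `pyparsing_py3`). The names follow the question's code; wrapping each function argument in its own `Group` is a choice made here so that several comma-separated arguments stay distinguishable.

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphas, delimitedList, nums, oneOf)

myVal = Word(alphas + nums + '_')
myExpr = Forward()

# Fix 1: a call's arguments are full expressions, not just identifiers,
# so "foo+bar" is an acceptable argument. Each argument gets its own Group.
myFunction = myVal + Suppress('(') + delimitedList(Group(myExpr)) + Suppress(')')

# Fix 2: myFunction and myVal both start with an identifier, and '|'
# (MatchFirst) takes the first alternative that matches, so the call
# is tried before the bare identifier.
mySubExpr = (myFunction
             | Suppress('(') + Group(myExpr) + Suppress(')')
             | myVal)

# Fix 3: no Group around the whole expression, so the result has one level
# of nesting instead of two. The parentheses around the right-hand side
# guard against '<<' binding tighter than '|'.
myExpr << (mySubExpr + ZeroOrMore(oneOf('+ - / * =') + mySubExpr))

print(myExpr.parseString('blah(foo+bar)', parseAll=True).asList())
# ['blah', ['foo', '+', 'bar']]
```

With this sketch, several arguments come back as separate groups: `'f(a, b+c)'` parses to `['f', ['a'], ['b', '+', 'c']]`.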
I've found that a good habit to get into when using the '<<' operator with Forwards is to always enclose the right-hand side (RHS) in parentheses.

This is a result of my unfortunate choice of '<<' as the "insertion" operator for inserting an expression into a Forward. The parentheses are unnecessary for the expression in the question, but once the RHS contains alternatives joined with '|', we see why I say "unfortunate". If I simplify such a statement to "A << B | C", we easily see that the precedence of operations causes evaluation to be performed as "(A << B) | C", since '<<' has higher precedence than '|'. The result is that the Forward A only gets the expression B inserted into it. The "| C" part does get executed, but it just builds "A | C", a MatchFirst object that is immediately discarded because it is not assigned to any variable name. The solution is to group the RHS within parentheses as "A << (B | C)". In expressions composed only of '+' operations, the parentheses are not actually needed, since '+' has higher precedence than '<<'. But that is just lucky coding, and it causes problems when someone later adds an alternative expression using '|' without realizing the precedence implications. So I suggest simply adopting the style "A << (expression)" to help avoid this confusion.
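Python's own parser can confirm the grouping described above. The sketch below uses the standard `ast` module to print each expression with explicit parentheses; `show` and `grouping` are hypothetical helper names introduced here, and only the `<<`, `|` and `+` operators are handled.

```python
import ast

# Map the operator node types handled here to their source symbols.
OPS = {ast.LShift: "<<", ast.BitOr: "|", ast.Add: "+"}


def show(node: ast.AST) -> str:
    """Render an expression with explicit parentheses around every binary op."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        return f"({show(node.left)} {OPS[type(node.op)]} {show(node.right)})"
    raise TypeError(f"unsupported node: {type(node).__name__}")


def grouping(source: str) -> str:
    """Show how Python's parser groups a single expression."""
    return show(ast.parse(source, mode="eval").body)


print(grouping("A << B | C"))    # ((A << B) | C)  : A only receives B
print(grouping("A << (B | C)"))  # (A << (B | C))  : A receives B | C
print(grouping("A << B + C"))    # (A << (B + C))  : '+' binds tighter than '<<'
```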
(Someday I will write pyparsing 2.0, which will allow me to break compatibility with existing code, and change this to use the '<<=' operator. That fixes all of these precedence issues, since '<<=' has lower precedence than any of the other operators used by pyparsing.)
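Current pyparsing releases do accept `<<=` for inserting an expression into a `Forward`, and `<<` still works as well. A minimal sketch, assuming such a release is installed, contrasting the two with a '|' alternative; `ident`, `integer` and the `matches` helper are illustrative names introduced here. Because augmented assignment is a statement rather than an operator, the whole right-hand side `ident | integer` is built before it reaches the `Forward`.

```python
from pyparsing import Forward, ParseException, Word, alphas, nums

ident = Word(alphas)
integer = Word(nums)


def matches(expr, text):
    """Return True if expr consumes all of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


# Parsed as (with_lshift << ident) | integer: the Forward only receives
# ident, and the MatchFirst built by '| integer' is thrown away.
with_lshift = Forward()
with_lshift << ident | integer

# The whole right-hand side (ident | integer) is evaluated first and then
# inserted into the Forward.
with_ilshift = Forward()
with_ilshift <<= ident | integer

print(matches(with_lshift, "123"))   # False
print(matches(with_ilshift, "123"))  # True
```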