PyParsing simple language expressions

Posted 2024-08-04 06:32:07

I'm trying to write something that will parse some code. I'm able to successfully parse foo(spam) and spam+eggs, but foo(spam+eggs) (recursive descent? my terminology from compilers is a bit rusty) fails.

I have the following code:

from pyparsing_py3 import *

myVal = Word(alphas+nums+'_')    
myFunction = myVal + '(' + delimitedList( myVal ) + ')'

myExpr = Forward()
mySubExpr = ( \
    myVal \
    | (Suppress('(') + Group(myExpr) + Suppress(')')) \
    | myFunction \
    )
myExpr << Group( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )


# SHOULD return: [blah, [foo, +, bar]]
# but actually returns: [blah]
print(myExpr.parseString('blah(foo+bar)'))

短暂陪伴 2024-08-11 06:32:07

Several issues: delimitedList is looking for a comma-delimited list of myVal, i.e. identifiers, as the only acceptable form of argument list, so of course it can't match 'foo+bar' (not a comma-delimited list of myVal!); fixing that reveals another -- myVal and myFunction start the same way, so their order in mySubExpr matters; fixing that reveals yet another -- TWO levels of nesting instead of one. This version seems OK:

myVal = Word(alphas+nums+'_')    

myExpr = Forward()
mySubExpr = (
    (Suppress('(') + Group(myExpr) + Suppress(')'))
    | myVal + Suppress('(') + Group(delimitedList(myExpr)) + Suppress(')')
    | myVal
    )
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) 

print(myExpr.parseString('blah(foo+bar)'))

emits ['blah', ['foo', '+', 'bar']] as desired. I also removed the redundant backslashes, since logical line continuation occurs anyway within parentheses; they were innocuous but did hamper readability.
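
For anyone who wants to run this directly, here is a self-contained sketch of the same grammar. It assumes the package is importable as `pyparsing` (the question used the older `pyparsing_py3` module name), and the names `ident`, `operand` and `expr` are my own stand-ins for `myVal`, `mySubExpr` and `myExpr`. It also checks the two inputs the question said already worked.

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphas, nums, delimitedList, oneOf)

# Identifier: letters, digits and underscores (same as myVal above).
ident = Word(alphas + nums + '_')

expr = Forward()
operand = (
    # Parenthesized sub-expression, grouped once.
    Suppress('(') + Group(expr) + Suppress(')')
    # Function call: must be tried before the bare identifier,
    # because both alternatives start with an identifier.
    | ident + Suppress('(') + Group(delimitedList(expr)) + Suppress(')')
    # Bare identifier.
    | ident
)
# RHS wrapped in parentheses so the whole expression is inserted.
expr << (operand + ZeroOrMore(oneOf('+ - / * =') + operand))

for text in ('foo(spam)', 'spam+eggs', 'blah(foo+bar)'):
    print(text, '->', expr.parseString(text).asList())
```

The last line printed should show the nested list for 'blah(foo+bar)', with the argument expression grouped one level deep.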


压抑⊿情绪 2024-08-11 06:32:07

I've found that a good habit to get into when using the '<<' operator with Forwards is to always enclose the RHS in parentheses. That is:

myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )

is better as:

myExpr << ( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )

This is a result of my unfortunate choice of '<<' as the "insertion" operator for inserting the expression into a Forward. The parentheses are unnecessary in this particular case, but in this one:

integer = Word(nums)
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) | integer

we see why I say "unfortunate". If I simplify this to "A << B | C", we easily see that operator precedence causes evaluation to be performed as "(A << B) | C", since '<<' has higher precedence than '|'. The result is that the Forward A only gets the expression B inserted into it. The "| C" part does get executed, but what you get is "A | C", which creates a MatchFirst object that is immediately discarded, since it is not assigned to any variable name. The solution is to group the statement within parentheses as "A << (B | C)". In expressions composed using only '+' operations, there is no actual need for the parentheses, since '+' has a higher precedence than '<<'. But this is just lucky coding, and it causes problems when someone later adds an alternative expression using '|' and doesn't realize the precedence implications. So I suggest just adopting the style "A << (expression)" to help avoid this confusion.
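
The "A << B | C" pitfall can be reproduced in a few lines. This is only a small sketch: `word`, `risky`, `safe` and the helper `matches` are illustrative names of mine, and it assumes the package is importable as `pyparsing`.

```python
from pyparsing import Forward, ParseException, Word, alphas, nums

word = Word(alphas)
integer = Word(nums)

def matches(parser, text):
    """Return True if parser accepts text, False on a ParseException."""
    try:
        parser.parseString(text)
        return True
    except ParseException:
        return False

# Python reads this as (risky << word) | integer: only `word` is
# inserted into the Forward, and the MatchFirst built by `|` is discarded.
risky = Forward()
risky << word | integer

# Parenthesized RHS: the whole alternation goes into the Forward.
safe = Forward()
safe << (word | integer)

print(matches(risky, 'abc'), matches(risky, '123'))
print(matches(safe, 'abc'), matches(safe, '123'))
```

With default settings, the first line printed should be `True False` (the integer alternative never made it into `risky`) and the second `True True`.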

(Someday I will write pyparsing 2.0 - which will allow me to break compatibility with existing code - and change this to use the '<<=' operator, which fixes all of these precedence issues, since '<<=' has lower precedence than any of the other operators used by pyparsing.)
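
As a sketch of what the '<<=' form looks like in practice: this assumes a pyparsing release whose Forward accepts '<<=' (the 2.x and 3.x lines do, as far as I know), and `value` and `word` are illustrative names rather than anything from the code above.

```python
from pyparsing import Forward, Word, alphas, nums

word = Word(alphas)
integer = Word(nums)

value = Forward()
# Augmented assignment evaluates the entire right-hand side first,
# so the `|` alternative is inserted without extra parentheses.
value <<= word | integer

print(value.parseString('abc').asList())
print(value.parseString('123').asList())
```

Because '<<=' is a statement rather than an operator inside an expression, there is no way for '|' to bind more loosely than it, which is why the parentheses are no longer needed.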
