PyParsing simple language expressions

Posted on 2024-08-04 06:32:07

I'm trying to write something that will parse some code. I'm able to successfully parse foo(spam) and spam+eggs, but foo(spam+eggs) (recursive descent? my terminology from compilers is a bit rusty) fails.

I have the following code:

from pyparsing import *  # the original imported pyparsing_py3, an older Python 3 variant of the module

myVal = Word(alphas+nums+'_')
myFunction = myVal + '(' + delimitedList( myVal ) + ')'

myExpr = Forward()
mySubExpr = ( \
    myVal \
    | (Suppress('(') + Group(myExpr) + Suppress(')')) \
    | myFunction \
    )
myExpr << Group( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )


# SHOULD return: [blah, [foo, +, bar]]
# but actually returns: [blah]
print(myExpr.parseString('blah(foo+bar)'))

Comments (2)

短暂陪伴 2024-08-11 06:32:07

Several issues. First, delimitedList is looking for a comma-delimited list of myVal, i.e. identifiers, as the only acceptable form of argument list, so of course it can't match 'foo+bar' (that is not a comma-delimited list of myVal!). Fixing that reveals a second issue: myVal and myFunction start the same way, so their order in mySubExpr matters ('|' builds a MatchFirst, which takes the first alternative that matches). Fixing that reveals a third: TWO levels of nesting instead of one. This version seems OK:

from pyparsing import *

myVal = Word(alphas+nums+'_')

myExpr = Forward()
mySubExpr = (
    # parenthesized sub-expression
    (Suppress('(') + Group(myExpr) + Suppress(')'))
    # function call; must come before the bare myVal alternative
    | myVal + Suppress('(') + Group(delimitedList(myExpr)) + Suppress(')')
    # bare identifier
    | myVal
    )
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )

print(myExpr.parseString('blah(foo+bar)'))

This emits ['blah', ['foo', '+', 'bar']] as desired. I also removed the redundant backslashes: inside parentheses Python continues the logical line implicitly, so they were harmless but did hurt readability.
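For comparison, the same grammar can be written as a small hand-rolled recursive descent parser in plain Python. This is a swapped-in technique, not pyparsing and not the answer's code, and it is only a sketch under stated assumptions: identifiers are ASCII letters, digits and underscores, the five operators form one flat left-to-right chain, and the names Parser and parse are hypothetical. It mimics the answer's output shape, including the quirk that all of a call's arguments land in one flat list. After reading an identifier it looks for '(' before settling for a bare name, which is the hand-written equivalent of putting the function-call alternative ahead of myVal.

```python
import re

# Hypothetical tokenizer: ASCII identifiers, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")
IDENT_RE = re.compile(r"[A-Za-z0-9_]+")
OPERATORS = set("+-/*=")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # expr := sub_expr (OP sub_expr)*, returned as one flat token list
        result = self.sub_expr()
        while self.peek() in OPERATORS:
            result.append(self.peek())
            self.pos += 1
            result.extend(self.sub_expr())
        return result

    def sub_expr(self):
        token = self.peek()
        if token == "(":
            # '(' expr ')' becomes a single nested list
            self.expect("(")
            inner = self.expr()
            self.expect(")")
            return [inner]
        if token is None or not IDENT_RE.fullmatch(token):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        if self.peek() == "(":
            # Function call: checked BEFORE accepting a bare identifier.
            # Arguments are comma-separated exprs, flattened into one list.
            self.expect("(")
            args = self.expr()
            while self.peek() == ",":
                self.expect(",")
                args.extend(self.expr())
            self.expect(")")
            return [token, args]
        return [token]


def parse(text):
    parser = Parser(text)
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return result


print(parse("blah(foo+bar)"))  # ['blah', ['foo', '+', 'bar']]
```

Unlike parseString without parseAll=True, this sketch rejects trailing input, so parse("foo bar") raises SyntaxError instead of silently returning ['foo'].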

压抑⊿情绪 2024-08-11 06:32:07

I've found that a good habit to get into when using the '<<' operator with Forwards is to always enclose the RHS in parentheses. That is:

myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )

is better as:

myExpr << ( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )

This is a result of my unfortunate choice of '<<' as the "insertion" operator for inserting the expression into a Forward. The parentheses are unnecessary in this particular case, but in this one:

integer = Word(nums)
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) | integer

we see why I say "unfortunate". If I simplify this to "A << B | C", it is easy to see that operator precedence causes evaluation to be performed as "(A << B) | C", since '<<' has higher precedence than '|'. The result is that the Forward A only gets the expression B inserted in it. The "| C" part does get executed, but it produces "A | C", a new MatchFirst object that is immediately discarded because it is not assigned to any variable name. The solution is to group the right-hand side within parentheses as "A << (B | C)". In expressions composed only of '+' operations the parentheses are not actually needed, since '+' has higher precedence than '<<'. But this is just lucky coding, and it causes problems when someone later adds an alternative expression using '|' without realizing the precedence implications. So I suggest simply adopting the style "A << (expression)" to avoid this confusion.
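The trap is ordinary Python operator precedence rather than anything pyparsing-specific, so it can be reproduced without pyparsing at all. In this sketch, Expr and ToyForward are hypothetical stand-ins for a pyparsing expression and a Forward; they only record what '<<' receives and what '|' and '+' build.

```python
class Expr:
    """Hypothetical stand-in for a pyparsing expression; it only records structure."""

    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        # Like '|' in pyparsing, build a new alternative (MatchFirst-like) object.
        return Expr(f"({self.name} | {other.name})")

    def __add__(self, other):
        # Like '+' in pyparsing, build a new sequence (And-like) object.
        return Expr(f"({self.name} + {other.name})")

    def __repr__(self):
        return self.name


class ToyForward(Expr):
    """Hypothetical stand-in for Forward: '<<' stores its operand and returns self."""

    def __init__(self, name):
        super().__init__(name)
        self.inserted = None

    def __lshift__(self, other):
        self.inserted = other
        return self


B, C, D = Expr("B"), Expr("C"), Expr("D")

A = ToyForward("A")
discarded = A << B | C   # evaluated as (A << B) | C
print(A.inserted)        # B        (the '| C' never reached A)
print(discarded)         # (A | C)  (normally built and thrown away)

A2 = ToyForward("A2")
A2 << (B | C)            # parenthesized: the Forward gets the whole alternative
print(A2.inserted)       # (B | C)

A3 = ToyForward("A3")
A3 << B + D              # '+' binds tighter than '<<', so this works by luck
print(A3.inserted)       # (B + D)
```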

(Someday I will write pyparsing 2.0, which will allow me to break compatibility with existing code, and change this to use the '<<=' operator. That fixes all of these precedence issues: '<<=' is an augmented assignment, so the entire right-hand side is evaluated before it is applied, no matter which other pyparsing operators it contains.)
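The '<<=' form avoids the problem because augmented assignment is a statement: Python evaluates the whole right-hand side first and only then calls the in-place method. Current pyparsing releases accept "fwd <<= expr" on a Forward; the sketch below again uses hypothetical toy classes (Alt and ToyForward) rather than pyparsing itself, just to show the evaluation order.

```python
class Alt:
    """Hypothetical stand-in for an expression that supports '|'."""

    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        return Alt(f"({self.name} | {other.name})")


class ToyForward:
    """Hypothetical stand-in for a Forward that supports both '<<' and '<<='."""

    def __init__(self):
        self.inserted = None

    def __lshift__(self, other):
        self.inserted = other
        return self

    def __ilshift__(self, other):
        # 'fwd <<= rhs' calls this with the fully evaluated rhs;
        # returning self keeps the name bound to the same Forward.
        return self << other


B, C = Alt("B"), Alt("C")

A = ToyForward()
A <<= B | C              # B | C is evaluated first, then inserted
print(A.inserted.name)   # (B | C)
```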
