PyParsing 中的简单递归下降

发布于 2024-08-04 00:38:55 字数 656 浏览 4 评论 0原文

我尝试使用此代码 并将其转换为我正在从事的编程语言处理项目的内容,但我遇到了简化版本的问题:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

我已经对这个简单的设置进行了许多不同的修改。通常,尝试类似:

print(expr.parseString('1+2'))

将返回['1']。当我陷入深度递归时,我会遇到这样的问题:

print(expr.parseString('(1+2)'))

关于简单递归,我缺少什么,我无法解析任意算术表达式,例如 1+(2 * 3-(4*(5+6) -(7))...

I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

I've played around with a number of different modifications of this simple setup. Usually, trying something like:

print(expr.parseString('1+2'))

Will return ['1']. While I get caught in deep recursion with something like:

print(expr.parseString('(1+2)'))

What am I missing with respect to simple recursion that I can't parse arbitrarily arithmetic expressions, such as 1+(2 * 3-(4*(5+6)-(7))...?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

裸钻 2024-08-11 00:38:55

哇,我想 pyparsing 真的很受欢迎!感谢亚历克斯和约翰介入解决这个问题。你们的回答都切中要害。但让我添加一两条评论:

  1. 如果我们抑制左括号和右括号符号,并使用 Group 对括号表达式进行分组,pyparsing 将得到一个更接近 AST 的结构化结果。

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    def 语法():
        op = oneOf('+ -')
        lpar = Literal( '(' ).suppress()
        rpar = Literal( ')' ).suppress()
        数字=字(数字)
        expr = 转发()
        原子=数字|组(lpar + expr + rpar)
        表达式 <<原子 + ZeroOrMore(操作 + 原子)
        返回表达式
    
    如果 __name__ == "__main__":
        expr = 语法()
        定义测试:
            结果 = expr.parseString(s)
            print s,'->', 结果
    
        测试(“(9 + 3)”)
        测试(“(9 + 3)*(4 / 5)”)
    

    给予:

    <前><代码>(9 + 3) -> [[‘9’、‘+’、‘3’]]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]

    否则,pyparsing 只是标记化,您必须遍历已解析标记的列表来查找嵌套表达式。

  2. 由于 op 被定义为 oneOf("+ - * /"),因此操作没有优先级。 pyparsing 存储库上有示例 https://github.com/pyparsing/pyparsing/ tree/master/examples 手动定义此方法 (fourFn.py),或者使用 infixNotation 帮助程序 (simpleArith.py) 的最新方法。同样,这使 pyparsing 比仅仅标记化添加了更多价值。

对于OP,请查看这些示例,我认为它们将帮助您推进项目。

——保罗

Wow, I guess pyparsing is really on the map! Thanks Alex and John for stepping in on this question. You are both on the mark with your responses. But let me add a comment or two:

  1. If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will a structured result that is closer to an AST.

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    def Syntax():
        op = oneOf('+ -')
        lpar  = Literal( '(' ).suppress()
        rpar  = Literal( ')' ).suppress()
        num = Word(nums)
        expr = Forward()
        atom = num | Group(lpar + expr + rpar)
        expr << atom + ZeroOrMore(op + atom)
        return expr
    
    if __name__ == "__main__":
        expr = Syntax()
        def test(s):
            results = expr.parseString(s)
            print s,'->', results
    
        test( "(9 + 3)" )
        test( "(9 + 3) * (4 / 5)" )
    

    Giving:

    (9 + 3) -> [['9', '+', '3']]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
    

    Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.

  2. Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples on the pyparsing repo at https://github.com/pyparsing/pyparsing/tree/master/examples of the manual way to define this (fourFn.py), or the more recent approach using the infixNotation helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.

To the OP, please check out those examples, I think they will help move you forward on your project.

-- Paul

树深时见影 2024-08-11 00:38:55

这或多或少是你想要的......?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

发射

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

?这通过将“原子”(数字或括号内的表达式)与“表达式”(一个或多个带有运算符的“原子”)分开来“锚定”递归。

Is this more or less what you want...?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

emitting

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

? This "anchors" the recursion by separating an "atom" (number or parenthesized expression) from an "expression" (one or more "atoms" with operators in-between).

¢好甜 2024-08-11 00:38:55

像这样的语法

expr :: expr op expr

很难使用,因为递归总是向左潜入。

正常的算术语法看起来像这样:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

基本上,你永远不会得到 S :: S;每当非终结符出现在语法中一行的左侧和右侧时,中间必须有一些文字供解析器使用。

A grammar like:

expr :: expr op expr

is hard to work with because the recursion just keeps diving into the left.

A normal arithmetic grammar would look something like:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

Basically, you never get S :: S; any time a nonterminal appears on the left and right hand sides of a line in the grammar, there must be some literal in the middle for the parser to consume.

凉月流沐 2024-08-11 00:38:55

使用 operatorPrecedence 构建表达式。它将构建正确的表达式,并在执行时处理运算符优先级:

num = Word(nums)
plusop = oneOf( '+ -')
multop = oneOf('/ *')
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT),(plusop, 2, opAssoc.LEFT)])

示例:

>> print parsetime.expr.parseString("1+(2 * 3-(4*(5+6)-(7)))")
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]

Use operatorPrecedence to build expressions. It'll build the correct expressions, and take care of operator precedence while at it:

num = Word(nums)
plusop = oneOf( '+ -')
multop = oneOf('/ *')
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT),(plusop, 2, opAssoc.LEFT)])

example:

>> print parsetime.expr.parseString("1+(2 * 3-(4*(5+6)-(7)))")
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文