Python PLY 解析器

发布于 2024-12-12 00:21:53 字数 896 浏览 0 评论 0原文

我试图四处寻找这个问题的答案，但似乎找不到。我正在尝试使用 PLY 编写 Python 解析器作为一种虚构语言。我的 BNF 的简化版本如下所示：

statement-list -> statement ',' statement-list |
                 'print' expr

statement -> ident 'was' 'a' type |
             ident 'became' expr

type -> 'number' | 'letter'

expr -> factor |
       expr '+' factor |
       expr '-' factor

factor -> number | letter | ident

其中数字和字母就像 int 和 char。

Yacc 文档 (http://www.dabeaz.com/ply/ply.html#ply_nn23 ）仅显示简单算术表达式的语法，其中 p[0] 应该是什么很清楚。

def p_expression_plus(p):
   'expression : expression PLUS term'
    p[0] = p[1] + p[3]

我的问题是我该如何处理 BNF 中的语句列表？我已经：

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'

但我真的不知道接下来该放什么。任何帮助将非常感激！

原文

I've tried to search around for the answer to this question, but can't seem to find one.
I'm trying to write a parser in Python using PLY for a made up language. A simplified version of my BNF looks like this:

statement-list -> statement ',' statement-list |
                 'print' expr

statement -> ident 'was' 'a' type |
             ident 'became' expr

type -> 'number' | 'letter'

expr -> factor |
       expr '+' factor |
       expr '-' factor

factor -> number | letter | ident

where number and letter are like int and char.

The Yacc documentation (http://www.dabeaz.com/ply/ply.html#ply_nn23) only shows the syntax for simple arithmetic expressions where it's clear what p[0] should be.

def p_expression_plus(p):
   'expression : expression PLUS term'
    p[0] = p[1] + p[3]

My question is what do I do for statement-list in my BNF? I've got:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'

but I'm really not sure what to put next.
Any help would be very much appreciated!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

耳钉梦 2024-12-19 00:21:53

我不能谈论 PLY 解决方案，但这是一个使用 pyparsing 的解决方案。有时，即使您最终想使用其他库来实现解析器，作为快速而肮脏的原型/练习，pyparsing 示例也可能很有用。不幸的是，这个例子大量使用了operatorPrecedence方法，它隐藏了很多中缀解析的魔力，所以我不知道你能多么容易地翻译它。更传统的 expr/term/factor 解析器示例可以在 pyparsing wiki 的示例页面上找到 (http://pyparsing .wikispaces.com/Examples），标题为 fourFn.py。

bnf = """
statement-list -> statement ',' statement-list

statement -> ident 'was' 'a' type | 
             ident 'became' expr |
             'print' expr |
             'if' conditional-expr statement

type -> 'number' | 'letter' 

expr -> factor | 
       expr '+' factor | 
       expr '-' factor 

factor -> number | letter | ident 
"""

from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence, 
    Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)

PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
    CaselessKeyword,
    "print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])

typeSpecifier = NUMBER | LETTER

number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident

expr = operatorPrecedence(operand,
    [
    ('-', 1, opAssoc.RIGHT),
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

comparisonExpr = operatorPrecedence(expr,
    [
    ("!", 1, opAssoc.RIGHT),
    (oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
    ])

booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
    [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ])

statement = Forward()
printStmt  = PRINT + expr
wasaStmt   = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)

statementList = delimitedList(statement)

tests = """\
    x was a number
    y became 2+5
    print y
    print 100*(5+2)
    print 100*5+2
    if 5 > y print 1000
    if y < 10 y became y+1, print y
    """.splitlines()[:-1]

for t in tests:
    print t.strip()
    for s in statementList.parseString(t).asList():
        print(s)
    print

打印：

x was a number
['x', 'WAS', 'A', 'NUMBER']

y became 2+5
['y', 'BECAME', ['2', '+', '5']]

print y
['PRINT', 'y']

print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]

print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]

if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]

if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]
['PRINT', 'y']

我擅自添加 print 作为一种语句类型，以便它可以出现在程序主体中的任何位置。另外，我尝试添加一个 IF-THEN 语句，这确实显示了添加这样的控制流语句如何开始带您走上编写递归语法的道路（不需要仅仅为了递归“是”、“成为”和“打印”）。

I can't speak for a PLY solution, but here is one using pyparsing. Sometimes, a pyparsing example can be useful even if you eventually want to implement your parser using some other library, as a quick-and-dirty prototype/exercise. Unfortunately, this example makes heavy use of the operatorPrecedence method, which buries a lot of the infix parsing magic, so I don't know how easily you will be able to translate it. A more traditional expr/term/factor parser example can be found at the pyparsing wiki on the Examples page (http://pyparsing.wikispaces.com/Examples), titled fourFn.py.

bnf = """
statement-list -> statement ',' statement-list

statement -> ident 'was' 'a' type | 
             ident 'became' expr |
             'print' expr |
             'if' conditional-expr statement

type -> 'number' | 'letter' 

expr -> factor | 
       expr '+' factor | 
       expr '-' factor 

factor -> number | letter | ident 
"""

from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence, 
    Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)

PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
    CaselessKeyword,
    "print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])

typeSpecifier = NUMBER | LETTER

number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident

expr = operatorPrecedence(operand,
    [
    ('-', 1, opAssoc.RIGHT),
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

comparisonExpr = operatorPrecedence(expr,
    [
    ("!", 1, opAssoc.RIGHT),
    (oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
    ])

booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
    [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ])

statement = Forward()
printStmt  = PRINT + expr
wasaStmt   = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)

statementList = delimitedList(statement)

tests = """\
    x was a number
    y became 2+5
    print y
    print 100*(5+2)
    print 100*5+2
    if 5 > y print 1000
    if y < 10 y became y+1, print y
    """.splitlines()[:-1]

for t in tests:
    print t.strip()
    for s in statementList.parseString(t).asList():
        print(s)
    print

Prints:

x was a number
['x', 'WAS', 'A', 'NUMBER']

y became 2+5
['y', 'BECAME', ['2', '+', '5']]

print y
['PRINT', 'y']

print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]

print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]

if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]

if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]
['PRINT', 'y']

I took the liberty of adding print as a type of statement so it could appear anywhere in your program body. Also, I tried adding an IF-THEN statement, and this does show how adding such a control-flow statement starts to take you down the path of writing a recursive grammar (didn't need recursion just for 'was a', 'became', and 'print').

回复收藏 0 原文

顾北清歌寒 2024-12-19 00:21:53

这实际上取决于您如何构建代码以及您想要如何评估它。如果您正在评估，只要它以正确的顺序评估，您不希望在 p_statement_list_comma 的文档字符串之后不想要任何东西，即就像您所拥有的那样 -无论如何，语句都会被评估，如果需要，您可以保留变量的全局字典或类似的东西来跟踪某些状态，例如标识符值。

如果你想建立一个解析树，例如，如果你不喜欢 ply 的评估顺序，则单独评估，你可以这样做：

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = [p[1]] + p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = [p[2]]

然后这将为你提供一个语句列表，列表中的最后一个元素是表达。为了简单起见，这里使用列表；如果你愿意的话，你也可以使用你自己的类 - 只需将你想要的任何 python 对象分配给 p[0] ，它就可以用于上面的级别。

如果您希望从 yacc.parse 返回打印表达式的结果（解析树顶层的值将从 yacc.parse 返回），您可以这样做：

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]

It really depends on how you structure your code and how you want to evaluate it. If you are evaluating as you go along, provided it evaluates in the right order, you don't want you probably don't want anything after the docstring of p_statement_list_comma i.e. just like how you had it - the statements will be evaluated anyway, and if needed you can keep a global dictionary of variables or something similar to keep track of some state such as identifier values.

If you want to build up a parse tree e.g. for evaluation separately if you don't like ply's order of evaluation, you might do something like this:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = [p[1]] + p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = [p[2]]

This would then give you a list of statements, with the last element in the list being an expression. This uses lists for simplicity; you can use your own classes too if you want - just assign whatever python object you want to p[0] and it will be available to the level above.

If you want the result of the print expression being returned from yacc.parse (the value from the top level of the parse tree will be returned from yacc.parse), you might do it like this:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]

回复收藏 0 原文

~没有更多了~