I recently wrote a parser in Python using Ply (a Python reimplementation of yacc). When I was almost done with the parser, I discovered that the grammar I need to parse requires me to do some lookups during parsing to inform the lexer. Without those lookups I cannot correctly parse the strings in the language.
Given that I can control the state of the lexer from the grammar rules, I think I'll solve my use case with a lookup table in the parser module, but that may become too difficult to maintain and test. So I want to know about some of the other options.
In Haskell I would use Parsec, a library of parsing functions (known as combinators). Is there a Python implementation of Parsec? Or perhaps some other production-quality library full of parsing functionality, so I can build a context-sensitive parser in Python?
EDIT: All my attempts at context-free parsing have failed. For this reason, I don't expect ANTLR to be useful here.
Comments (5)
PySec is another monadic parser. I don't know much about it, but it's worth a look.
I believe that pyparsing is based on the same principles as Parsec.
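To make the combinator idea concrete, here is a minimal sketch in plain Python. It is not pyparsing's or Parsec's API: the names `digits`, `char`, `take`, `bind` and `length_prefixed` are all made up for illustration. It assumes a parser is just a function from (text, position) to (value, new position), or None on failure, and shows how a later parser can depend on an earlier result, which is what makes context-sensitive parsing possible.

```python
# Hypothetical mini combinator library (illustrative names only).
# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def digits(text, pos):
    """Parse one or more decimal digits and return them as an int."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end

def char(c):
    """Build a parser that accepts exactly the character c."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse

def take(n):
    """Build a parser that consumes exactly n characters."""
    def parse(text, pos):
        if pos + n <= len(text):
            return text[pos:pos + n], pos + n
        return None
    return parse

def bind(parser, make_next):
    """Run parser, then choose the next parser based on its result."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return make_next(value)(text, new_pos)
    return parse

# A length-prefixed string such as "5:hello": how much to read depends on
# the number parsed first, which a context-free grammar cannot express.
length_prefixed = bind(digits, lambda n: bind(char(":"), lambda _: take(n)))
```

For example, `length_prefixed("5:hello world", 0)` consumes the prefix and exactly five characters, while an input whose payload is shorter than the declared length fails and returns None.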
There's ANTLR, which is LL(*); there's PyParsing, which is more object-friendly and reads somewhat like a DSL; and then there's Parsing, which is like OCaml's Menhir.
Nothing prevents you from diverting your parser from the "context-free" path using PLY. You can pass information to the lexer during parsing, and in this way achieve full flexibility. I'm pretty sure that you can parse anything you want with PLY this way.
For a hands-on example, consider a parser for ANSI C written in Python with PLY. It solves the classic C typedef/identifier problem (which makes C's grammar context-sensitive) by populating a symbol table in the parser; the lexer then consults that table to classify each symbol name as either a type or not.
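The symbol-table feedback described above can be sketched without any library. This is not PLY's API and not the code of the C parser mentioned. `Lexer`, `expect`, `parse` and the toy three-statement grammar are assumptions made up for illustration. The point it shows is that the lexer is pulled one token at a time, so a name the parser registers while handling a typedef is already visible when the next token is classified.

```python
import re

# Toy token pattern: identifiers/keywords, plus ';' and '*'.
TOKEN_RE = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<punct>[;*]))")

class Lexer:
    """Lazy lexer that consults a symbol table owned by the parser."""

    def __init__(self, text, type_names):
        self.text = text
        self.pos = 0
        self.type_names = type_names  # shared, mutated by the parser

    def next_token(self):
        match = TOKEN_RE.match(self.text, self.pos)
        if match is None:
            return None  # end of input (or an unknown character, in this toy)
        self.pos = match.end()
        if match.group("punct"):
            return ("PUNCT", match.group("punct"))
        word = match.group("word")
        if word == "typedef":
            return ("TYPEDEF", word)
        # The context-sensitive step: same spelling, different token kind.
        kind = "TYPE_NAME" if word in self.type_names else "IDENT"
        return (kind, word)

def expect(lexer, kind, value=None):
    """Pull one token and check its kind (and optionally its text)."""
    tok = lexer.next_token()
    if tok is None or tok[0] != kind or (value is not None and tok[1] != value):
        raise SyntaxError(f"expected {kind}, got {tok}")
    return tok[1]

def parse(text):
    """Parse 'typedef T U;', 'T * x;' (declaration) and 'a * b;' (product)."""
    type_names = {"int"}
    lexer = Lexer(text, type_names)
    statements = []
    while (tok := lexer.next_token()) is not None:
        kind, value = tok
        if kind == "TYPEDEF":
            base = expect(lexer, "TYPE_NAME")
            alias = expect(lexer, "IDENT")
            expect(lexer, "PUNCT", ";")
            type_names.add(alias)  # feedback: later tokens see the new type
            statements.append(("typedef", base, alias))
        elif kind == "TYPE_NAME":
            expect(lexer, "PUNCT", "*")
            name = expect(lexer, "IDENT")
            expect(lexer, "PUNCT", ";")
            statements.append(("declare_pointer", value, name))
        elif kind == "IDENT":
            expect(lexer, "PUNCT", "*")
            right = expect(lexer, "IDENT")
            expect(lexer, "PUNCT", ";")
            statements.append(("multiply", value, right))
        else:
            raise SyntaxError(f"unexpected token {tok}")
    return statements
```

With this sketch, `parse("typedef int T; T * x;")` treats the second statement as a pointer declaration, whereas `parse("T * x;")` on its own treats the same text as a multiplication, because `T` was never registered as a type. A PLY-based version would follow the same shape, with the grammar actions updating the table and the token rules reading it.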
If an LL parser is acceptable to you, one option is to give ANTLR a try. It can generate Python too (strictly speaking it is LL(*), as its authors call it, where the * stands for the amount of lookahead it can cope with).