Resources for lexing, tokenising and parsing in Python

Posted 2024-07-04 04:22:22

Can people point me to resources on lexing, parsing and tokenising with Python?

I'm doing a little hacking on an open source project (hotwire) and wanted to do a few changes to the code that lexes, parses and tokenises the commands entered into it. As it is real working code it is fairly complex and a bit hard to work out.

I haven't worked on code to lex/parse/tokenise before, so I was thinking one approach would be to work through a tutorial or two on this aspect. I would hope to learn enough to navigate around the code I actually want to alter. Is there anything suitable out there? (Ideally it could be done in an afternoon without having to buy and read the dragon book first ...)

Edit: (7 Oct 2008) None of the below answers quite give what I want. With them I could generate parsers from scratch, but I want to learn how to write my own basic parser from scratch, not using lex and yacc or similar tools. Having done that I can then understand the existing code better.

So could someone point me to a tutorial where I can build a basic parser from scratch, using just python?

Comments (8)

沦落红尘 2024-07-11 04:22:22

I suggest http://www.canonware.com/Parsing/, since it is pure Python and you don't need to learn a grammar, but it isn't widely used and has comparatively little documentation. The heavyweights are ANTLR and PyParsing. ANTLR can generate Java and C++ parsers too, as well as AST walkers, but you will have to learn what amounts to a new language.

任谁 2024-07-11 04:22:22

Have a look at the standard module shlex and modify a copy of it to match the syntax you use for your shell; it is a good starting point.
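To get a feel for what shlex does before modifying a copy of it, a minimal sketch follows. The command strings are made up for illustration and are not taken from hotwire; `punctuation_chars` needs Python 3.6 or later.

```python
import shlex

# Hypothetical command line, invented for this example.
command = 'copy "my file.txt" backup/ | grep -v tmp'

# shlex.split applies POSIX shell quoting rules and strips the quotes,
# so the quoted file name comes back as a single token.
print(shlex.split(command))

# With punctuation_chars=True, characters such as | ; & < > become
# separate tokens even when no whitespace surrounds them.
lexer = shlex.shlex("ls|grep foo", posix=True, punctuation_chars=True)
print(list(lexer))
```

The shlex source itself is a single, fairly short state machine (`read_token`), which is probably the part worth reading if the goal is to understand how a hand-written tokeniser works.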

If you want all the power of a complete solution for lexing/parsing, ANTLR can generate python too.

残疾 2024-07-11 04:22:22

Federico Tomassetti has a good, concise write-up of everything from BNF to binary deciphering, covering:

  • lexers,
  • parsers,
  • abstract syntax trees (ASTs), and
  • Construct/code generators.

He even mentioned the new Parsing Expression Grammar (PEG).

https://tomassetti.me/parsing-in-python/
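To make the stages in that list concrete, here is a small sketch that pushes one expression through lexing, parsing into an AST, and code generation. It uses only Python's own standard-library `tokenize` and `ast` modules, which is my choice for illustration and not something taken from the article.

```python
import ast
import io
import tokenize

source = "1 + 2 * 3"

# Lexing: turn the character stream into tokens, keeping only the
# numbers and operators (NEWLINE and ENDMARKER tokens are dropped).
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NUMBER, tokenize.OP)
]
print(tokens)

# Parsing: build an abstract syntax tree. The multiplication ends up
# nested under the addition because it binds more tightly.
tree = ast.parse(source, mode="eval")
print(ast.dump(tree.body))

# Code generation: compile the tree to bytecode and run it.
code = compile(tree, "<example>", "eval")
print(eval(code))
```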

多孤肩上扛 2024-07-11 04:22:22

pygments is a source code syntax highlighter written in Python. It has lexers and formatters, and its source may be interesting to peek at.

辞取 2024-07-11 04:22:22

Here's a few things to get you started (roughly from simplest-to-most-complex, least-to-most-powerful):

http://en.wikipedia.org/wiki/Recursive_descent_parser

http://en.wikipedia.org/wiki/Top-down_parsing

http://en.wikipedia.org/wiki/LL_parser

http://effbot.org/zone/simple-top-down-parsing.htm

http://en.wikipedia.org/wiki/Bottom-up_parsing

http://en.wikipedia.org/wiki/LR_parser

http://en.wikipedia.org/wiki/GLR_parser

When I learned this stuff, it was in a semester-long 400-level university course. We did a number of assignments where we did parsing by hand; if you want to really understand what's going on under the hood, I'd recommend the same approach.
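As an illustration of what parsing by hand can look like, here is a minimal recursive descent sketch for arithmetic expressions, using only the standard library. The grammar, the token names and the `Parser` class are assumptions made for this example; it evaluates as it parses instead of building a tree, to keep it short.

```python
import re

# A token is either a run of digits or one operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([-+*/()]))")


def tokenize(text):
    """Turn the input string into a list of (kind, value) tuples."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"cannot tokenise input at position {pos}")
        number, op = match.groups()
        tokens.append(("NUM", int(number)) if number else ("OP", op))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each consumes the tokens it matches."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor := NUM | '(' expr ')'
        kind, value = self.advance()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            inner = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() != ("EOF", None):
        raise SyntaxError("unexpected trailing input")
    return result


print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of the call structure: `expr` calls `term`, which calls `factor`, so `*` and `/` are grouped before `+` and `-`. The `while` loops make the operators left-associative.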

This isn't the book I used, but it's pretty good: Principles of Compiler Design.

Hopefully that's enough to get you started :)

流殇 2024-07-11 04:22:22

This question is pretty old, but maybe my answer will help someone who wants to learn the basics. I find this resource to be very good. It is a simple interpreter written in Python without the use of any external libraries, so it will help anyone who would like to understand the internal workings of parsing, lexing, and tokenising:

"A Simple Interpreter from Scratch in Python": Part 1, Part 2, Part 3, and Part 4.
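For a taste of the lexing stage written without external libraries, a rough sketch follows. The token table, the tag names and the tiny language being lexed are illustrative assumptions for this example, not a copy of the tutorial's code.

```python
import re

# (pattern, tag) pairs, tried in order. A tag of None means the match
# is consumed but produces no token (whitespace and comments).
TOKEN_SPEC = [
    (r"[ \t\n]+", None),
    (r"#[^\n]*", None),
    (r":=", "RESERVED"),
    (r"[-+*/()<>=;]", "RESERVED"),
    (r"(?:if|then|else|end|while|do)\b", "RESERVED"),
    (r"[0-9]+", "INT"),
    (r"[A-Za-z_][A-Za-z0-9_]*", "ID"),
]
COMPILED = [(re.compile(pattern), tag) for pattern, tag in TOKEN_SPEC]


def lex(text):
    """Return a list of (lexeme, tag) tuples for the whole input."""
    tokens, pos = [], 0
    while pos < len(text):
        for regex, tag in COMPILED:
            match = regex.match(text, pos)
            if match:
                if tag is not None:
                    tokens.append((match.group(0), tag))
                pos = match.end()
                break
        else:
            # No pattern matched at this position.
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
    return tokens


print(lex("x := 1; while x < 10 do x := x + 1 end"))
```

The order of the table matters: keywords are listed before identifiers, and the trailing `\b` keeps a name such as `iffy` from being split into the keyword `if` plus `fy`.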

梦幻的心爱 2024-07-11 04:22:22

For medium-complex grammars, PyParsing is brilliant. You can define grammars directly within Python code, no need for code generation:

>>> from pyparsing import Word, alphas
>>> greet = Word(alphas) + "," + Word(alphas) + "!"  # <-- grammar defined here
>>> hello = "Hello, World!"
>>> print(hello, "->", greet.parseString(hello))
Hello, World! -> ['Hello', ',', 'World', '!']

(Example adapted from the PyParsing home page.)

With parse actions (functions that are invoked when a certain grammar rule is triggered), you can convert parses directly into abstract syntax trees, or any other representation.

There are many helper functions that encapsulate recurring patterns, like operator hierarchies, quoted strings, nesting or C-style comments.

不知所踪 2024-07-11 04:22:22

I'm a happy user of PLY. It is a pure-Python implementation of Lex & Yacc, with lots of small niceties that make it quite Pythonic and easy to use. Since Lex & Yacc are the most popular lexing and parsing tools and are used in the most projects, PLY has the advantage of standing on giants' shoulders. A lot of knowledge about Lex & Yacc exists online, and you can freely apply it to PLY.

PLY also has a good documentation page with some simple examples to get you started.

For a listing of lots of Python parsing tools, see this.
