抽象语法树有什么用?
我正在自学如何为编程语言编写解释器,并且我已经阅读了有关抽象语法树的内容。我知道它们是什么,但我不知道它们的用途。
为什么 AST 有用?
I am learning on my own about writing an interpreter for a programming language, and I have read about Abstract Syntax Trees. I have an idea of what they are, but I do not see their use.
Why are ASTs useful?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
它们代表代码的逻辑/语法,这自然是一棵树而不是行列表,而不会陷入具体的语法问题,例如您在哪里 放置星号。
然后,可以从后端的 POV 以更加一致和方便的方式操纵逻辑,这可以(并且对于除 Lisp 之外的所有内容)与我们编写具体语法的方式非常不同。
They represent the logic/syntax of the code, which is naturally a tree rather than a list of lines, without getting bogged down in concrete syntax issues such as where you place your asterisk.
The logic can then be manipulated in a manner more consistent and convenient from the backend's POV, which can be (and is, for everything but Lisps ;) very different from how we write the concrete syntax.
使用 AST 的主要好处是可以将解析和验证逻辑与实现部分分开。作为 AST 实现的解释器确实更容易理解和维护。如果你在解析一些奇怪的语法时遇到问题,你可以查看 AST 解析器,如果一段代码没有产生预期的结果,那么你可以查看解释 AST 的代码。
另一个巨大的优点是,当你的语法需要“先行”时,例如,如果你的语法允许在定义子例程之前使用它,那么在使用 AST 时验证子例程的存在是微不足道的 - 使用“时要困难得多”即时”解析器。
The main benefit os using an AST is that you separate the parsing and validation logic from the implementation piece. Interpreters implemented as ASTs really are easier to understand and maintain. If you have a problem parsing some strange syntax you look at the AST parser , if a pices of code is not producing the expected results than you look at the code that interprets the AST.
The other great advantage is when you syntax requires "lookahead" e.g. if your syntax allows a subroutine to be used before it is defined it is trivial to validate the existence of a subroutine when you are using an AST - its much more difficult with an "on the fly" parser.
您需要“语法树”来表示大多数编程语言的结构,以便对包含编程语言文本的文档进行分析或转换。 (你可以通过我的简历看到一些奇特的例子)。
无论该树是抽象的 (AST) 还是具体的 (CST),都取决于品味、便利性和工程汗水。术语CST专门用于描述当语法用于解构源代码时的解析推导树;它通常包含许多具体语法的树元素,例如语句终止符分号。 AST 用于表示“比 CST 更简单的东西”,例如,省略分号树节点,因为它们不会对程序分析产生太大影响,因此编写处理 AST 的分析器比在国家标准时间。理解这一点的更好方法是认识到 AST 通常与 CST 是同构等价的,也就是说,您应该能够从中重新生成 CST。如果您想转换源文本并重新生成它,那么 CST 通常是更好的选择,因为它从原始程序中丢失的信息较少(我的奇特示例使用了这种方法)。
我想你会发现关于 抽象与具体语法树非常有帮助。
You need "syntax trees" to represent the structure of most programming langauges, in order to carry out analysis or transformation on documents that contain programming language text. (You can see some fancy examples of this via my bio).
Whether that tree is abstract (AST) or concrete (CST) is a matter of taste, convenience, and engineering sweat. The term CST is specially used to describe the parse derivation tree when a grammar is used to deconstruct source code; it usually contains tree elements for lots of concrete syntax such as statement terminator semicolons. AST is used to mean "something simpler than the CST", e.g., leaving out semicolon tree nodes because they don't affect program analysis much, and thus writing analyzers that process ASTs is less conceptual and engineering effort than writing the same analyzer on a CST. A better way to understand this is to realize that the AST is usually as isomorphic equivalent of the CST, that is, you should be able to regenerate the CST from it. If you want to transform the source text and regenerate it, then the CST is often a better choice as it loses less information from the original program (and my fancy example uses this approach).
I think you will find the SO discussion on abstract vs. concrete syntax trees pretty helpful.
一般来说,您会将代码解析为某种形式的 AST,它可能或多或少是一个正式的模型。因此,我认为柯克·沃尔(Kirk Woll)通过他上面的评论得到的意思是,当你解析语言时,你经常使用解析器来创建你正在阅读的原始内容的某种数据模型,通常以树的方式组织。因此,根据这个定义,AST 是很难避免的,除非你正在做一个非常简单的翻译器。
我经常使用 ANTLR 来解析复杂的语言,在这种情况下,AST 有一个稍微更具体的含义。 ANTLR 有一种使用非常简单的操作在解析器语法中生成 AST 的便捷方法。然后,您可以为该 AST 编写一个通常更简单的解析器,您可以像正在处理的语言的更简单版本一样对其进行操作。构建两个解析器的额外工作是否是净收益取决于语言复杂性以及解析后您计划用它做什么。
您可能想看一下有关该主题的一本好书,即 ANTLR 作者 Terrence Parr 所著的《语言实现模式》。他非常彻底地讨论了这个话题。也就是说,在开始使用 AST 之前我并没有真正了解它们,因此(像往常一样)这是理解它们的最佳方式。
In general you are going to parse you code into some form of AST, it may be more or less of a formal model. So I think what Kirk Woll was getting at by his comment above is that when you parse the language, you very often use the parser to create some sort of data model of the raw content of what you are reading, generally organized in a tree fashion. So by that definition an AST is hard to avoid unless you are doing a very simple translator.
I use ANTLR often for parsing complex languages and in that context there is a slightly more specific meaning of an AST. ANTLR has a handy way of generating an AST in the parser grammar using pretty simple actions. You then write a generally much simpler parser for this AST which you can operate on like a much simpler version the language you are processing. Whether the extra work of building two parsers is a net gain is a function of the language complexity and what you are planning on doing with with it once you parsed it.
A good book on the subject that you may want to take a look at is "Language Implementation Patterns" by Terrence Parr the ANTLR author. He addresses this topic pretty thoroughly. That said, I didn't really get ASTs until I started using them, so that (as usual) is the best way to understand them.
问题迟到了,但我想我应该补充一些东西。
Late to the question but I thought I'd add something.