解析树和抽象语法树(AST)有什么区别?
它们是由编译过程的不同阶段生成的吗?或者它们只是同一事物的不同名称?
Are they generated by different phases of a compiling process? Or are they just different names for the same thing?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
这是基于 Terrence Parr 的表达式求值器语法。
此示例的语法:
输入
解析树
解析树是输入的具体表示。解析树保留输入的所有信息。空框代表空白,即行尾。
AST
AST 是输入的抽象表示。请注意,AST 中不存在括号,因为关联可以从树结构中导出。
有关详细说明,请参阅 编译器和编译器生成器 23
或第 1 页的抽象语法树。 21 编程语言的语法和语义
This is based on the Expression Evaluator grammar by Terrence Parr.
The grammar for this example:
Input
Parse Tree
The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.
AST
The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.
For a more through explanation see Compilers and Compiler Generators pg. 23
or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages
以下是在编译器构造的上下文中对解析树(具体语法树,CST)和抽象语法树(AST)的解释。它们是相似的数据结构,但构造方式不同并用于不同的任务。
解析树
解析树通常是在词法分析(将源代码转换为一系列可被视为有意义的单元的标记,而不是字符序列)之后的下一步生成的。
它们是树状数据结构,显示终端输入字符串(源代码标记)是如何通过相关语言的语法生成的。解析树的根是语法最通用的符号——起始符号(例如,语句),内部节点表示起始符号扩展为的非终结符号(可以包括起始符号)符号本身),例如表达式、语句、术语、函数调用。叶子是语法的终端,是语言/输入字符串中作为标识符、关键字和常量出现的实际符号,例如 for、9、 if 等。
在解析时,编译器还会执行各种检查以确保语法的正确性,并且语法错误报告可以嵌入到解析器代码中。
它们可用于通过语法导向的定义或翻译方案进行语法导向的翻译,以完成简单的任务,例如将中缀表达式转换为后缀表达式。
下面是表达式
9 - 5 + 2
的解析树的图形表示(请注意树中终端的位置以及表达式字符串中的实际符号):抽象语法树
AST 表示某些代码的语法结构。编程结构的树,例如表达式、流程控制语句等,分为运算符(内部节点)和操作数(叶)。例如,表达式
i + 9
的语法树将运算符+
作为根,变量i
作为运算符的左子节点,数字9
作为右孩子。这里的区别在于非终结符和终结符不起作用,因为 AST 不处理语法和字符串生成,而是处理编程构造,因此它们表示这些构造之间的关系,而不是语法生成它们的方式。
请注意,运算符本身是给定语言中的编程结构,并且不必是实际的计算运算符(如
+
):for
循环也将被视为这边走。例如,您可以有一个语法树,例如for [ expr, expr, expr, stmnt ]
(内联表示),其中for
是一个运算符,方括号内的元素是它的子元素(代表 C 的for
语法) - 也由运算符等组成。AST通常也是由编译器在语法分析(解析)阶段生成的,稍后用于语义分析、中间表示、代码生成等。
这是 AST 的图形表示:
Here's an explanation of parse trees (concrete syntax trees, CSTs) and abstract syntax trees (ASTs), in the context of compiler construction. They're similar data structures, but they're constructed differently and used for different tasks.
Parse trees
Parse trees are usually generated as the next step after lexical analysis (which turns the source code into a series of tokens that can be viewed as meaningful units, as opposed to just a sequence of characters).
They are tree-like data structures that shows how an input string of terminals (source code tokens) has been generated by the grammar of the language in question. The root of the parse tree is the most general symbol of the grammar - the start symbol (for example, statement), and the interior nodes represent nonterminal symbols that the start symbol expands to (can include the start symbol itself), such as expression, statement, term, function call. The leaves are the terminals of the grammar, the actual symbols which appear as identifiers, keywords, and constants in the language / input string, e.g. for, 9, if, etc.
While parsing the compiler also performs various checks to ensure the correctness of syntax - and and syntax error reports can be imbedded into parser code.
They can be used for syntax-directed translation via syntax-directed definitions or translation schemes, for simple tasks such as converting an infix expression to a postfix one.
Here's a graphical representation of a parse tree for the expression
9 - 5 + 2
(note the placement of the terminals in the tree and the actual symbols from the expression string):Abstract syntax trees
ASTs represent the syntactic structure of the some code. The trees of programming constructs such as expressions, flow control statements, etc - grouped into operators (interior nodes) and operands (leaves). For example, the syntax tree for the expression
i + 9
would have the operator+
as root, the variablei
as the operator's left child, and the number9
as the right child.The difference here is that nonterminals and terminals don't play a role, as ASTs don't deal with grammars and string generation, but programming constructs, and thus they represent relationships between such constructs, and not the ways they are generated by a grammar.
Note that the operators themselves are programming constructs in a given language, and don't have to be actual computational operators (like
+
is):for
loops would also be treated in this way. For example, you could have a syntax tree such asfor [ expr, expr, expr, stmnt ]
(represented inline), wherefor
is an operator, and the elements inside the square brackets are its children (representing C'sfor
syntax) - also composed out of operators etc.ASTs are usually generated by compilers in the syntax analysis (parsing) phase as well, and are used later for semantic analysis, intermediate representation, code generation, etc.
Here's a graphical representation of an AST:
据我了解,AST更侧重于源代码组件之间的抽象关系,而解析树则侧重于语言所使用的语法的实际实现,包括挑剔的细节。它们绝对不一样,因为“解析树”的另一个术语是“具体语法树”。
From what I understand, the AST focuses more on the abstract relationships between the components of source code, while the parse tree focuses on the actual implementation of the grammar utilized by the language, including the nitpicky details. They are definitely not the same, since another term for "parse tree" is "concrete syntax tree".
Martin Fowler 的DSL 书很好地解释了这一点。 AST 仅包含将用于进一步处理的所有“有用”元素,而解析树包含您解析的原始文档中的所有工件(空格、括号……)
The DSL book from Martin Fowler explains this nicely. The AST only contains all 'useful' elements that will be used for further processing, while the parse tree contains all the artifacts (spaces, brackets, ...) from the original document you parse
AST 从概念上描述源代码,它不需要包含解析某些源代码所需的所有语法元素(花括号、关键字、括号等)。
解析树更接近地表示源代码。
在 AST 中,IF 语句的节点只能包含三个子节点:
对于类 C 语言,解析树需要包含“if”关键字、括号和大括号的节点。
An AST describes the source code conceptually, it doesn't need to contain all the syntactical elements required to parse some source code (curly braces, keywords, parenthesis etc.).
A Parse tree represents the source code more closely.
In an AST the node for an IF statement could contain just three children:
For a C-like language the Parse Tree would need to contain nodes for the 'if' keyword, parenthesis, curly braces also.
维基百科说
Quora 上的一个回答说
结合以上两个定义,
抽象语法树在逻辑上描述了解析树。它不需要包含解析某些源代码所需的所有语法结构(空格、大括号、关键字、括号等)。这就是为什么
解析树
也被称为具体语法树
,而AST被称为语法树
。因此,语法分析器的输出实际上是语法树。Wikipedia says
An answer on Quora says
Combining the above two definitions,
An
Abstract Syntax Tree
describes the parse tree logically. It does not need to contain all the syntactical constructs required to parse some source code (white spaces, braces, keywords, parenthesis etc). That's whyParse Tree
is also calledConcrete Syntax Tree
while the AST is calledSyntax Tree
. The output of syntax analyser is, thus, syntax tree actually.接受帕斯卡作业
年龄:= 42;
语法树看起来就像源代码一样。下面我在节点周围加上括号。
[Age][:=][42][;]
抽象树看起来像这样
[=][Age][42]
赋值变成一个有 2 个元素的节点,Age 和 42。这个想法是您可以执行赋值。
另请注意,pascal 语法消失了。因此,可以有不止一种语言生成相同的 AST。这对于跨语言脚本引擎很有用。
Take the pascal assignment
Age:= 42;
The syntax tree would look just like the source code. Below I am putting brackets around the nodes.
[Age][:=][42][;]
An abstract tree would look like this
[=][Age][42]
The assignment becomes a node with 2 elements, Age and 42. The idea is that you can execute the assignment.
Also note that the pascal syntax disappears. Thus it is possible to have more than one language generate the same AST. This is useful for cross language script engines.
在解析树中,内部节点是非终端,叶子是终端。
在语法树中,内部节点是运算符,叶子是操作数。
In parse tree interior nodes are non terminal, leaves are terminal.
In syntax tree interior nodes are operator, leaves are operands.