解析树和抽象语法树(AST)有什么区别?

发布于 2024-10-17 17:01:53 字数 41 浏览 9 评论 0原文

它们是由编译过程的不同阶段生成的吗?或者它们只是同一事物的不同名称?

Are they generated by different phases of a compiling process? Or are they just different names for the same thing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

千笙结 2024-10-24 17:01:53

这是基于 Terrence Parr 的表达式求值器语法。

此示例的语法:

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

输入

x=1
y=2
3*(x+y)

解析树

解析树是输入的具体表示。解析树保留输入的所有信息。空框代表空白,即行尾。

Parse Tree

AST

AST 是输入的抽象表示。请注意,AST 中不存在括号,因为关联可以从树结构中导出。

AST

有关详细说明,请参阅 编译器和编译器生成器 23
或第 1 页的抽象语法树。 21 编程语言的语法和语义

This is based on the Expression Evaluator grammar by Terrence Parr.

The grammar for this example:

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

Input

x=1
y=2
3*(x+y)

Parse Tree

The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.

Parse Tree

AST

The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.

AST

For a more through explanation see Compilers and Compiler Generators pg. 23
or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages

夜清冷一曲。 2024-10-24 17:01:53

以下是在编译器构造的上下文中对解析树(具体语法树,CST)和抽象语法树(AST)的解释。它们是相似的数据结构,但构造方式不同并用于不同的任务。

解析树

解析树通常是在词法分析(将源代码转换为一系列可被视为有意义的单元的标记,而不是字符序列)之后的下一步生成的。

它们是树状数据结构,显示终端输入字符串(源代码标记)是如何通过相关语言的语法生成的。解析树的根是语法最通用的符号——起始符号(例如,语句),内部节点表示起始符号扩展为的非终结符号(可以包括起始符号)符号本身),例如表达式语句术语函数调用。叶子是语法的终端,是语言/输入字符串中作为标识符、关键字和常量出现的实际符号,例如 for9 if 等。

在解析时,编译器还会执行各种检查以确保语法的正确性,并且语法错误报告可以嵌入到解析器代码中。

它们可用于通过语法导向的定义或翻译方案进行语法导向的翻译,以完成简单的任务,例如将中缀表达式转换为后缀表达式。

下面是表达式 9 - 5 + 2 的解析树的图形表示(请注意树中终端的位置以及表达式字符串中的实际符号):

在此处输入图像描述

抽象语法树

AST 表示某些代码的语法结构。编程结构的树,例如表达式、流程控制语句等,分为运算符(内部节点)和操作数(叶)。例如,表达式 i + 9 的语法树将运算符 + 作为根,变量 i 作为运算符的左子节点,数字 9 作为右孩子。

这里的区别在于非终结符和终结符不起作用,因为 AST 不处理语法和字符串生成,而是处理编程构造,因此它们表示这些构造之间的关系,而不是语法生成它们的方式。

请注意,运算符本身是给定语言中的编程结构,并且不必是实际的计算运算符(如 + ):for 循环也将被视为这边走。例如,您可以有一个语法树,例如 for [ expr, expr, expr, stmnt ] (内联表示),其中 for 是一个运算符,方括号内的元素是它的子元素(代表 C 的 for 语法) - 也由运算符等组成。AST

通常也是由编译器在语法分析(解析)阶段生成的,稍后用于语义分析、中间表示、代码生成等。

这是 AST 的图形表示:

Here's an explanation of parse trees (concrete syntax trees, CSTs) and abstract syntax trees (ASTs), in the context of compiler construction. They're similar data structures, but they're constructed differently and used for different tasks.

Parse trees

Parse trees are usually generated as the next step after lexical analysis (which turns the source code into a series of tokens that can be viewed as meaningful units, as opposed to just a sequence of characters).

They are tree-like data structures that shows how an input string of terminals (source code tokens) has been generated by the grammar of the language in question. The root of the parse tree is the most general symbol of the grammar - the start symbol (for example, statement), and the interior nodes represent nonterminal symbols that the start symbol expands to (can include the start symbol itself), such as expression, statement, term, function call. The leaves are the terminals of the grammar, the actual symbols which appear as identifiers, keywords, and constants in the language / input string, e.g. for, 9, if, etc.

While parsing the compiler also performs various checks to ensure the correctness of syntax - and and syntax error reports can be imbedded into parser code.

They can be used for syntax-directed translation via syntax-directed definitions or translation schemes, for simple tasks such as converting an infix expression to a postfix one.

Here's a graphical representation of a parse tree for the expression 9 - 5 + 2 (note the placement of the terminals in the tree and the actual symbols from the expression string):

enter image description here

Abstract syntax trees

ASTs represent the syntactic structure of the some code. The trees of programming constructs such as expressions, flow control statements, etc - grouped into operators (interior nodes) and operands (leaves). For example, the syntax tree for the expression i + 9 would have the operator + as root, the variable i as the operator's left child, and the number 9 as the right child.

The difference here is that nonterminals and terminals don't play a role, as ASTs don't deal with grammars and string generation, but programming constructs, and thus they represent relationships between such constructs, and not the ways they are generated by a grammar.

Note that the operators themselves are programming constructs in a given language, and don't have to be actual computational operators (like + is): for loops would also be treated in this way. For example, you could have a syntax tree such as for [ expr, expr, expr, stmnt ] (represented inline), where for is an operator, and the elements inside the square brackets are its children (representing C's for syntax) - also composed out of operators etc.

ASTs are usually generated by compilers in the syntax analysis (parsing) phase as well, and are used later for semantic analysis, intermediate representation, code generation, etc.

Here's a graphical representation of an AST:

enter image description here

少女净妖师 2024-10-24 17:01:53

据我了解,AST更侧重于源代码组件之间的抽象关系,而解析树则侧重于语言所使用的语法的实际实现,包括挑剔的细节。它们绝对不一样,因为“解析树”的另一个术语是“具体语法树”。

From what I understand, the AST focuses more on the abstract relationships between the components of source code, while the parse tree focuses on the actual implementation of the grammar utilized by the language, including the nitpicky details. They are definitely not the same, since another term for "parse tree" is "concrete syntax tree".

z祗昰~ 2024-10-24 17:01:53

Martin Fowler 的DSL 书很好地解释了这一点。 AST 仅包含将用于进一步处理的所有“有用”元素,而解析树包含您解析的原始文档中的所有工件(空格、括号……)

The DSL book from Martin Fowler explains this nicely. The AST only contains all 'useful' elements that will be used for further processing, while the parse tree contains all the artifacts (spaces, brackets, ...) from the original document you parse

勿忘初心 2024-10-24 17:01:53

AST 从概念上描述源代码,它不需要包含解析某些源代码所需的所有语法元素(花括号、关键字、括号等)。

解析树更接近地表示源代码。

在 AST 中,IF 语句的节点只能包含三个子节点:

  • Condition
  • If Case
  • Else Case

对于类 C 语言,解析树需要包含“if”关键字、括号和大括号的节点。

An AST describes the source code conceptually, it doesn't need to contain all the syntactical elements required to parse some source code (curly braces, keywords, parenthesis etc.).

A Parse tree represents the source code more closely.

In an AST the node for an IF statement could contain just three children:

  • Condition
  • If Case
  • Else Case

For a C-like language the Parse Tree would need to contain nodes for the 'if' keyword, parenthesis, curly braces also.

一绘本一梦想 2024-10-24 17:01:53

维基百科说

解析树具体反映了输入语言的语法,使其与计算机编程中使用的抽象语法树不同。

Quora 上的一个回答说

解析树是用于匹配某些输入文本的规则(和标记)的记录,而语法树记录输入的结构,并且对生成它的语法不敏感。

结合以上两个定义,

抽象语法树在逻辑上描述了解析树。它不需要包含解析某些源代码所需的所有语法结构(空格、大括号、关键字、括号等)。这就是为什么解析树也被称为具体语法树,而AST被称为语法树。因此,语法分析器的输出实际上是语法树。

Wikipedia says

Parse trees concretely reflect the syntax of the input language, making them distinct from the abstract syntax trees used in computer programming.

An answer on Quora says

A parse tree is a record of the rules (and tokens) used to match some input text whereas a syntax tree records the structure of the input and is insensitive to the grammar that produced it.

Combining the above two definitions,

An Abstract Syntax Tree describes the parse tree logically. It does not need to contain all the syntactical constructs required to parse some source code (white spaces, braces, keywords, parenthesis etc). That's why Parse Tree is also called Concrete Syntax Tree while the AST is called Syntax Tree. The output of syntax analyser is, thus, syntax tree actually.

泪意 2024-10-24 17:01:53

接受帕斯卡作业
年龄:= 42;

语法树看起来就像源代码一样。下面我在节点周围加上括号。
[Age][:=][42][;]

抽象树看起来像这样
[=][Age][42]

赋值变成一个有 2 个元素的节点,Age 和 42。这个想法是您可以执行赋值。

另请注意,pascal 语法消失了。因此,可以有不止一种语言生成相同的 AST。这对于跨语言脚本引擎很有用。

Take the pascal assignment
Age:= 42;

The syntax tree would look just like the source code. Below I am putting brackets around the nodes.
[Age][:=][42][;]

An abstract tree would look like this
[=][Age][42]

The assignment becomes a node with 2 elements, Age and 42. The idea is that you can execute the assignment.

Also note that the pascal syntax disappears. Thus it is possible to have more than one language generate the same AST. This is useful for cross language script engines.

热风软妹 2024-10-24 17:01:53

在解析树中,内部节点是非终端,叶子是终端。
在语法树中,内部节点是运算符,叶子是操作数。

In parse tree interior nodes are non terminal, leaves are terminal.
In syntax tree interior nodes are operator, leaves are operands.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文