当前位置：文江博客话题详情

解析树和抽象语法树（AST）有什么区别？

发布于 2024-10-17 17:01:53 字数 41 浏览 13 评论 0原文

它们是由编译过程的不同阶段生成的吗？或者它们只是同一事物的不同名称？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

千笙结 2024-10-24 17:01:53

这是基于 Terrence Parr 的表达式求值器语法。

此示例的语法：

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

输入

x=1
y=2
3*(x+y)

解析树

解析树是输入的具体表示。解析树保留输入的所有信息。空框代表空白，即行尾。

Parse Tree

AST

AST 是输入的抽象表示。请注意，AST 中不存在括号，因为关联可以从树结构中导出。

AST

有关详细说明，请参阅编译器和编译器生成器 23
或第 1 页的抽象语法树。 21 编程语言的语法和语义

This is based on the Expression Evaluator grammar by Terrence Parr.

The grammar for this example:

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

Input

x=1
y=2
3*(x+y)

Parse Tree

The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.

Parse Tree

AST

The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.

AST

For a more through explanation see Compilers and Compiler Generators pg. 23
or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages

回复收藏 0 原文

夜清冷一曲。 2024-10-24 17:01:53

以下是在编译器构造的上下文中对解析树（具体语法树，CST）和抽象语法树（AST）的解释。它们是相似的数据结构，但构造方式不同并用于不同的任务。

解析树

解析树通常是在词法分析（将源代码转换为一系列可被视为有意义的单元的标记，而不是字符序列）之后的下一步生成的。

它们是树状数据结构，显示终端输入字符串（源代码标记）是如何通过相关语言的语法生成的。解析树的根是语法最通用的符号——起始符号（例如，语句），内部节点表示起始符号扩展为的非终结符号（可以包括起始符号）符号本身），例如表达式、语句、术语、函数调用。叶子是语法的终端，是语言/输入字符串中作为标识符、关键字和常量出现的实际符号，例如 for、9、 if 等。

在解析时，编译器还会执行各种检查以确保语法的正确性，并且语法错误报告可以嵌入到解析器代码中。

它们可用于通过语法导向的定义或翻译方案进行语法导向的翻译，以完成简单的任务，例如将中缀表达式转换为后缀表达式。

下面是表达式 9 - 5 + 2 的解析树的图形表示（请注意树中终端的位置以及表达式字符串中的实际符号）：

在此处输入图像描述

抽象语法树

AST 表示某些代码的语法结构。编程结构的树，例如表达式、流程控制语句等，分为运算符（内部节点）和操作数（叶）。例如，表达式 i + 9 的语法树将运算符 + 作为根，变量 i 作为运算符的左子节点，数字 9 作为右孩子。

这里的区别在于非终结符和终结符不起作用，因为 AST 不处理语法和字符串生成，而是处理编程构造，因此它们表示这些构造之间的关系，而不是语法生成它们的方式。

请注意，运算符本身是给定语言中的编程结构，并且不必是实际的计算运算符（如 + ）：for 循环也将被视为这边走。例如，您可以有一个语法树，例如 for [ expr, expr, expr, stmnt ] （内联表示），其中 for 是一个运算符，方括号内的元素是它的子元素（代表 C 的 for 语法） - 也由运算符等组成。AST

通常也是由编译器在语法分析（解析）阶段生成的，稍后用于语义分析、中间表示、代码生成等。

这是 AST 的图形表示：

Here's an explanation of parse trees (concrete syntax trees, CSTs) and abstract syntax trees (ASTs), in the context of compiler construction. They're similar data structures, but they're constructed differently and used for different tasks.

Parse trees

Parse trees are usually generated as the next step after lexical analysis (which turns the source code into a series of tokens that can be viewed as meaningful units, as opposed to just a sequence of characters).

They are tree-like data structures that shows how an input string of terminals (source code tokens) has been generated by the grammar of the language in question. The root of the parse tree is the most general symbol of the grammar - the start symbol (for example, statement), and the interior nodes represent nonterminal symbols that the start symbol expands to (can include the start symbol itself), such as expression, statement, term, function call. The leaves are the terminals of the grammar, the actual symbols which appear as identifiers, keywords, and constants in the language / input string, e.g. for, 9, if, etc.

While parsing the compiler also performs various checks to ensure the correctness of syntax - and and syntax error reports can be imbedded into parser code.

They can be used for syntax-directed translation via syntax-directed definitions or translation schemes, for simple tasks such as converting an infix expression to a postfix one.

Here's a graphical representation of a parse tree for the expression 9 - 5 + 2 (note the placement of the terminals in the tree and the actual symbols from the expression string):

enter image description here

Abstract syntax trees

ASTs represent the syntactic structure of the some code. The trees of programming constructs such as expressions, flow control statements, etc - grouped into operators (interior nodes) and operands (leaves). For example, the syntax tree for the expression i + 9 would have the operator + as root, the variable i as the operator's left child, and the number 9 as the right child.

The difference here is that nonterminals and terminals don't play a role, as ASTs don't deal with grammars and string generation, but programming constructs, and thus they represent relationships between such constructs, and not the ways they are generated by a grammar.

Note that the operators themselves are programming constructs in a given language, and don't have to be actual computational operators (like + is): for loops would also be treated in this way. For example, you could have a syntax tree such as for [ expr, expr, expr, stmnt ] (represented inline), where for is an operator, and the elements inside the square brackets are its children (representing C's for syntax) - also composed out of operators etc.

ASTs are usually generated by compilers in the syntax analysis (parsing) phase as well, and are used later for semantic analysis, intermediate representation, code generation, etc.

Here's a graphical representation of an AST:

enter image description here