What is the difference between the terms parse tree and derivation tree?
The terms AST (Abstract Syntax Tree), parse tree and derivation tree are bandied about by different people when referring to the result of parsing texts conforming to a grammar. Assuming we are talking about parsing computer languages, are their differences minute enough that we can use these terms interchangeably ? If not, how do we use the terms correctly ?
3 Answers
AFAIK, "derivation tree" and "parse tree" are the same.

Take the source

a = (1 + 2) * 3;

for example. The parse tree could look like [parse-tree diagram in the original answer], while the abstract syntax tree could look like [AST diagram in the original answer].
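Since the original answer's tree diagrams are lost, the distinction can still be seen concretely with Python's built-in `ast` module: it parses source directly into an abstract syntax tree, and the parentheses from `(1 + 2) * 3` leave no trace except as nesting. A quick sketch (the trailing `;` is dropped because Python statements don't use semicolons):

```python
import ast

# Parse the example assignment; Python has no trailing ';', so it is omitted.
tree = ast.parse("a = (1 + 2) * 3")
value = tree.body[0].value  # the right-hand side of the assignment

# The grouping survives only as nesting: the Add node sits under the
# Mult node, but there are no '(' or ')' nodes anywhere in the AST.
print(ast.dump(value))
# Prints (wrapped here for readability; CPython 3.8+):
# BinOp(left=BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2)),
#       op=Mult(), right=Constant(value=3))
```

A parse tree for the same input, by contrast, would have one leaf per token, including `(`, `)`, and `;`.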
Parse/Derivation/Concrete syntax trees are all synonyms for the same concept.
Such trees are normally only used in theory discussions, because they contain lots of details that seem unnecessary for doing language processing; in an expression tree, do you really need a node to represent "(" and another to represent ")"?
The notion of an "abstract syntax" tree is one which represents the program structure to a level of detail that is adequate for processing in practice; you don't typically find nodes for "(...)".
An interesting question: is an AST directly computable from a CST? The answer should be yes, but people hardly ever do it. What they tend to do is construct "abstract syntax" nodes as the parser runs, using ad hoc code (rule-reduction procedural attachments) to assemble the nodes from child parses with a glue node for the parent. IMHO, they do this because we've all been brought up on YACC, and that's how it is traditionally done. (We used to light fires with flint, too.) There's also a lesser excuse: doing it this way gives the compiler-builder complete control over the structure of the AST, and he can produce one that is pretty minimal in terms of extra detail.
Such an ad-hoc tree is not computable from the CST, except by the same ad-hoc computation that is embedded in the parser actions.
I've used a different approach: my tools compute ASTs directly from CSTs, literally by dropping irrelevant details, e.g., leaving out nodes that represent non-value-bearing tokens (such as those pointless '(' and ')' tokens, as well as keywords), compressing out chains of unary productions, and converting right- or left-leaning trees that are equivalent to lists into real list nodes. The advantage of doing this is that the parser can compute the AST directly from the grammar rules. No fiddling around with procedural attachments. No getting it wrong. No more worrying about the fact that our COBOL grammar has 3500 rules, that I would otherwise need procedural goo for every one of them, and that I would have to change my grammar hundreds of times to get it right and fiddle with the goo every time. And our tools work as though they operate directly on the CST, which makes it easy to reason about tree manipulations, especially if you are staring directly at the grammar rules.
(This also makes pattern matching using surface syntax much easier: for any pattern fragment, there's a directly computable AST that corresponds).
So the distinction between AST and CST is real in terms of utility. But I think they should be considered as just isomorphic representations.
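The CST-to-AST computation this answer describes can be sketched in a few lines. This is a toy illustration under assumed node shapes (a CST node is a `(rule_name, children)` tuple with token strings as leaves), not the author's actual tooling:

```python
# Toy sketch of deriving an AST from a CST by dropping non-value-bearing
# tokens and compressing unary productions, per the answer above.
# The CST encoding here is a hypothetical one chosen for the example.

PUNCTUATION = {"(", ")", ";"}  # non-value-bearing tokens to discard

def to_ast(node):
    if isinstance(node, str):              # a token leaf
        return None if node in PUNCTUATION else node
    rule, children = node
    kids = [a for c in children if (a := to_ast(c)) is not None]
    if len(kids) == 1:                     # compress a unary production chain
        return kids[0]
    return (rule, kids)

# CST for "(1 + 2)" under a small assumed grammar:
cst = ("expr", ["(", ("expr", [("term", ["1"]), "+", ("term", ["2"])]), ")"])
print(to_ast(cst))                         # ('expr', ['1', '+', '2'])
```

Real tools would also flatten list-shaped left- or right-recursive subtrees into genuine list nodes, as the answer mentions; that step is omitted here for brevity.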
I would use the term parse tree when the tree is produced by parsing, that is when evaluating if a given input sequence belongs to the language and determining which productions must be used in which order to regenerate the sequence.
A derivation tree would have exactly the same shape, but would be produced by the process of deriving a sequence from a given production.
The formal definition of parsing is finding a derivation for the given input sequence, so no wonder derivation and parse trees are the same.
Concrete versus abstract syntax trees differ in that the former has a leaf node for each token in the input sequence, while the latter omits any tokens that can be known by inspecting the grammar. A concrete syntax subtree for

if <expr> then <statement> else <statement> end

would have leaves for if, then, else, and end, and the abstract one would not. The concrete syntax tree for

(2+3)

would be [CST diagram in the original answer]; the abstract one would be just [AST diagram in the original answer].
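Although the original answer's diagrams for (2+3) are lost, the same token-versus-node contrast can be observed with Python's standard `tokenize` and `ast` modules: the concrete token stream keeps the parentheses, while the AST has no node for them at all. A sketch (output shown for CPython 3.8+):

```python
import ast
import io
import tokenize

src = "(2+3)"

# Concrete level: every token of the input appears, parentheses included.
tokens = [t.string
          for t in tokenize.generate_tokens(io.StringIO(src).readline)
          if t.string.strip()]
print(tokens)  # ['(', '2', '+', '3', ')']

# Abstract level: the parentheses are gone; only the addition remains.
print(ast.dump(ast.parse(src, mode="eval").body))
# BinOp(left=Constant(value=2), op=Add(), right=Constant(value=3))
```

The filtered token list stands in for the leaves of the concrete syntax tree, and the `ast.dump` output for the abstract one.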