抽象语法树的构建和遍历
我不清楚抽象语法树的结构。要在 AST 代表的程序源代码中“向下(向前)”,是直接进入最顶层的节点,还是向下?例如,示例程序会
a = 1
b = 2
c = 3
d = 4
e = 5
产生如下所示的 AST:
或这样:
在第一个节点中,在主节点
上“向右”将推进您完成该程序,但在第二个程序中,只需跟随每个节点上的 next
指针即可执行相同的操作。
似乎第二个会更正确,因为您不需要像特殊节点类型这样的东西,并且第一个节点可能具有非常长的指针数组。尽管如此,当您进入 for 循环和 if 分支以及更复杂的事情时,我可以看到第二个变得比第一个更复杂。
I am unclear on the structure of abstract syntax trees. To go "down (forward)" in the source of the program that the AST represents, do you go right on the very top node, or do you go down? For instance, would the example program
a = 1
b = 2
c = 3
d = 4
e = 5
Result in an AST that looks like this:
or this:
Where in the first one, going "right" on the main node
will advance you through the program, but in the second one simply following the next
pointer on each node will do the same.
It seems like the second one would be more correct since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node. Although, I can see the second one becoming more complicated than the first when you get into for
loops and if
branches and more complicated things.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
第一种表示是更典型的表示,尽管第二种表示与作为递归数据结构的树的构造兼容,这可以在实现平台是函数式而非命令式时使用。
考虑:
这是您的第一个示例,除了缩短并带有“主”节点(概念性的稻草人) )更恰当地命名为“块”,以反映包含命令式编程语言中的语句序列的“块”的常见构造。不同类型的节点有不同类型的子节点,有时这些子节点包括顺序很重要的辅助节点的集合,就像“块”的情况一样。例如,数组初始化也可能会产生同样的情况:
考虑如何在语法树中表示:
这里,数组文字类型节点还具有多个相同类型的子节点,其顺序很重要。
The first representation is the more typical one, though the second is compatible with the construction of a tree as a recursive data structure, as may be used when the implementation platform is functional rather than imperative.
Consider:
This is your first example, except shortened and with the "main" node (a conceptual straw man) more appropriately named "block," to reflect the common construct of a "block" containing a sequence of statements in an imperative programming language. Different kinds of nodes have different kinds of children, and sometimes those children include collections of subsidiary nodes whose order is important, as is the case with "block." The same might arise from, say, an array initialization:
Consider how this might be represented in a syntax tree:
Here, the array-literal-type node also has multiple children of the same type whose order is important.
方法,并且我认为当您不需要维护指向下一个节点的指针时,您会发现构建 AST 更加容易。
我认为让所有对象都源自一个公共基类通常更容易,类似于:
生成的 AST 如下:
I'd nearly always prefer the first approach, and I think you'll find it much easier to construct your AST when you don't need to maintain a pointer to the next node.
I think its generally easier to have all objects descend from a common base class, similar to this:
Resulting AST is as follows:
这取决于语言。在 C 中,您必须使用第一种形式来捕获块的概念,因为块具有变量范围:
变量范围将是您所谓的“主节点”的属性。
It depends on the language. In C, you'd have to use the first form to capture the notion of a block, since a block has a variable scope:
The variable scope would be an attribute of what you call the "main node".
我相信你的第一个版本更有意义,原因有几个。
首先,第一个更清楚地展示了程序的“嵌套性”,并且也清楚地实现为有根树(这是树的通常概念)。
第二个也是更重要的原因是,您的“主节点”实际上可能是一个“分支节点”(例如),它可以只是更大的 AST 中的另一个节点。这样,您的 AST 就可以以递归的方式查看,其中每个 AST 都是一个节点,其他 AST 作为它的子节点。这使得第一个设计更加简单、更加通用并且非常同质。
I believe your first version make more sense, for a couple of reasons.
Firstly, the first more clearly demonstrates the "nestedness" of the program, and also is clearly implemented as a rooted tree (which is the usual concept of a tree).
The second, and more important reason, is that your "main node" could really have been a "branch node" (for example), which can simply be another node within a larger AST. This way, your AST can be viewed in a recursive sense, where each AST is a node with other ASTs as it children. This make the design of the first much simpler, more general, and very homogeneous.
建议:在处理树数据结构时,无论是与编译器相关的 AST 还是其他类型,始终使用单个“根”节点,它可以帮助您执行操作并拥有更多控制权:
干杯。
Suggestion: When dealing with tree data structures, wheter is compiler-related AST or other kind, always use a single "root" node, it may help you perform operations and have more control:
Cheers.