抽象语法树的构建和遍历

发布于 2024-11-10 07:17:27 字数 488 浏览 0 评论 0原文

我不清楚抽象语法树的结构。要在 AST 代表的程序源代码中“向下(向前)”,是直接进入最顶层的节点,还是向下?例如,示例程序会

a = 1
b = 2
c = 3
d = 4
e = 5

产生如下所示的 AST: 在此处输入图像描述

或这样: 在此处输入图像描述

在第一个节点中,在主节点上“向右”将推进您完成该程序,但在第二个程序中,只需跟随每个节点上的 next 指针即可执行相同的操作。

似乎第二个会更正确,因为您不需要像特殊节点类型这样的东西,并且第一个节点可能具有非常长的指针数组。尽管如此,当您进入 for 循环和 if 分支以及更复杂的事情时,我可以看到第二个变得比第一个更复杂。

I am unclear on the structure of abstract syntax trees. To go "down (forward)" in the source of the program that the AST represents, do you go right on the very top node, or do you go down? For instance, would the example program

a = 1
b = 2
c = 3
d = 4
e = 5

Result in an AST that looks like this:
enter image description here

or this:
enter image description here

Where in the first one, going "right" on the main node will advance you through the program, but in the second one simply following the next pointer on each node will do the same.

It seems like the second one would be more correct since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node. Although, I can see the second one becoming more complicated than the first when you get into for loops and if branches and more complicated things.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

时光瘦了 2024-11-17 07:17:27

第一种表示是更典型的表示,尽管第二种表示与作为递归数据结构的树的构造兼容,这可以在实现平台是函数式而非命令式时使用。

考虑:

在此处输入图像描述

这是您的第一个示例,除了缩短并带有“主”节点(概念性的稻草人) )更恰当地命名为“块”,以反映包含命令式编程语言中的语句序列的“块”的常见构造。不同类型的节点有不同类型的子节点,有时这些子节点包括顺序很重要的辅助节点的集合,就像“块”的情况一样。例如,数组初始化也可能会产生同样的情况:

int[] arr = {1, 2}

考虑如何在语法树中表示:

在此处输入图像描述

这里,数组文字类型节点还具有多个相同类型的子节点,其顺序很重要。

The first representation is the more typical one, though the second is compatible with the construction of a tree as a recursive data structure, as may be used when the implementation platform is functional rather than imperative.

Consider:

enter image description here

This is your first example, except shortened and with the "main" node (a conceptual straw man) more appropriately named "block," to reflect the common construct of a "block" containing a sequence of statements in an imperative programming language. Different kinds of nodes have different kinds of children, and sometimes those children include collections of subsidiary nodes whose order is important, as is the case with "block." The same might arise from, say, an array initialization:

int[] arr = {1, 2}

Consider how this might be represented in a syntax tree:

enter image description here

Here, the array-literal-type node also has multiple children of the same type whose order is important.

猛虎独行 2024-11-17 07:17:27

在第一个中,“向右”走
在主节点上会让你前进
通过该程序,但在第二个
简单地跟随下一个指针
每个节点上都会执行相同的操作。

第二个好像是
更正确,因为你不需要
类似于特殊的节点类型
具有可能非常长的
第一个的指针数组
节点

方法,并且我认为当您不需要维护指向下一个节点的指针时,您会发现构建 AST 更加容易。

我认为让所有对象都源自一个公共基类通常更容易,类似于:

abstract class Expr { }

class Block : Expr
{
    Expr[] Statements { get; set; }
    public Block(Expr[] statements) { ... }
}

class Assign : Expr
{
    Var Variable { get; set; }
    Expr Expression { get; set; }
    public Assign(Var variable, Expr expression) { ... }
}

class Var : Expr
{
    string Name { get; set; }
    public Variable(string name) { ... }
}

class Int : Expr
{
    int Value { get; set; }
    public Int(int value) { ... }
}

生成的 AST 如下:

Expr program =
    new Block(new Expr[]
        {
            new Assign(new Var("a"), new Int(1)),
            new Assign(new Var("b"), new Int(2)),
            new Assign(new Var("c"), new Int(3)),
            new Assign(new Var("d"), new Int(4)),
            new Assign(new Var("e"), new Int(5)),
        });

Where in the first one, going "right"
on the main node will advance you
through the program, but in the second
one simply following the next pointer
on each node will do the same.

It seems like the second one would be
more correct since you don't need
something like a special node type
with a potentially extremely long
array of pointers for the very first
node

I'd nearly always prefer the first approach, and I think you'll find it much easier to construct your AST when you don't need to maintain a pointer to the next node.

I think its generally easier to have all objects descend from a common base class, similar to this:

abstract class Expr { }

class Block : Expr
{
    Expr[] Statements { get; set; }
    public Block(Expr[] statements) { ... }
}

class Assign : Expr
{
    Var Variable { get; set; }
    Expr Expression { get; set; }
    public Assign(Var variable, Expr expression) { ... }
}

class Var : Expr
{
    string Name { get; set; }
    public Variable(string name) { ... }
}

class Int : Expr
{
    int Value { get; set; }
    public Int(int value) { ... }
}

Resulting AST is as follows:

Expr program =
    new Block(new Expr[]
        {
            new Assign(new Var("a"), new Int(1)),
            new Assign(new Var("b"), new Int(2)),
            new Assign(new Var("c"), new Int(3)),
            new Assign(new Var("d"), new Int(4)),
            new Assign(new Var("e"), new Int(5)),
        });
七颜 2024-11-17 07:17:27

这取决于语言。在 C 中,您必须使用第一种形式来捕获块的概念,因为块具有变量范围:

{
    {
        int a = 1;
    }
    // a doesn't exist here
}

变量范围将是您所谓的“主节点”的属性。

It depends on the language. In C, you'd have to use the first form to capture the notion of a block, since a block has a variable scope:

{
    {
        int a = 1;
    }
    // a doesn't exist here
}

The variable scope would be an attribute of what you call the "main node".

情话已封尘 2024-11-17 07:17:27

我相信你的第一个版本更有意义,原因有几个。

首先,第一个更清楚地展示了程序的“嵌套性”,并且也清楚地实现为有根树(这是树的通常概念)。

第二个也是更重要的原因是,您的“主节点”实际上可能是一个“分支节点”(例如),它可以只是更大的 AST 中的另一个节点。这样,您的 AST 就可以以递归的方式查看,其中每个 AST 都是一个节点,其他 AST 作为它的子节点。这使得第一个设计更加简单、更加通用并且非常同质。

I believe your first version make more sense, for a couple of reasons.

Firstly, the first more clearly demonstrates the "nestedness" of the program, and also is clearly implemented as a rooted tree (which is the usual concept of a tree).

The second, and more important reason, is that your "main node" could really have been a "branch node" (for example), which can simply be another node within a larger AST. This way, your AST can be viewed in a recursive sense, where each AST is a node with other ASTs as it children. This make the design of the first much simpler, more general, and very homogeneous.

娇妻 2024-11-17 07:17:27

建议:在处理树数据结构时,无论是与编译器相关的 AST 还是其他类型,始终使用单个“根”节点,它可以帮助您执行操作并拥有更多控制权:

class ASTTreeNode {
  bool isRoot() {...}

  string display() { ... }  
  // ...
}

void main ()
{
  ASTTreeNode MyRoot = new ASTTreeNode();

  // ...

  // prints the root node, plus each subnode recursively
  MyRoot.Show();
}

干杯。

Suggestion: When dealing with tree data structures, wheter is compiler-related AST or other kind, always use a single "root" node, it may help you perform operations and have more control:

class ASTTreeNode {
  bool isRoot() {...}

  string display() { ... }  
  // ...
}

void main ()
{
  ASTTreeNode MyRoot = new ASTTreeNode();

  // ...

  // prints the root node, plus each subnode recursively
  MyRoot.Show();
}

Cheers.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文