AST 遍历是在访问者中还是在节点中?

发布于 2024-10-19 00:59:41 字数 3387 浏览 8 评论 0原文

更新接受了Ira Baxter的回答,因为它为我指明了正确的方向:我首先通过开始编译阶段的实现来弄清楚我实际需要什么,并且很快就很明显,节点内的遍历使得这是一个不可能的方法。并非所有节点都应该被访问,其中一些节点的顺序相反(例如,首先访问赋值的 rhs,以便编译器可以检查类型是否与 rhs/operator 匹配)。将遍历放入访问者中使这一切变得非常容易。

在决定对应用程序中使用的迷你语言的处理进行主要重构之前,我正在尝试 AST 等。 我已经构建了一个 Lexer/Parser,并且可以很好地获取 AST。还有一个 Visitor,作为具体实现,我制作了一个 ASTToOriginal,它只是重新创建原始源文件。最终将会有某种编译器也实现 Vsisitor 并在运行时创建实际的 C++ 代码,所以我想确保一切从一开始就是正确的。 虽然现在一切正常,但由于遍历顺序是在访问者本身中实现的,因此存在一些类似/重复的代码。

当查找更多信息时,似乎某些实现更喜欢在访问的对象本身中保留遍历顺序,以免在每个具体访问者中重复此操作。 即使 GoF 也只是简单地谈论这一点,以同样的方式。所以我也想尝试这种方法,但很快就陷入困境......让我解释一下。

示例源代码行和相应的 AST 节点:

if(t>100?x=1;sety(20,true):x=2)
Conditional
  BinaryOp
    left=Variable [name=t], operator=[>], right=Integer [value=100]
  IfTrue
    Assignment
      left=Variable [name=x], operator=[=], right=Integer [value=1] 
    Method
      MethodName [name=sety], Arguments( Integer [value=20], Boolean [value=true] )
  IfFalse
    Assignment
      left=Variable [name=x], operator=[=], right=Integer [value=1]

一些代码:

class BinaryOp {
  void Accept( Visitor* v ){ v->Visit( this ); }
  Expr* left;
  Op* op;
  Expr* right;
};    
class Variable {
  void Accept( Visitor* v ){ v->Visit( this ); }
  Name* name;
};
class Visitor { //provide basic traversal, terminal visitors are abstract
  void Visit( Variable* ) = 0;
  void Visit( BinaryOp* p ) {
    p->left->Accept( this );
    p->op->Accept( this );
    p->right->Accept( this );        
  }
  void Visit( Conditional* p ) {
    p->cond->Accept( this );
    VisitList( p->ifTrue ); //VisitList just iterates over the array, calling Accept on each element
    VisitList( p->ifFalse );
  }
};

实现 ASTToOriginal 非常简单:所有抽象 Visitor 方法都只是打印出终端的名称或值成员。 对于非终结符,这取决于;打印分配可以在默认的访问者遍历中正常工作,因为需要条件额外的代码:

class ASTToOriginal {
  void Visit( Conditional* p ) {
    str << "if(";
    p->cond->Accept( this );
    str << "?";
      //VisitListWithPostOp is like VisitList but calls op for each *except the last* iteration
    VisitListWithPostOp( p->ifTrue, AppendText( str, ";" ) );
    VisitListWithPostOp( p->ifFalse, AppendText( str, ";" ) );
    str << ")";
  }
};

因此可以看到访问者和 ASTToOriginal 中条件的访问方法确实非常相似。 然而,试图通过遍历节点来解决这个问题不仅使事情变得更糟,而且变得一团糟。 我尝试了一种使用 PreVisit 和 PostVisit 方法的方法,它解决了一些问题,但只是在节点中引入了越来越多的代码。 它也开始看起来像我必须跟踪访问者内部的许多状态才能知道何时添加右括号等。

class BinaryOp {
  void Accept( Conditional* v ) { 
    v->Visit( this );
    op->Accept( v )
    VisitList( ifTrue, v );
    VisitList( ifFalse, v );
};
class Vistor {
  //now all methods are pure virtual
};
class ASTToOriginal {
  void Visit( Conditional* p ) {
    str << "if(";
    //now what??? after returning here, BinaryOp will visit the op automatically so I can't insert the "?"
    //If I make a PostVisit( BinaryOp* ), and call it it BinaryOp::Accept, I get the chance to insert the "?",
    //but now I have to keep a state: my PostVisit method needs to know it's currently being called as part of a Conditional
    //Things are even worse for the ifTrue/ifFalse statement arrays: each element needs a ";" appended, but not the last one,
    //how am I ever going to do that in a clean way?
  }
};

问题:这种方法是否不适合我的情况,或者我忽略了一些重要的事情?有没有一个通用的设计来解决这些问题?如果我还需要不同方向的遍历怎么办?

Update accepted Ira Baxter's answer since it pointed me into the right direction: I first figured out what I actually needed by starting the implementation of the compiling stage, and it became obvious pretty soon that traversal within the nodes made thie an impossible approach. Not all nodes should be visited, and some of them in reverse order (for example, first the rhs of an assignment so the compiler can check if the type matches with the rhs/operator). Putting traversal in the visitor makes this all very easy.

I'm playing around with ASTs and the likes before deciding a major refactory of the handling of a mini-language used in an applicaiton.
I've built a Lexer/Parser and can get the AST just fine. There's also a Visitor and as concrete implementation I made an ASTToOriginal which just recreates the original source file. Eventually there's goin to be some sort of compiler that also implements the Vsisitor and creates the actual C++ code at runtime so I want to make sure everything is right from the start.
While everything works fine now, there is some similar/duplicate code since the traversal order is implemented in the Visitor itself.

When looking up more information, it seems that some implementations prefer keeping the traversal order in the visited objects themselves instead, in order not to repeat this in each concrete visitor.
Even the GoF only talks briefly about this, in the same way. So I wanted to give this approach a try as well but got stuck pretty soon.. Let me explain.

Sample source line and corresponding AST nodes:

if(t>100?x=1;sety(20,true):x=2)
Conditional
  BinaryOp
    left=Variable [name=t], operator=[>], right=Integer [value=100]
  IfTrue
    Assignment
      left=Variable [name=x], operator=[=], right=Integer [value=1] 
    Method
      MethodName [name=sety], Arguments( Integer [value=20], Boolean [value=true] )
  IfFalse
    Assignment
      left=Variable [name=x], operator=[=], right=Integer [value=1]

Some code:

class BinaryOp {
  void Accept( Visitor* v ){ v->Visit( this ); }
  Expr* left;
  Op* op;
  Expr* right;
};    
class Variable {
  void Accept( Visitor* v ){ v->Visit( this ); }
  Name* name;
};
class Visitor { //provide basic traversal, terminal visitors are abstract
  void Visit( Variable* ) = 0;
  void Visit( BinaryOp* p ) {
    p->left->Accept( this );
    p->op->Accept( this );
    p->right->Accept( this );        
  }
  void Visit( Conditional* p ) {
    p->cond->Accept( this );
    VisitList( p->ifTrue ); //VisitList just iterates over the array, calling Accept on each element
    VisitList( p->ifFalse );
  }
};

Implementing ASTToOriginal is pretty straightforward: all abstract Visitor methods just print out the name or value member of the terminal.
For the non-terminals it depends; printing an Assignment works ok with the default Visitor traversal, for a Conditional extra code is needed:

class ASTToOriginal {
  void Visit( Conditional* p ) {
    str << "if(";
    p->cond->Accept( this );
    str << "?";
      //VisitListWithPostOp is like VisitList but calls op for each *except the last* iteration
    VisitListWithPostOp( p->ifTrue, AppendText( str, ";" ) );
    VisitListWithPostOp( p->ifFalse, AppendText( str, ";" ) );
    str << ")";
  }
};

So as one can see both the Visit methods for a Conditional in Visitor and ASTToOriginal are indeed very similar.
However trying to solve this by putting traversal into the nodes made things not just worse, but rather a complete mess.
I tried an approach with PreVisit and PostVisit methods which solved some problems, but just introduced more and more code into the Nodes.
It also started to look like I would have to keep track of a number of states inside the Visitor to be able to know when to add closing brackets etc.

class BinaryOp {
  void Accept( Conditional* v ) { 
    v->Visit( this );
    op->Accept( v )
    VisitList( ifTrue, v );
    VisitList( ifFalse, v );
};
class Vistor {
  //now all methods are pure virtual
};
class ASTToOriginal {
  void Visit( Conditional* p ) {
    str << "if(";
    //now what??? after returning here, BinaryOp will visit the op automatically so I can't insert the "?"
    //If I make a PostVisit( BinaryOp* ), and call it it BinaryOp::Accept, I get the chance to insert the "?",
    //but now I have to keep a state: my PostVisit method needs to know it's currently being called as part of a Conditional
    //Things are even worse for the ifTrue/ifFalse statement arrays: each element needs a ";" appended, but not the last one,
    //how am I ever going to do that in a clean way?
  }
};

Question: is this approach just not suited for my case, or am I overlooking something essential? Is there a common design to cope with these problems? What if I also need traversal in a different direction?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

极度宠爱 2024-10-26 00:59:41

这里有两个问题:

  • 哪些子节点可以访问?
  • 您应该以什么顺序拜访他们?

可以说,实际的子节点应该通过节点类型来了解;实际上,它应该被语法所了解,并从语法“反映”到一般访问者中。

访问节点的顺序完全取决于您需要执行的操作。如果您正在进行漂亮打印,则从左到右的子顺序是有意义的(如果子节点按语法顺序列出,但它们可能不是)。如果您正在构建符号表,那么您肯定希望在访问语句主体子级之前访问声明子级。

此外,您还需要担心哪些信息在树上向上或向下流动。变量访问列表将沿树向上流动。构造的符号表从声明沿树向上流动,并向下返回到语句体子级。而这个信息流强制访问顺序;为了将符号表向下传递到语句主体中,首先必须构造一个符号表并从声明子级向上传递。

我认为这些问题会让你感到悲伤。您试图对访问者强加单一结构,而实际上访问顺序完全依赖于任务,并且您可能对树执行许多不同的任务,每个任务都有自己的信息流,因此具有顺序依赖性。

解决这个问题的方法之一是使用 attribute(d) 语法 (AG) 的概念,其中用各种类型的属性以及它们的计算/使用方式来装饰语法规则。您实际上将计算编写为语法规则的注释,例如:

  method = declarations statements ;
  <<ResolveSymbols>>: { declarations.parentsymbols=method.symboltable;
                        statements.symboltable = declarations.symboltable;
                      }

语法规则告诉您必须具有哪些节点类型。属性计算告诉您什么值正在沿着树向下传递(对 method.symboltable 的引用是来自父级的某些值),向上传递给树(对 statements.symbol 表的引用是由该子级计算的某些值),或者跨树传递什么值。树(statements.symboltable 从由 statements.symboltable 计算的值向下传递到 statements 子级)。属性计算定义访问者。所执行的计算称为“属性评估”。

此特定属性语法的表示法是我们 DMS 软件重新工程工具包 的一部分。其他 AG 工具使用类似的符号。与所有(AG)方案一样,特定规则用于为特定节点(“方法”)制造特定目的(“ResolveSymbols”)访问者。通过每个节点的一组此类规范,您将获得一组可以执行的特定目的访问者。
AG 方案的价值在于您可以轻松编写此方案,并且会生成所有样板文件。

您可以像这样抽象地思考您的问题,然后像您一直在做的那样,简单地手动生成特定目的的访问者。

There are two problems:

  • What children nodes are possible to visit?
  • What order should you visit them?

Arguably the actual children nodes should be known by node type; actually, it should be known by the grammar, and "reflected" from the grammar into the general visitor.

The order in which the nodes are visited depend completely on what you need to do. If you are doing prettyprinting, a left-to-right child order makes sense (if the children nodes are listed in the order of the grammar, which they might not be). If you are constructing symbol tables, you surely want to visit the declaration children before you visit the statement body child.

In addition, you need to worry about what information flows up or down the tree. A list-of-variable accesses would flow up the tree. A constructed symbol table flows up the tree from the declarations, and back down into statement body child. And this information flow forces the visit order; to have a symbol table to pass down into the statement body, you first have to have a symbol table constructed and passed up from the declarations child.

I think these issues are the ones that are giving you greif. You are trying to impose a single structure on your visitors, when in fact the visit order is completely task dependent, and there are lots of different tasks you might do with a tree, each with its own information flow and thus order dependency.

One of the ways this is solved is with the notion of an attribute(d) grammar (AG), in which one decorates the grammar rules with various types of attributes and how they are computed/used. You literally write a computation as an annotation to the grammar rule, for instance:

  method = declarations statements ;
  <<ResolveSymbols>>: { declarations.parentsymbols=method.symboltable;
                        statements.symboltable = declarations.symboltable;
                      }

The grammar rule tells you what node types you have to have. The atrribute computation tells you what value are being passed down the tree (reference to method.symboltable is to something coming form the parent), up the tree (reference to declarations.symbol table is to something computed by that child), or across the tree (statements.symboltable is passed down to the statements child, from the value computed by declarations.symboltable). The attribute computation defines the visitor. The executed computation is a called "attribute evaluation".

This notation for this particular attribute grammar is part of our DMS Software Reengineering Toolkit. Other AG tools use similar notations. Like all (AG) schemes, the particular rule is used to manufacture the purpose-specific ("ResolveSymbols") visitor for the specific node ("method"). With a set of such specifications for each node, you get a set of purpose-specific visitors that can be executed.
The value of an AG scheme is that you can write this easily and all the boilerplate gunk gets generated.

You can think about your problem in the abstract like this, and then simply generate the purpose-specific visitors by hand, as you have been doing.

小…红帽 2024-10-26 00:59:41

对于树的递归通用遍历,访问者和复合体通常一起使用,例如(第一个相关的谷歌链接)那里。我第一次读到这个想法是在那里。还有访问者组合器一个好主意。

顺便说一句......

这就是函数式语言的闪光点,它具有代数数据类型和模式匹配。如果可以的话,切换到函数式语言。 Composite 和 Visitor 只是丑陋的解决方法,因为分别缺乏对 ADT 和模式匹配的语言支持。

For recursive generic traversing of trees, Visitor and Composite are usually used together, like in (first relevant google link) there. I first read about this idea there. There are also visitor combinators which are a nice idea.

And by the way...

this is where functional languages shine, with their Algebraic Data Types and pattern matching. If you can, switch to a functional language. Composite and Visitor are only ugly workarounds for lack of language support for respectively ADT and pattern matching.

彩扇题诗 2024-10-26 00:59:41

IMO,我会让每个具体类(例如 BinaryOp、Variable)扩展 Visitor 类。这样,创建 BinaryOp 对象所需的所有逻辑都将驻留在 BinaryOp 类中。此方法类似于 Walkabout 模式。它可能会让你的任务变得更容易。

IMO, I would have each concrete class (e.g. BinaryOp, Variable) extend the Visitor class. That way, all the logic necessary to create a BinaryOp object, would reside in the BinaryOp class. This approach is similar to the Walkabout pattern. It may make your task easier.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文