AST 的树结构
我正在尝试用 C# 编写一个解释器,并且正处于解析阶段。我发现此时我必须生成抽象语法树,但我不知道如何在 C# 中表示它。
目前我只是使用 List
多谢。
I am trying to write an interpreter in C# and I am on a parsing stage. I figured out I have to generate Abstract Syntax Tree at this point, but I don't know how to represent it in C#.
Currently I am just using List<object>
, but I have a feeling that I am doing it wrong.
Thanks a lot.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
有多种技术,从单个“节点”类型(带有“类型”标签的一大堆字段)到特定类的细粒度层次结构。重要的是要考虑如何使遍历代码简单且健壮,因为您将需要一遍又一遍地遍历这个数据结构。
然而,退一步来说,解释并不严格要求 AST。许多早期的解释器会在遇到每一行代码时逐字地读取它,通过基于堆栈的书签系统实现,通过循环等动态地解析和执行它。我怀疑像 bash 和 cmd.exe 这样的 shell 语言今天仍然像这样工作。
There are numerous techniques, ranging from a single "node" type — a big bucket of fields with a "type" tag — to a fine-grained hierarchy of specific classes. The important thing is to think about how to make traversal code simple and robust, because you will need to traverse this data structure over and over again.
Taking a step back, however, interpretation doesn't strictly require an AST. Many earlier interpreters would quite literally read each line of code as they encountered it, parsing and executing it on the fly, with loops, etc., implemented via a stack-based book-marking system. I suspect shell languages like bash and cmd.exe still work like this today.
我建议您继续使用
List
将您的输入直接解析为 S 表达式,并让您的解释器直接使用它。毕竟,LISP 在 AST 之前就已经存在了!
I suggest that you continue using
List<object>
until you understand clearly what limitations does this impose and what your requirements are. Or, since you are writing a LISP interpreter, create a pair class and use that withobject
,null
being the equivalent of'()
:Parse your input directly into an S-expression and have your interpreter work directly with that. After all, LISP existed before ASTs!
大多数 AST 节点实现都非常简单。
它们是一个结构体(好的,好的,“类”),包含节点类型(通常是整数),子级列表(列表是好的;高性能实现有一组统计上常见的第一个,第二个,第三个子级的成员) ,以及一些附加字段来携带特定于 AST 节点实例的值(例如,AST 节点“整数常量”的值“5”)。为了实现树从任何节点返回到父节点的有效导航,通常有一个返回父节点的特殊引用。
更困难的是决定您应该拥有什么 AST 节点集。对于大型语法来说,这是不方便的,因为您必须定义数百个语法的集合,并且当您尝试修改语法以使其正确时,将会出现混乱。
一个简单的技巧就是为每个语法规则定义一个 AST 节点。 (这被大多数人称为“具体语法树”)。但它是无脑的,你不会错过任何东西。
我们的 DMS 软件重新工程工具包遵循这个“简单技巧”的想法,生成 AST 节点直接从语法规则中输入。它还进行了优化:树中不存在不携带值的叶 AST 节点,树中不存在一元产生式的节点,并且列表形成产生式具有子级的 List 样式,而其他节点类型具有儿童的固定插槽类型。无论如何,最终结果是非常接近“抽象”语法树。所有这些都是由 DMS 的解析器生成器自动构建的,因此您根本不需要思考。
DMS 还拥有一个完整的、经过良好测试的 C# 4.0 前端。一旦克服了定义 AST 的麻烦,您就会想要对其进行分析/转换/生成,而 DMS 的其余部分将突然变得明显有价值。
Most AST node implementations are pretty simple.
They are a struct (ok, ok, "class") containing a node type (usually an integer), a list of children (List is OK; high performance implementations have a set of members for statistically common 1st, 2nd, 3rd children), and some additional fields to carry values specific to the AST node instance, (e.g., the value "5" for the AST node "integer constant"). To enable efficient navigation of the tree from any node back to parents, there is often a special reference back to the parent node.
What's harder is deciding what set of AST nodes you should have. For a large grammar, this is inconvenient as you'll have to define a set of several hundred of them, and there will be churn as you modify the grammar in your attempts to get it right.
A simple trick is simply define one AST node per grammar rule. (This would be called a "concrete syntax tree" by most). But it is brainless and you don't miss anything.
Our DMS Software Reengineering Toolkit follows this "simple trick" idea, generating the AST node types dirrectly from the grammar rules. It additionally optimizes: leaf AST nodes that don't carry values aren't present in the tree, nodes for unary productions aren't present in the tree, and list-forming productions have the List style of children, while other node types have the fixed slot types for children. The net result is you what is pretty close to a "abstract" syntax tree anyway. All of this is automatically constructed by DMS's parser generator so you don't have think at all.
DMS also has a full, well-test C# 4.0 front end. Once you get past the headache of defining the AST, you'll then want to analyze/transform/generate from it, and the rest of DMS will suddenly become obviously valuable.