为了轻松转换,我的 AST 应该是什么样子?
我有一个类似于 javascript 的最小玩具语言。 我生成一个 AST 来尝试一些优化技术,例如转义分析、类型推断。 我尝试了一些方法,例如概括运算符标记而不是每个标记的类/函数,在每个节点上保留类型信息......但我仍然不觉得我要去任何地方。 它很快就会变得难以处理......
我研究了 lua5、neko、v8 但是......好吧......我确信不是周围最聪明的人之一。
是否有人有设计 AST 并在 AST 上实现转换的经验,这很容易工作? 我希望有一些提示和技巧可以让您的生活更轻松?
(请不要告诉我去看龙书。我已经有了。)
I have a minimal toy language similar to javascript. I generate an AST to try out some optimization techniques like escape analysis, type inference. I tried a few approaches like generalizing operator tokens instead of a class/function for each one, keeping type information on every node... But I still don't feel like I am going anywhere. It quickly becomes unwieldy to work on...
I studied lua5, neko, v8 but.. well... I am sure not one of the brightest one around.
Does anybody have experience designing AST and implementing transformations on a AST, which is easy to work on? I would appreciate tips and tricks that made life easier for you?
(Please don't tell me to go read dragon book. I have it already.)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
正如艾伦提到的,阿佩尔的书很棒。 我在编译器本科课程中学习了机器学习中的现代编译器实现。
我个人会避免在 AST 上进行多次转换,因为您可以拥有多种不同的构造,并且可以表达同一事物的方式有多种。 您经常需要编写处理大量案例和子案例的代码,正如您所说,它很快就会变得笨拙。
最好将 AST 转换为更最小的表示形式,例如控制流图中的基本块。 然后,每个操作都可以编写为基本块中的简单语句。 可能的操作集应保持较小。 只需确保保留足够的信息,以便您仍然可以执行所需的所有转换(特别是不要丢弃类型)。 您还可以使用单一静态分配形式,其中每个程序变量仅分配一次。 这提供了一个不变量,简化了许多转换和分析。
As Alan mentioned, the Appel books are great. I had Modern Compiler Implementation in ML for an undergraduate course on compilers.
I would personally avoid doing many transformations on an AST simply because of the number of different constructs you can have and the number of ways the same thing can be expressed. You will frequently have to write code that handles a large number of cases and sub-cases, and as you said, it gets unwieldly very quickly.
It is better to transform the AST to a more minimal representation such as basic blocks in a control flow graph. Each operation can then be written as a simple statement in a basic block. The set of possible operations should be kept small. Just make sure to keep enough information that you can still do all the transformations you want (in particular, don't throw away types). You can also use Single Static Assignment form, where each program variable gets assigned only once. This provides an invariant which simplifies a lot of transformations and analyses.
我使用 ANLTR.org 我发现它非常简单。
I use ANLTR.org which I found very easy.
好吧,我不会推荐《龙》这本书,因为你已经有了它。 我可以推荐Appel书籍吗? 甚至还有一些源代码演示了这些概念,其中他们使用访问者模式将抽象语法树转换为具体语法树。 祝你好运,尽情享受!
Okay, I won't recommend the Dragon book, since you already have it. Can I recommend the Appel books? There is even some source code demonstrating the concepts, among them using the Visitor pattern for translating Abstract Syntax Trees to Concrete Syntax Trees. Good luck, and enjoy!
AST 代表程序的结构。 对于复杂的语言,
你的 AST 必然会很复杂。 所以不要假设这个
应该是“容易”。
许多人认为拥有 AST,生活会更轻松。
但解析只是喜马拉雅山麓。
AST 不代表常见的推论,
例如标识符的含义、接下来执行什么语句、
消耗这些数据的地方。
除非你具备所有这些,否则你不会
能够用真正的语言做很多事情,更不用说
轻松做到。
最好将这些推理结果缓存或显式显示:
符号表、控制流、数据流……
可以添加模式匹配语言来启用
识别语法结构,甚至编写转换
规则:
此类规则需要利用缓存的推论(例如,
“副作用”)。
尝试建立这一切确实很难。 一种制作方法
这种做法是为了分摊基础设施成本
许多语言和转换应用程序。
我们的 DMS 软件再工程工具包 拥有不同程度的所有这些机制,适用于各种领域语言,包括 C、C++、Java、C# 和 COBOL。
ASTs represent the structure of program. For complex languages,
your AST will necessarily be complicated. So don't assume this
should be "easy".
Many folks assume that having an AST, life will be easier.
But parsing is just the foothills of the Himalayas.
ASTs don't represent the common inferences,
such as the meaning of an identifier, what statement executes next,
where this data is consumed.
Unless you have all these available, you're not going to
be able to do much with a real language, let alone
do it easily.
It is best to make those inference results cached or explicit:
symbol tables, control flows, data flows, ...
One can add pattern matching langauges to enable the
recognition of syntax structures, or even to write transformation
rules:
Such rules need to draw on the cached inferences (eg.,
"side_effects").
Trying to build all this is really hard. One way to make
this practical is to amortize the infrastructure cost across
many lanuages and transformation applications.
Our DMS Software Reengineering Toolkit has all this machinery to varying degree for a wide variety of langauges, including C, C++, Java, C# and COBOL.