I'm working on a pretty complex DSL that I want to compile down into a few high-level languages. The whole process has been a learning experience. The compiler is written in Java.
I was wondering if anyone knows of a best practice for the design of the code generator portion. I currently have everything parsed into an abstract syntax tree.
I was thinking of using a template system, but I haven't researched that direction too far yet, as I would like to hear some wisdom from Stack Overflow first.
Thanks!
Comments (3)
When I did this back in my programming languages class, we ended up using emitters based on the visitor pattern. It worked pretty well - it makes retargeting to new output languages pretty easy, as long as your AST matches what you're printing fairly well.
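To make the visitor-based emitter idea concrete, here is a minimal sketch in Java. Everything in it is hypothetical and not from the original class project: the node types (`Num`, `Add`, `Pow`), the `Visitor` interface, and the two emitters are invented for illustration. The point it shows is that each target language gets its own visitor, so the AST classes never change when a new target is added.

```java
final class EmitterDemo {
    // Hypothetical AST for a tiny expression DSL.
    interface Node { <R> R accept(Visitor<R> v); }

    // One visit method per node type; R is whatever the visitor produces.
    interface Visitor<R> {
        R visitNum(Num n);
        R visitAdd(Add a);
        R visitPow(Pow p);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
    }

    static final class Pow implements Node {
        final Node base, exponent;
        Pow(Node base, Node exponent) { this.base = base; this.exponent = exponent; }
        public <R> R accept(Visitor<R> v) { return v.visitPow(this); }
    }

    // Emitter for a Python-like target: exponentiation is an operator.
    static final class PythonEmitter implements Visitor<String> {
        public String visitNum(Num n) { return Integer.toString(n.value); }
        public String visitAdd(Add a) {
            return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
        }
        public String visitPow(Pow p) {
            return "(" + p.base.accept(this) + " ** " + p.exponent.accept(this) + ")";
        }
    }

    // Emitter for a Java-like target: exponentiation is a library call.
    static final class JavaEmitter implements Visitor<String> {
        public String visitNum(Num n) { return Integer.toString(n.value); }
        public String visitAdd(Add a) {
            return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
        }
        public String visitPow(Pow p) {
            return "Math.pow(" + p.base.accept(this) + ", " + p.exponent.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Node ast = new Add(new Num(1), new Pow(new Num(2), new Num(3)));
        System.out.println(ast.accept(new PythonEmitter())); // (1 + (2 ** 3))
        System.out.println(ast.accept(new JavaEmitter()));   // (1 + Math.pow(2, 3))
    }
}
```

The caveat about the AST matching the output shows up in `visitPow`: the operator form and the call form are still a one-to-one mapping from a single node. Once a DSL construct has to expand into several target statements, the emitters start carrying real translation logic.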
What you really want is a program transformation system that maps syntax structures in one language (your DSL) into syntax patterns in other languages. Such a tool can carry out arbitrary transformations during code generation (tree rewrites generalize string rewrites, which are Post systems, which are fully Turing capable), which means that what you generate and how sophisticated your generation process is are determined only by your ambition, not by "code generator framework" properties.
Sophisticated program transformation systems combine various types of scoping, flow analysis and/or custom analyzers to enable the transformations. This doesn't add any theoretical power, but it adds a lot of practical power: most real languages (even DSLs) have namespaces, control and data flow, need type inference, etc.
Our DMS Software Reengineering Toolkit is this type of transformation system. It has been used to analyze/transform both conventional languages and DSLs, for simple and complex languages, and for small, large and even huge software systems.
Regarding the OP's comments about "turning the AST into other languages": with DMS, that is accomplished by writing transformations that map surface syntax for the DSL (implemented behind the scenes by the DSL's AST) to surface syntax for the target language (implemented using target language ASTs). The resulting target language AST is then prettyprinted automatically by DMS to produce actual source code in the target language that corresponds to the target AST.
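As a rough illustration of the general idea (rewrite the DSL tree into a target-language tree, then prettyprint that tree), here is a small hand-rolled sketch in Java. It is not DMS and does not use DMS's notation or API. The `Tree` class, the `Rule` interface, and the `square` to `mul` rule are all hypothetical names made up for this example, and a single generic tree type stands in for both ASTs to keep it short.

```java
import java.util.Arrays;
import java.util.List;

final class RewriteDemo {
    // Generic labeled tree, used here for both the DSL AST and the target AST.
    static final class Tree {
        final String label;
        final List<Tree> kids;
        Tree(String label, Tree... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // A rewrite rule returns a replacement tree, or null when it does not match.
    interface Rule { Tree apply(Tree t); }

    // Hypothetical rule: DSL "square(x)" becomes target "mul(x, x)".
    static final Rule SQUARE_TO_MUL = t ->
        t.label.equals("square") && t.kids.size() == 1
            ? new Tree("mul", t.kids.get(0), t.kids.get(0))
            : null;

    // Rewrite the children first, then try each rule once at this node
    // (a single bottom-up pass; real systems iterate and use richer matching).
    static Tree rewrite(Tree t, List<Rule> rules) {
        Tree[] kids = new Tree[t.kids.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = rewrite(t.kids.get(i), rules);
        }
        Tree current = new Tree(t.label, kids);
        for (Rule r : rules) {
            Tree replaced = r.apply(current);
            if (replaced != null) return replaced;
        }
        return current;
    }

    // Prettyprinter for the target tree; unknown labels are treated as leaves.
    static String print(Tree t) {
        switch (t.label) {
            case "mul": return "(" + print(t.kids.get(0)) + " * " + print(t.kids.get(1)) + ")";
            case "add": return "(" + print(t.kids.get(0)) + " + " + print(t.kids.get(1)) + ")";
            default:    return t.label;
        }
    }

    public static void main(String[] args) {
        Tree dsl = new Tree("add", new Tree("square", new Tree("x")), new Tree("1"));
        Tree target = rewrite(dsl, Arrays.asList(SQUARE_TO_MUL));
        System.out.println(print(target)); // ((x * x) + 1)
    }
}
```

The design point is the separation: rules only know about tree shapes, and the printer only knows about the target language, so adding a target means adding rules and a printer rather than touching the parser or the DSL AST.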
If you are already using ANTLR and have your AST ready, you might want to take a look at StringTemplate:
http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation
Also Section 9.6 of The Definitive ANTLR Reference: Building Domain-Specific Languages explains this:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
The free code samples are available at http://media.pragprog.com/titles/tpantlr/code/tpantlr-code.tgz. In the subfolder code\templates\generator\2pass\ you'll find an example converting mathematical expressions to Java bytecode.
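For anyone who hasn't used a template engine for code generation, the basic shape is: keep the target-language text in templates with named holes, and have the tree walker only supply values for the holes. The sketch below is a hand-rolled stand-in written for this answer, not StringTemplate's API. `render` and the `<name>` hole syntax here are assumptions for illustration (real StringTemplate has its own classes and much richer features, so check its documentation for actual usage).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TemplateDemo {
    // Replaces each <name> hole with its value; a toy stand-in for a real template engine.
    static String render(String template, Map<String, String> attrs) {
        String out = template;
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical template for a variable declaration in a Java-like target.
        String declaration = "<type> <name> = <init>;";

        // In a real generator these values would come from walking the AST.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("type", "int");
        attrs.put("name", "total");
        attrs.put("init", "(1 + 2)");

        System.out.println(render(declaration, attrs)); // int total = (1 + 2);
    }
}
```

Swapping in a different set of templates retargets the output without touching the walker, which is the main attraction of the approach for a DSL with several target languages.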