What is the most interesting and promising approach to implementing a compiler in C#?
I am just at the beginning of my graduation project, which is supposed to last 6 months.
The goal of the project is to implement a .NET compiler for a scripting language. I took Compiler Construction as a subject in my curriculum and am aware of the basic steps of implementing a compiler in general, but we used Bison and a simple compiler with GCC as the back end, so I don't know much about implementing compilers on the .NET platform.
Having done some research on this topic, I found the following alternatives for code generation (I am not talking about other essential parts of a compiler, like the parser; that is out of scope here):
- Direct code generation using Reflection.Emit.
- Using the Common Compiler Infrastructure (CCI) as an abstraction over Reflection.Emit to automate some code generation.
- Using CodeDOM for C# and VB compilation at runtime.
- There is a new emerging C# "compiler as a service" called Roslyn, available as a CTP now.
- DLR offers support for dynamic code generation and has some interfaces for runtime code generation via expression trees etc.
- Mono ships with the Mono.Cecil library, which seems to have some functionality for code generation as well.
The primary goal of my project is to delve deeper into the guts of .NET, to learn compiler construction, and to get a good grade for my work. The secondary goal is to come up with a compiler implementation that can later be opened to the community under a permissive open-source license.
So, what would be the most interesting, educational, entertaining, and promising approach here? I would definitely try all of them if I had more time, but I need to submit my work in exactly 6 months to get a passing grade...
Thank you in advance,
Alexander.
Comments (3)
If you want the easier way and your language can reasonably be translated into C#, I would recommend generating C# code (or similar) and compiling that. Roslyn would probably be best for that. Apparently, CCI can do that too using CCI Code, but I've never used it. I wouldn't recommend CodeDOM, because it doesn't support features like static classes or extension methods.
If you want more control or if you want to go low-level you can generate CIL directly using Reflection.Emit. But it will be (much) more work, especially if you're not familiar with CIL. I think Cecil can be used the same way, but it's intended for something else, and I don't think it offers any advantages over Reflection.Emit.
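To give a feel for what generating CIL directly looks like, here is a minimal sketch using `DynamicMethod` from `System.Reflection.Emit`. The class and method names (`EmitSketch`, `BuildAdder`) are made up for illustration. A compiler that writes assemblies to disk would go through `AssemblyBuilder`, `ModuleBuilder`, and `TypeBuilder` rather than `DynamicMethod`, but the `ILGenerator` part looks much the same.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Builds, at runtime, a method equivalent to:
    //   static int Add(int a, int b) { return a + b; }
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod(
            "Add",                                // name, used for diagnostics only
            typeof(int),                          // return type
            new[] { typeof(int), typeof(int) });  // parameter types

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);  // push the first argument
        il.Emit(OpCodes.Ldarg_1);  // push the second argument
        il.Emit(OpCodes.Add);      // pop both values, push their sum
        il.Emit(OpCodes.Ret);      // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdder();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

Even for this one-liner you have to think in terms of the evaluation stack, which is where the extra work mentioned above comes from.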
DLR is meant, as its full name (Dynamic Language Runtime) suggests, for dynamic languages. The `Expression`s it uses can be used for code generation, but I think they are best at generating relatively simple methods at runtime. Of course, DLR itself can be very useful if your language is dynamic.
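For comparison, a small sketch of the expression-tree route using `System.Linq.Expressions`. The names `ExpressionSketch` and `BuildMulAddOne` are hypothetical. It builds the lambda `(x, y) => x * y + 1` node by node and compiles it to a delegate, which is the "relatively simple method at runtime" case.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionSketch
{
    // Builds (x, y) => x * y + 1 as an expression tree and compiles it.
    public static Func<int, int, int> BuildMulAddOne()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Parameter(typeof(int), "y");

        // The tree mirrors the source expression: Add(Multiply(x, y), 1).
        Expression body = Expression.Add(
            Expression.Multiply(x, y),
            Expression.Constant(1));

        return Expression.Lambda<Func<int, int, int>>(body, x, y).Compile();
    }

    public static void Main()
    {
        Console.WriteLine(BuildMulAddOne()(4, 5)); // prints 21
    }
}
```

There is no stack bookkeeping here, since the tree shape carries the evaluation order, but you get a delegate in memory rather than an assembly you can ship.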
Boo is a language/compiler that targets the CLI. It appears to be open source so you could study how they accomplish it.
Back when I was writing compilers, I would write to assembly language (i.e. assembly language source code) that I then ran through the system's assembler. That way I could easily see what I was generating. It's a whole lot easier to read `mov ax, bx` (x86 assembly) than it is to decode hex opcodes.
If I wasn't allowed to use the assembler in the final product, I developed the compiler using the assembly output and then, once I got everything working, I made a binary output path. The beauty was that all I had to change was the actual bytes output (opcodes and binary values rather than text).
I would suggest doing something similar for your project. Develop it initially to output MSIL that you can assemble with ILASM. That way, you can easily verify your code generator's output by reading the generated code. Once you're confident that your code generator is working, add an output option that will use `Reflection.Emit` or the Common Compiler Infrastructure.
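A rough sketch of that text-first approach, under some assumptions: the mini-AST (`Node`, `Num`, `Add`) and the `IlTextSketch` class are invented for illustration, the toy language has only integer addition, and `.maxstack 8` is a fixed bound that is enough for this example but not for arbitrarily deep expressions. The generator writes IL assembly as text, which you could save to a `.il` file and feed to ILASM.

```csharp
using System;
using System.Text;

// Hypothetical mini-AST, for illustration only.
public abstract class Node
{
    // Appends IL that leaves this node's value on the evaluation stack.
    public abstract void Emit(StringBuilder il);
}

public sealed class Num : Node
{
    private readonly int value;
    public Num(int value) { this.value = value; }
    public override void Emit(StringBuilder il)
    {
        il.AppendLine("    ldc.i4 " + value); // push an int32 constant
    }
}

public sealed class Add : Node
{
    private readonly Node left, right;
    public Add(Node left, Node right) { this.left = left; this.right = right; }
    public override void Emit(StringBuilder il)
    {
        left.Emit(il);           // push the left operand
        right.Emit(il);          // push the right operand
        il.AppendLine("    add"); // pop both, push the sum
    }
}

public static class IlTextSketch
{
    // Wraps the expression in a program that prints its value.
    public static string Compile(Node expr)
    {
        var il = new StringBuilder();
        il.AppendLine(".assembly extern mscorlib {}");
        il.AppendLine(".assembly Demo {}");
        il.AppendLine(".method static void Main() cil managed");
        il.AppendLine("{");
        il.AppendLine("    .entrypoint");
        il.AppendLine("    .maxstack 8"); // assumption: fixed bound for this sketch
        expr.Emit(il);
        il.AppendLine("    call void [mscorlib]System.Console::WriteLine(int32)");
        il.AppendLine("    ret");
        il.AppendLine("}");
        return il.ToString();
    }

    public static void Main()
    {
        // Prints the IL text for the expression 2 + 3.
        Console.Write(Compile(new Add(new Num(2), new Num(3))));
    }
}
```

Because each node only appends text, swapping the `StringBuilder` for an `ILGenerator` later touches only the bodies of the `Emit` methods, which is the "only the output changes" property described above.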