Designing an intermediate representation for a compiler
I've been looking at compiler design. I've done a one-semester course on it at university and have been reading Modern Compiler Design by Grune et al. The book seems to advocate an annotated abstract syntax tree as the intermediate code, and this is what we used in the course.
My question is: what are the benefits of this approach versus producing some kind of stack-machine language or low-level pseudo-code, particularly for a compiler that can target many machines?
Is it a good idea to simply target an already existing low level representation such as LLVM and use that as the intermediate representation?
Comments (3)
If your language is complicated enough, you'll end up with a sequence of slightly different intermediate representations anyway. It does not really matter which representation is your final target (LLVM, C, native code, CLR, JVM, whatever); that choice should not affect the design and architecture of your compiler.
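To make the "final target doesn't matter" point concrete, here is a minimal sketch assuming a hypothetical three-address IR: swapping the target only means swapping the last emitter, while everything that produced the IR stays the same. The names `Instr`, `emit_llvm` and `emit_c` are illustrative, not from any library, and the LLVM output covers only integer arithmetic in textual IR form.

```python
from dataclasses import dataclass
from typing import Union

# An operand is either an integer literal or the name of an earlier temporary.
Operand = Union[int, str]


@dataclass(frozen=True)
class Instr:
    """One three-address instruction: dest = op lhs, rhs (hypothetical IR)."""
    dest: str
    op: str  # "add", "sub" or "mul"
    lhs: Operand
    rhs: Operand


def emit_llvm(code: list, result: str) -> str:
    """Back end 1: textual LLVM IR for a function that returns i32."""
    def operand(x: Operand) -> str:
        return str(x) if isinstance(x, int) else f"%{x}"

    lines = ["define i32 @main() {", "entry:"]
    for ins in code:
        lines.append(f"  %{ins.dest} = {ins.op} i32 {operand(ins.lhs)}, {operand(ins.rhs)}")
    lines += [f"  ret i32 %{result}", "}"]
    return "\n".join(lines)


C_OPERATORS = {"add": "+", "sub": "-", "mul": "*"}


def emit_c(code: list, result: str) -> str:
    """Back end 2: the same IR printed as C source."""
    lines = ["int main(void) {"]
    for ins in code:
        lines.append(f"  int {ins.dest} = {ins.lhs} {C_OPERATORS[ins.op]} {ins.rhs};")
    lines += [f"  return {result};", "}"]
    return "\n".join(lines)


# (1 + 2) * 3, as produced by the (unchanged) middle of the compiler.
program = [Instr("t0", "add", 1, 2), Instr("t1", "mul", "t0", 3)]
print(emit_llvm(program, "t1"))
print(emit_c(program, "t1"))
```

In a real compiler the IR would also carry types, control flow and calls, but the shape stays the same: one IR, several thin printers at the end.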
And, from my personal experience, the more intermediate steps you have, with transforms in between as trivial as possible, the better your compiler's architecture is.
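A minimal sketch of "many intermediate steps with trivial transforms in between", assuming a hypothetical tuple-encoded expression AST: each pass does one small job, and the compiler is just their composition. The pass names and the stack instruction format are illustrative only.

```python
from functools import reduce

# Hypothetical tuple-encoded AST:
#   ("num", n) | ("var", name) | ("neg", e) | ("bin", op, lhs, rhs)


def desugar_neg(e):
    """Pass 1: rewrite unary minus as 0 - e, so later passes only see binary ops."""
    if e[0] == "neg":
        return ("bin", "-", ("num", 0), desugar_neg(e[1]))
    if e[0] == "bin":
        return ("bin", e[1], desugar_neg(e[2]), desugar_neg(e[3]))
    return e


OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def fold_constants(e):
    """Pass 2: replace binary nodes whose operands are both literals by their value."""
    if e[0] != "bin":
        return e
    lhs, rhs = fold_constants(e[2]), fold_constants(e[3])
    if lhs[0] == "num" and rhs[0] == "num":
        return ("num", OPS[e[1]](lhs[1], rhs[1]))
    return ("bin", e[1], lhs, rhs)


def lower_to_stack(e):
    """Pass 3: flatten the tree into postfix stack-machine instructions.

    Assumes pass 1 has already removed every "neg" node.
    """
    if e[0] == "num":
        return [("push", e[1])]
    if e[0] == "var":
        return [("load", e[1])]
    return lower_to_stack(e[2]) + lower_to_stack(e[3]) + [("binop", e[1])]


# The pipeline is just the ordered list of passes.
PIPELINE = [desugar_neg, fold_constants, lower_to_stack]


def compile_expr(ast):
    return reduce(lambda ir, compiler_pass: compiler_pass(ir), PIPELINE, ast)


# x * -(1 + 2)
print(compile_expr(("bin", "*", ("var", "x"), ("neg", ("bin", "+", ("num", 1), ("num", 2))))))
# [('load', 'x'), ('push', -3), ('binop', '*')]
```

Because constant folding runs after desugaring, it never has to know about unary minus; adding a feature usually means adding one more small pass rather than complicating an existing one.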
An AST and low-level pseudo-code are two different abstractions of a program in the journey a compiler takes from a high-level language to object code.
As with any complete data representation, you can do everything you need to with either representation. Some things are just easier to do with one than the other.
For example, it's easier to do semantic and syntax analysis on an AST. It's easier to do instruction scheduling on pseudo-code.
Compiler front-end developers tend to like ASTs. Back-end developers tend to like pseudo-code.
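To illustrate "easier with one than the other", here is a sketch with two toy views of a program (both hypothetical): a type checker is a natural recursion over a tree, while the dependency information an instruction scheduler needs falls out of a single linear scan over three-address code. The scan tracks only read-after-write edges and assumes every temporary is assigned exactly once.

```python
from dataclasses import dataclass


# Front-end view: a tree. Type checking follows the tree's shape.
@dataclass
class Lit:
    value: object  # an int or a bool


@dataclass
class BinOp:
    op: str  # "+", "*" or "<"
    lhs: object
    rhs: object


def type_of(node) -> str:
    """Return "int" or "bool", or raise TypeError for ill-typed operands."""
    if isinstance(node, Lit):
        # Check bool first: in Python, bool is a subclass of int.
        return "bool" if isinstance(node.value, bool) else "int"
    lt, rt = type_of(node.lhs), type_of(node.rhs)
    if lt != "int" or rt != "int":
        raise TypeError(f"operator {node.op!r} expects ints, got {lt} and {rt}")
    return "bool" if node.op == "<" else "int"


# Back-end view: a flat list of three-address instructions (dest, op, a, b).
def dependencies(code):
    """For each instruction, the indices of earlier instructions whose result it reads.

    A scheduler may reorder instructions as long as these edges are respected.
    Only read-after-write edges are tracked, which is sufficient only under the
    assumption that every temporary is assigned exactly once.
    """
    defined_at = {}
    deps = []
    for i, (dest, _op, a, b) in enumerate(code):
        deps.append({defined_at[x] for x in (a, b) if x in defined_at})
        defined_at[dest] = i
    return deps


print(type_of(BinOp("<", Lit(1), BinOp("+", Lit(2), Lit(3)))))  # bool
# t0 = a * b; t1 = c * d; t2 = t0 + t1: t0 and t1 are independent of each other
print(dependencies([("t0", "mul", "a", "b"), ("t1", "mul", "c", "d"), ("t2", "add", "t0", "t1")]))
```

Doing the type check on the flat list would mean reconstructing which values feed which operator, and computing the scheduling edges on the tree would mean first deciding an evaluation order; each form hands you one of those for free.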
I haven't heard of an annotated syntax tree in discussions of compilers, so I'm going to use the more common term AST (abstract syntax tree).
Normally your parser creates an AST, which is, wait for it, an abstract representation of your code. It doesn't contain whitespace or purely syntactic details such as brackets and parentheses, and it resolves any ambiguity in your code (operator precedence, for example).
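A minimal recursive-descent sketch of that idea, assuming a toy grammar of integers, `+ - * /` and parentheses (the grammar and the `Num`/`Bin` node names are illustrative): whitespace is dropped by the tokenizer, parentheses only steer how the tree is built, and precedence and associativity are fixed in the tree's shape.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Bin:
    op: str
    lhs: "Node"
    rhs: "Node"


Node = Union[Num, Bin]


def tokenize(src: str) -> list:
    # Whitespace is discarded here and never reaches the AST.
    # (Sketch only: characters outside this pattern are silently skipped.)
    return re.findall(r"\d+|[-+*/()]", src)


class Parser:
    """Recursive-descent parser for the grammar

        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, src: str):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = Bin(op, node, self.term())  # the loop builds left-associative trees
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = Bin(op, node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node  # parentheses only shaped the tree; no node records them
        if tok is not None and tok.isdigit():
            return Num(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src: str) -> Node:
    parser = Parser(src)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return tree


print(parse("(1 + 2) * 3"))
print(parse("1 + 2 * 3") == parse("1 + (2 * 3)"))  # True: redundant parens leave no trace
```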
An AST makes it very easy to produce ICode (intermediate code) from it. This ICode is basically the instruction code of your language: it contains rudimentary operations like move, goto, etc.
The process would go Code -> AST -> ICode. The ICode could then be run through a VM.
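A minimal sketch of the AST -> ICode -> VM step, assuming a hypothetical stack-based instruction set (`push`, `add`, `sub`, `mul`, `lt`, `jz`, `jmp`) rather than any real VM. A conditional node shows how tree structure turns into gotos: the jump targets are emitted as placeholders and back-patched once the target index is known.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Bin:
    op: str  # "+", "-", "*" or "<"
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Num, Bin, If]

OPCODES = {"+": "add", "-": "sub", "*": "mul", "<": "lt"}
ARITH = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "lt": lambda a, b: int(a < b),  # booleans are 0/1 on the stack
}


def compile_expr(e: Expr, code=None) -> list:
    """AST -> ICode: a post-order walk that emits stack-machine instructions."""
    code = [] if code is None else code
    if isinstance(e, Num):
        code.append(("push", e.value))
    elif isinstance(e, Bin):
        compile_expr(e.lhs, code)
        compile_expr(e.rhs, code)
        code.append((OPCODES[e.op],))
    else:  # If: conditional jumps replace the tree structure
        compile_expr(e.cond, code)
        jz_at = len(code)
        code.append(None)  # placeholder, patched once the target is known
        compile_expr(e.then, code)
        jmp_at = len(code)
        code.append(None)
        code[jz_at] = ("jz", len(code))  # condition false: jump to the else branch
        compile_expr(e.other, code)
        code[jmp_at] = ("jmp", len(code))  # then branch done: skip the else branch
    return code


def run(code: list) -> int:
    """ICode -> result: a minimal stack VM with a program counter."""
    stack, pc = [], 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "jz":
            if stack.pop() == 0:
                pc = args[0]
        elif op == "jmp":
            pc = args[0]
        else:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(ARITH[op](a, b))
    return stack.pop()


ast = If(Bin("<", Num(1), Num(2)), Bin("*", Num(6), Num(7)), Num(0))
icode = compile_expr(ast)
print(icode)
print(run(icode))  # 42
```

Retargeting this to another platform would mean replacing `run` with a translator from the same instruction list, which is in line with the point that producing ICode for another platform is fine.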
I don't see anything wrong with producing ICode that is targeted at another platform.
Update
I reread the question and I now understand what is being asked. He is saying that, instead of creating an ICode representation, you stop at an annotated syntax tree. I'm curious, though: did you build your own machine that processes the annotated syntax tree, or was that tree then converted into another well-known intermediate code?
I would imagine an engine that processes a syntax tree would be more complicated than one for an intermediate format that represents basics such as mov, goto, etc.
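For comparison, here is a tree-walking evaluator over a hypothetical toy AST (the `Num`/`Bin`/`If` node names are illustrative). For a language this small it is about as short as a compiler plus a stack VM, because the host language's recursion stands in for the operand stack and the jumps. Where a linear ICode tends to pay off is in things this sketch does not attempt to show, such as optimization passes, control flow like `break` or `goto` that does not nest neatly, and interpretation speed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Bin:
    op: str  # "+", "-", "*" or "<"
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Num, Bin, If]

BIN_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: int(a < b),
}


def evaluate(e: Expr) -> int:
    """Tree-walking interpreter: evaluate the AST directly, with no ICode."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Bin):
        return BIN_OPS[e.op](evaluate(e.lhs), evaluate(e.rhs))
    # If: Python's own control flow stands in for jz/jmp.
    return evaluate(e.then) if evaluate(e.cond) else evaluate(e.other)


print(evaluate(If(Bin("<", Num(1), Num(2)), Bin("*", Num(6), Num(7)), Num(0))))  # 42
```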
I'll need to pick this book up. I learned everything from the Dragon Book and by digging through ANTLR, yacc, bison, and custom tokenizers and parsers.