在编译器中生成中间代码。处理条件时是否总是需要 AST 或解析树?
我正在参加一个编译器设计课程,我们必须实现我们自己的编译器(使用 flex 和 bison)。我有解析方面的经验(编写 EBNF 和递归下降解析器),但这是我第一次编写编译器。
语言设计是相当开放的(教授把它留给了我们)。在课堂上,教授回顾了生成中间代码的过程。他说我们在解析时不需要构造抽象语法树或者解析树,我们可以直接生成中间代码。
我发现这令人困惑有两个原因:
如果您在定义函数之前调用函数怎么办?如何解决分支目标?我想你必须制定一条规则,即在使用函数之前必须定义它们,或者可能预先定义它们(就像 C 那样?)
你将如何处理条件?如果您有一个
if-else
甚至只是一个if
,当条件为时,如何解析
(如果您正在生成代码)?if
的分支目标>false
我计划生成一个 AST,然后在创建它后遍历树,以解析函数和分支目标的地址。这是正确的还是我错过了什么?
I'm taking a compiler-design class where we have to implement our own compiler (using flex and bison). I have had experience in parsing (writing EBNF's and recursive-descent parsers), but this is my first time writing a compiler.
The language design is pretty open-ended (the professor has left it up to us). In class, the professor went over generating intermediate code. He said that it is not necessary for us to construct an Abstract Syntax Tree or a parse tree while parsing, and that we can generate the intermediate code as we go.
I found this confusing for two reasons:
What if you are calling a function before it is defined? How can you resolve the branch target? I guess you would have to make it a rule that you have to define functions before you use them, or maybe pre-define them (like C does?)
How would you deal with conditionals? If you have an
if-else
or even just anif
, how can you resolve the branch target for theif
when the condition isfalse
(if you're generating code as you go)?
I planned on generating an AST and then walking the tree after I create it, to resolve the addresses of functions and branch targets. Is this correct or am I missing something?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
解决这两个问题的一般方法是保留需要“修补”的地址列表。您生成代码并为丢失的地址或偏移量留下漏洞。在编译单元的末尾,您将检查漏洞列表并填充它们。
在 FORTH 中,补丁“列表”保留在控制堆栈上,并在每个控制结构终止时展开。请参阅 FORTH Dimensions
轶事:早期的 Lisp 编译器(我相信是 Lisp)生成了一个列表符号格式的机器代码指令,以及对条件的每个分支的机器代码列表的前向引用。然后它生成向后遍历列表的二进制代码。这样,当需要发出分支指令时,就知道所有前向分支的代码位置。
The general solution to both of your issues is to keep a list of addresses that need to be "patched." You generate the code and leave holes for the missing addresses or offsets. At the end of the compilation unit, you go through the list of holes and fill them in.
In FORTH the "list" of patches is kept on the control stack and is unwound as each control structure terminates. See FORTH Dimensions
Anecdote: an early Lisp compiler (I believe it was Lisp) generated a list of machine code instructions in symbolic format with forward references to the list of machine code for each branch of a conditional. Then it generated the binary code walking the list backwards. This way the code location for all forward branches was known when the branch instruction needed to be emitted.
Crenshaw 教程是不使用任何类型 AST 的具体示例。它构建了一个工作编译器(显然包括条件),可以立即生成针对 m68k 汇编的代码。
你可以花一个下午的时间读完这个文档,这是值得的。
The Crenshaw tutorial is a concrete example of not using an AST of any kind. It builds a working compiler (including conditionals, obviously) with immediate code generation targeting m68k assembly.
You can read through the document in an afternoon, and it is worth it.