编译器设计-具有多个操作数的表达式的代码生成
我正在为类似 C 的语言编写编译器。
我已经完成了语法和语义检查,并且正在开始代码生成阶段。
我必须生成的最终代码必须采用 3 地址加载/存储架构。我可以假设一组无限制的寄存器以及用于堆栈和系统内存的 32 M 内存空间。
现在,当我开始生成代码时,我首先假设一个非常大的 int 数组 int R[2000000]
来表示寄存器集。当我遇到变量声明(通过解析器/语义分析器)时,我为该特定变量分配一个寄存器。
现在,在程序过程中,当我再次遇到这个变量时,我会从该寄存器编号中检索它。我将每个变量的寄存器号保存在符号表中。
我现在的问题是:假设,我们有这样的语句 -
a := b + c + e / f *h;
我分别在 R1,R2,R3,R4,R5,R6 中保存了 a,b,c,e,f,h,最终生成的代码将是(假设下一个可用寄存器从 R7 开始...)
R9 = R5 * R6;
R8 = R4 / R9;
R7 = R3 + R8;
R1 = R2 + R7;
“记住”前一个寄存器和操作是什么的方法是什么?
如果这不是正确的方法,任何人都可以给我一些关于如何实现它的指示吗?
任何建议都会很棒。谢谢
I am in the middle of coding up a compiler for a C like language.
I have gotten past syntax and semantic checking and I am beginning the code generation phase.
The final code that I have to generate has to be in 3 address load/store architecture. I'm allowed to assume an unbounded set of registers and a 32 M memory space for stack and system memory.
Right now, when I start generating the code, I begin by assuming a really large int array int R[2000000]
to denote the register set. And as and when I encounter a variable declaration (through the parser/semantic analyzer) I allocate a register for that particular variable.
Now in the course of the program when I encounter this variable again, I retrieve it back from that register number. I save the register number of every variable in the symbol table.
My question now is this: Suppose, we have a statement like this -
a := b + c + e / f *h;
And I have saved a,b,c,e,f,h in R1, R2, R3, R4, R5, R6 respectively, the final generated code would be (assuming that the next available regs start from R7...)
R9 = R5 * R6;
R8 = R4 / R9;
R7 = R3 + R8;
R1 = R2 + R7;
What is the approach to "remember" what was the previous register and the operation?
If this is not the correct way of doing it, can anyone please give me some pointers on how to implement it?
Any suggestion would be great. Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
假设未绑定的寄存器集感觉就像“作弊”。在这种情况下,我想您可以继续前进并选择下一个可用的寄存器。
在现实情况下,寄存器分配通常使用“图形着色”来完成。在简单的情况下,图由表示变量(包括临时值)的节点和表示变量之间冲突的弧组成。目标是为每个变量分配一种颜色(代表寄存器编号),以便两个冲突的寄存器不会具有相同的颜色。 (从历史上看,这个问题类似于绘制世界各国地图的问题,因此使用术语“颜色”。)
如果算法无法找到合适的颜色,则必须“溢出”一些变量,例如将它们放在堆栈上。
寄存器分配可以在代码生成之前或之后完成,每种都会引起不同的有趣问题。
您可能需要考虑的其他事情是调用约定,其中函数的参数和返回值必须放置在特定的寄存器中。
Assuming an unbound register set feels like "cheating". In that case, I guess that you could go right ahead and pick the next available register, as you go along.
In a real-world situation, register assignment is typically done using "graph coloring". In the simple case, a graph consists of nodes representing variables (including temporary values) and arcs representing conflicts between variables. The goal is to assign each variable with a color (representing a register number) so that no two conflicting registers have the same color. (Historically, this problem is similar to the problem of drawing a map of the countries in the world, hence the use of the term "color".)
If the algorithm fail to find a suitable coloring, you have to "spill" some variables, e.g. place them on the stack.
Register allocation can be done either before or after code generation, each will give rise to different interesting problems.
Other things that you might need to consider is calling-convention, where the parameters and return values of a function must be placed in specific registers.
寄存器绑定应该是最后一步。
要为上面的表达式生成代码,您应该首先将其转换为逆波兰表示法,如下所示:
现在要生成代码,我们将执行此操作,每当遇到操作时,我们都会生成代码,同时维护中间结果堆栈。那么让我们开始吧。遍历它直到第一个操作:
所以我们应该添加两个变量并存储即时结果。在 x86 中,加法只能在寄存器之间或寄存器和内存之间进行加法。所以我们必须将
b
加载到寄存器中,并向其中添加c
:此时我们不知道要使用哪个寄存器,所以我将其标记为R1。
现在我们的堆栈是:
继续并添加接下来的几个元素,直到下一个操作。
在 x86 中,除法很棘手,被除数存储在 EDX:EAX 64 位对中,商将存储在 EAX 中。假设您正在使用带符号的 32 位整数,我们必须将
e
加载到 EAX 中并将其符号扩展为 EDX,然后执行除法。所以需要生成这段代码:幸运的是 EDX 和 EAX 都是免费的,所以我们可以使用它。如果它们已经被使用,我们需要将它们的值保存在另一个寄存器中以释放它们,我们现在不需要这样做。
结果保存在 EAX 中。所以我们的堆栈现在:
继续:
第一个操作数是一个寄存器,所以我们不需要加载任何东西,只需立即执行乘法:
结果在 EAX 中,所以现在堆栈:
继续:
再次寄存器到寄存器操作,只需一条指令即可完成:
结果在 R1 中。现在的堆栈:
最后是赋值:
所以几乎最终的代码将是:
现在是时候替换 R1 了。可以是哪个寄存器?
主要规则是寄存器的生命周期不得重叠。
寄存器的生命周期从第一次完全覆盖(如 MOV)开始,到下一次完全覆盖(如果有)之前的最后一次使用结束。
在我们的例子中,R1 的生命周期与 EAX 和 EDX 重叠。所以我们不能将它们用于 R1,但 EBX 未使用,因此我们可以使用它:
此后所有寄存器都是空闲的。
这将是您的表达式的代码。可以看出,每个变量只加载一次。因此,在这个特定示例中,将它们存储在寄存器中并没有多大好处。但是,如果您的函数有多个语句(当然会这样做),并且在生成整个函数的代码后仍然可以找到空闲变量,则应该将最常用的变量加载到寄存器中。
结论是,你应该为每种可能的情况制定一个计划,在这里我们看到了变量到变量的情况,但有时你有像 4*a 这样的雄蕊,你可以在其中加载 a,即左移而不是乘法。
Register binding should be the very final step.
To generate code for your expression above you should start with turning it to a reverse Polish notation, which looks like this:
Now to generate the code we go through this and whenever we encounter an operation, we generate code while maintaining a stack of intermediate results. So let's start. Go through it until the first operation:
So we are supposed to add two variables and store the immediate result. In x86 addition can add only between registers or register and memory. So we must load
b
into a register and addc
to it:At this point we don't know which register to use, so I marked it with R1.
Now our stack is:
Continue and add the next few elements until the next operation.
In x86 division is tricky, the dividend is stored in the EDX:EAX 64 bit pair, the quotient will be in EAX. Assuming you are working with signed 32 bit integers, we must load
e
into EAX and sign extend it to EDX, and the perform the division. So this code needs to be generated:Fortunately both EDX and EAX is free so we can use it. If they were already used, we would need to save their value in another register to free them up, we don't need to do that now.
The result is in EAX. So our stack now:
Keep going:
The first operand is a register, so we don't need to load anything, just do the multiplication right away:
The result is in EAX so the stack now:
Continue:
Again a register to register operation, just a single instruction will do it:
The result is in R1. The stack now:
And finally the assignment:
So the almost final code will be:
Now it's time to substitute R1. Which register can it be?
The main rule is that register lifetimes must not overlap.
The lifetime of a register that start with a first complete overwrite (like MOV) and ends with the last use before the next complete overwrite (if any).
In our case R1's lifetime overlap with EAX and EDX. So we cannot use them for R1, but EBX is unused so we can use it:
And after this all registers are free.
And this will be code for your expression. It can be seen that each variable is loaded only once. So there is not much benefit to store them in registers in this particular example. But if your function has multiple statements (and of course will do), the most frequently used variables should be loaded into registers if you can still find a free one after the code for the entire function is generated.
The takeaway is that you should have a plan for each possible scenario, here we saw variable to variable scenario, but sometimes you have stamenents like 4*a, where you can load the a, the shift left instead of multiplication.
我建议您可以简单地枚举操作并将其编号用作保存结果值的寄存器的编号。
如果使用树表示,则执行操作的寄存器数是子节点数。
另外,您可以使用堆栈机来表示中间代码。
I suggest you may simply enumerate operations and use it number as number of register which hold result value.
If you use tree representation, then register numbers on which operation performed is numbers of child nodes.
Also, you can use stack machine for intermediate code representation.
这种表达通常用树来表示。每个节点对应一个操作,每个节点包含保存结果的寄存器的索引。
Such expression are often represented with a tree. Each node correspond to an operation, and each node contains the index of the register that holds the result.