当前位置：文江博客话题详情

c compiler-construction code-generation cpu-registers

编译器设计-具有多个操作数的表达式的代码生成

发布于 2024-10-19 05:19:09 字数 649 浏览 2 评论 0原文

我正在为类似 C 的语言编写编译器。

我已经完成了语法和语义检查，并且正在开始代码生成阶段。

我必须生成的最终代码必须采用 3 地址加载/存储架构。我可以假设一组无限制的寄存器以及用于堆栈和系统内存的 32 M 内存空间。

现在，当我开始生成代码时，我首先假设一个非常大的 int 数组 int R[2000000] 来表示寄存器集。当我遇到变量声明（通过解析器/语义分析器）时，我为该特定变量分配一个寄存器。

现在，在程序过程中，当我再次遇到这个变量时，我会从该寄存器编号中检索它。我将每个变量的寄存器号保存在符号表中。

我现在的问题是：假设，我们有这样的语句 -

a := b + c + e / f *h;

我分别在 R1，R2，R3，R4，R5，R6 中保存了 a，b，c，e，f，h，最终生成的代码将是（假设下一个可用寄存器从 R7 开始...）

R9 = R5 * R6;
R8 = R4 / R9;
R7 = R3 + R8;
R1 = R2 + R7;

“记住”前一个寄存器和操作是什么的方法是什么？

如果这不是正确的方法，任何人都可以给我一些关于如何实现它的指示吗？

任何建议都会很棒。谢谢

原文

I am in the middle of coding up a compiler for a C like language.

I have gotten past syntax and semantic checking and I am beginning the code generation phase.

The final code that I have to generate has to be in 3 address load/store architecture. I'm allowed to assume an unbounded set of registers and a 32 M memory space for stack and system memory.

Right now, when I start generating the code, I begin by assuming a really large int array int R[2000000] to denote the register set. And as and when I encounter a variable declaration (through the parser/semantic analyzer) I allocate a register for that particular variable.

Now in the course of the program when I encounter this variable again, I retrieve it back from that register number. I save the register number of every variable in the symbol table.

My question now is this: Suppose, we have a statement like this -

a := b + c + e / f *h;

And I have saved a,b,c,e,f,h in R1, R2, R3, R4, R5, R6 respectively, the final generated code would be (assuming that the next available regs start from R7...)

R9 = R5 * R6;
R8 = R4 / R9;
R7 = R3 + R8;
R1 = R2 + R7;

What is the approach to "remember" what was the previous register and the operation?

If this is not the correct way of doing it, can anyone please give me some pointers on how to implement it?

Any suggestion would be great. Thanks

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

单挑你×的.吻 2024-10-26 05:19:09

假设未绑定的寄存器集感觉就像“作弊”。在这种情况下，我想您可以继续前进并选择下一个可用的寄存器。

在现实情况下，寄存器分配通常使用“图形着色”来完成。在简单的情况下，图由表示变量（包括临时值）的节点和表示变量之间冲突的弧组成。目标是为每个变量分配一种颜色（代表寄存器编号），以便两个冲突的寄存器不会具有相同的颜色。（从历史上看，这个问题类似于绘制世界各国地图的问题，因此使用术语“颜色”。）

如果算法无法找到合适的颜色，则必须“溢出”一些变量，例如将它们放在堆栈上。

寄存器分配可以在代码生成之前或之后完成，每种都会引起不同的有趣问题。

您可能需要考虑的其他事情是调用约定，其中函数的参数和返回值必须放置在特定的寄存器中。

回复收藏 0 原文

寒尘 2024-10-26 05:19:09

寄存器绑定应该是最后一步。

要为上面的表达式生成代码，您应该首先将其转换为逆波兰表示法，如下所示：

a b c + e f / h * + :=

现在要生成代码，我们将执行此操作，每当遇到操作时，我们都会生成代码，同时维护中间结果堆栈。那么让我们开始吧。遍历它直到第一个操作：

a b c +

所以我们应该添加两个变量并存储即时结果。在 x86 中，加法只能在寄存器之间或寄存器和内存之间进行加法。所以我们必须将b加载到寄存器中，并向其中添加c：

MOV R1, [&b]
ADD R1, [&c]

此时我们不知道要使用哪个寄存器，所以我将其标记为R1。
现在我们的堆栈是：

a R1

继续并添加接下来的几个元素，直到下一个操作。

a R1 e f /

在 x86 中，除法很棘手，被除数存储在 EDX:EAX 64 位对中，商将存储在 EAX 中。假设您正在使用带符号的 32 位整数，我们必须将 e 加载到 EAX 中并将其符号扩展为 EDX，然后执行除法。所以需要生成这段代码：

MOV EAX, [&e]
CDQ
IDIV [&f]

幸运的是 EDX 和 EAX 都是免费的，所以我们可以使用它。如果它们已经被使用，我们需要将它们的值保存在另一个寄存器中以释放它们，我们现在不需要这样做。

结果保存在 EAX 中。所以我们的堆栈现在：

a R1 EAX

继续：

a R1 EAX h *

第一个操作数是一个寄存器，所以我们不需要加载任何东西，只需立即执行乘法：

IMUL EAX, [&h]

结果在 EAX 中，所以现在堆栈：

a R1 EAX

继续：

a R1 EAX +

再次寄存器到寄存器操作，只需一条指令即可完成：

ADD R1, EAX

结果在 R1 中。现在的堆栈：

a R1

最后是赋值：

MOV [&a], R1

所以几乎最终的代码将是：

MOV R1, [&b]
ADD R1, [&c]
MOV EAX, [&e]
CDQ
IDIV [&f]
IMUL EAX, [&h]
ADD R1, EAX
MOV [&a], R1

现在是时候替换 R1 了。可以是哪个寄存器？
主要规则是寄存器的生命周期不得重叠。
寄存器的生命周期从第一次完全覆盖（如 MOV）开始，到下一次完全覆盖（如果有）之前的最后一次使用结束。

MOV R1, [&b]  ; R1 lifetime starts
ADD R1, [&c]
MOV EAX, [&e] ; EAX lifetime starts.
CDQ           ; EDX lifetime starts
IDIV [&f]     ; EDX lifetime ends.
IMUL EAX, [&h]
ADD R1, EAX   ; EAX lifetime ends.
MOV [&a], R1  ; R1 lifetime ends

在我们的例子中，R1 的生命周期与 EAX 和 EDX 重叠。所以我们不能将它们用于 R1，但 EBX 未使用，因此我们可以使用它：

MOV EBX, [&b]
ADD EBX, [&c]
MOV EAX, [&e] 
CDQ           
IDIV [&f]     
IMUL EAX, [&h] 
ADD EBX, EAX  
MOV [&a], EBX

此后所有寄存器都是空闲的。

这将是您的表达式的代码。可以看出，每个变量只加载一次。因此，在这个特定示例中，将它们存储在寄存器中并没有多大好处。但是，如果您的函数有多个语句（当然会这样做），并且在生成整个函数的代码后仍然可以找到空闲变量，则应该将最常用的变量加载到寄存器中。

结论是，你应该为每种可能的情况制定一个计划，在这里我们看到了变量到变量的情况，但有时你有像 4*a 这样的雄蕊，你可以在其中加载 a，即左移而不是乘法。

To generate code for your expression above you should start with turning it to a reverse Polish notation, which looks like this:

a b c + e f / h * + :=

Now to generate the code we go through this and whenever we encounter an operation, we generate code while maintaining a stack of intermediate results. So let's start. Go through it until the first operation:

a b c +

So we are supposed to add two variables and store the immediate result. In x86 addition can add only between registers or register and memory. So we must load b into a register and add c to it:

MOV R1, [&b]
ADD R1, [&c]

At this point we don't know which register to use, so I marked it with R1.
Now our stack is:

a R1

Continue and add the next few elements until the next operation.

a R1 e f /

In x86 division is tricky, the dividend is stored in the EDX:EAX 64 bit pair, the quotient will be in EAX. Assuming you are working with signed 32 bit integers, we must load e into EAX and sign extend it to EDX, and the perform the division. So this code needs to be generated:

MOV EAX, [&e]
CDQ
IDIV [&f]

Fortunately both EDX and EAX is free so we can use it. If they were already used, we would need to save their value in another register to free them up, we don't need to do that now.

The result is in EAX. So our stack now:

a R1 EAX

Keep going:

a R1 EAX h *

The first operand is a register, so we don't need to load anything, just do the multiplication right away:

IMUL EAX, [&h]

The result is in EAX so the stack now:

a R1 EAX

Continue:

a R1 EAX +

Again a register to register operation, just a single instruction will do it:

ADD R1, EAX

The result is in R1. The stack now:

a R1

And finally the assignment:

MOV [&a], R1

So the almost final code will be:

MOV R1, [&b]
ADD R1, [&c]
MOV EAX, [&e]
CDQ
IDIV [&f]
IMUL EAX, [&h]
ADD R1, EAX
MOV [&a], R1

Now it's time to substitute R1. Which register can it be?
The main rule is that register lifetimes must not overlap.
The lifetime of a register that start with a first complete overwrite (like MOV) and ends with the last use before the next complete overwrite (if any).

MOV R1, [&b]  ; R1 lifetime starts
ADD R1, [&c]
MOV EAX, [&e] ; EAX lifetime starts.
CDQ           ; EDX lifetime starts
IDIV [&f]     ; EDX lifetime ends.
IMUL EAX, [&h]
ADD R1, EAX   ; EAX lifetime ends.
MOV [&a], R1  ; R1 lifetime ends

In our case R1's lifetime overlap with EAX and EDX. So we cannot use them for R1, but EBX is unused so we can use it:

MOV EBX, [&b]
ADD EBX, [&c]
MOV EAX, [&e] 
CDQ           
IDIV [&f]     
IMUL EAX, [&h] 
ADD EBX, EAX  
MOV [&a], EBX

And after this all registers are free.

And this will be code for your expression. It can be seen that each variable is loaded only once. So there is not much benefit to store them in registers in this particular example. But if your function has multiple statements (and of course will do), the most frequently used variables should be loaded into registers if you can still find a free one after the code for the entire function is generated.

The takeaway is that you should have a plan for each possible scenario, here we saw variable to variable scenario, but sometimes you have stamenents like 4*a, where you can load the a, the shift left instead of multiplication.

回复收藏 0 原文