Code generation from three-address code to JVM bytecode

Posted 2024-12-19 22:36:19

I'm working on the bytecode compiler for Renjin (R for the JVM) and am experimenting with translating our intermediate three-address code (TAC) representation to bytecode. All the compiler textbooks I've consulted discuss register allocation during code generation, but I haven't been able to find any resources on code generation for stack-based virtual machines like the JVM.

Simple TAC instructions are trivial to translate into bytecode, but I get a bit lost when temporaries are involved. Does anyone have pointers to resources that describe this?

Here is a complete example:

Original R code looks like this:

x + sqrt(x * y)

TAC IR:

 0:  _t2 := primitive<*>(x, y)
 1:  _t3 := primitive<sqrt>(_t2)
 2:  return primitive<+>(x, _t3)

(ignore for a second the fact that we can't always resolve function calls to primitives at compile time)

The resulting JVM bytecode would look roughly like this:

aload_x          // stack: x
dup              // stack: x x
aload_y          // stack: x x y
invokestatic r/primitives/Ops.multiply(Lr/lang/Vector;Lr/lang/Vector;)   // stack: x _t2
invokestatic r/primitives/Ops.sqrt(Lr/lang/Vector;)                      // stack: x _t3
invokestatic r/primitives/Ops.plus(Lr/lang/Vector;Lr/lang/Vector;)       // stack: result
areturn

Basically, at the top of the program I already need to know that local variable x has to be sitting on the stack, underneath the operands of the multiplication, by the time I get to TAC instruction 2. I can work this out by hand, but I'm having trouble coming up with an algorithm that does it correctly in general. Any pointers?
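To make the problem concrete, here is a minimal Java sketch of one possible direction, not Renjin's actual code. It assumes straight-line TAC, that every temporary is assigned exactly once, and that the primitives have no side effects, so moving a computation to its use site is safe. Under those assumptions, a temporary that is used exactly once is not emitted where it is defined; it is emitted lazily, right where its value is needed as an operand, so it lands on the stack in the correct position. Any other temporary is computed at its definition and spilled to a local variable. The names TacToStack, Instr, emit and example are hypothetical, and the sketch emits mnemonic strings rather than real class files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TacToStack {
    /** One TAC instruction: target := op(args). A null target means "return op(args)". */
    static final class Instr {
        final String target;
        final String op;
        final List<String> args;

        Instr(String target, String op, String... args) {
            this.target = target;
            this.op = op;
            this.args = List.of(args);
        }
    }

    /** Translates straight-line TAC into a list of stack-machine mnemonics. */
    static List<String> emit(List<Instr> tac) {
        Map<String, Instr> defs = new HashMap<>();    // temporary -> defining instruction
        Map<String, Integer> uses = new HashMap<>();  // operand name -> number of uses
        for (Instr i : tac) {
            if (i.target != null) defs.put(i.target, i);
            for (String a : i.args) uses.merge(a, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Instr i : tac) {
            if (i.target == null) {
                emitCall(i, defs, uses, out);
                out.add("areturn");
            } else if (uses.getOrDefault(i.target, 0) != 1) {
                // Used several times (or never): compute now and spill to a local.
                emitCall(i, defs, uses, out);
                out.add("astore " + i.target);
            }
            // Single-use temporaries are skipped here and emitted at their use site.
        }
        return out;
    }

    /** Pushes the operands left to right, then invokes the operation. */
    static void emitCall(Instr i, Map<String, Instr> defs,
                         Map<String, Integer> uses, List<String> out) {
        for (String a : i.args) {
            Instr def = defs.get(a);
            if (def != null && uses.get(a) == 1) {
                emitCall(def, defs, uses, out);  // inline the single-use temporary
            } else {
                out.add("aload " + a);           // variable or spilled temporary
            }
        }
        out.add("invokestatic " + i.op);
    }

    /** The TAC for x + sqrt(x * y) from the question. */
    static List<Instr> example() {
        return List.of(
            new Instr("_t2", "r/primitives/Ops.multiply", "x", "y"),
            new Instr("_t3", "r/primitives/Ops.sqrt", "_t2"),
            new Instr(null, "r/primitives/Ops.plus", "x", "_t3"));
    }

    public static void main(String[] args) {
        emit(example()).forEach(System.out::println);
    }
}
```

For the example this loads x twice instead of using dup, which leaves the stack in the same state before each call. What the sketch does not handle is exactly what I'm unsure about: control flow, temporaries that live across basic blocks, and primitives whose evaluation order matters.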
