用汇编语言编写 JIT 编译器

发布于 2024-10-18 03:01:59 字数 663 浏览 8 评论 0原文

我已经用 C 语言编写了一个虚拟机，它对于非 JIT 虚拟机具有不错的性能，但我想学习新的东西并提高性能。我当前的实现只是使用一个开关将 VM 字节码转换为指令，然后将其编译为跳转表。正如我所说，性能确实不错，但我遇到了一个只能通过 JIT 编译器才能克服的障碍。

不久前我已经问过一个关于自修改代码的类似问题，但我开始意识到我没有问正确的问题。

所以我的目标是为这个 C 虚拟机编写一个 JIT 编译器，并且我想在 x86 汇编中完成它。（我使用 NASM 作为我的汇编器）我不太确定如何去做这件事。我对汇编很满意，并且我已经查看了一些自修改代码示例，但我还没有弄清楚如何进行代码生成。

到目前为止，我的主要块是将指令与我的参数复制到可执行内存块。我知道我可以在 NASM 中标记某一行，并使用静态参数从该地址复制整行，但这不是很动态，并且不适用于 JIT 编译器。我需要能够解释字节码中的指令，将其复制到可执行内存，解释第一个参数，将其复制到内存，然后解释第二个参数，并将其复制到内存。

我了解到有几个库可以使这项任务变得更容易，例如 GNU Lightning，甚至 LLVM。但是，在使用外部资源之前，我想先手动编写此代码，以了解其工作原理。

该社区是否可以提供任何资源或示例来帮助我开始执行此任务？一个简单的例子显示了两个或三个指令（如“add”和“mov”）被用来在内存中动态地生成带有参数的可执行代码，这将创造奇迹。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

倥絔 2024-10-25 03:01:59

我根本不建议在汇编中编写 JIT。在汇编中编写解释器最常执行的位有很好的论据。有关示例，请参阅来自 Mike Pall 的评论，LuaJIT的作者。

至于 JIT，有许多不同的级别，其复杂性也各不相同：

通过简单地复制解释器的代码来编译基本块（一系列非分支指令）。例如，一些（基于寄存器的）字节码指令的实现可能如下所示：
<前><代码>; ebp指向堆栈上的虚拟寄存器0
指令添加：
<解码指令>
mov eax, [ebp + ecx * 4] ;从堆栈加载第一个操作数
添加 eax, [ebp + edx * 4] ;从堆栈中添加第二个操作数
mov [ebp + ebx * 4], eax ;写回结果
<发送下一条指令>
指令_SUB：
...;相似的
因此，给定指令序列ADD R3, R1, R2, SUB R3, R3, R4，简单的 JIT 可以将解释器实现的相关部分复制到新的机器代码块：
<前><代码> mov ecx, 1
移动edx，2
移动 ebx, 3
mov eax, [ebp + ecx * 4] ;从堆栈加载第一个操作数
添加 eax, [ebp + edx * 4] ;从堆栈中添加第二个操作数
mov [ebp + ebx * 4], eax ;写回结果
移动ecx，3
移动edx, 4
移动 ebx, 3
mov eax, [ebp + ecx * 4] ;从堆栈加载第一个操作数
子 eax, [ebp + edx * 4] ;从堆栈中添加第二个操作数
mov [ebp + ebx * 4], eax ;写回结果
这只是复制了相关代码，因此我们需要相应地初始化所使用的寄存器。更好的解决方案是将其直接转换为机器指令 mov eax, [ebp + 4]，但现在您已经必须手动对请求的指令进行编码。
这种技术消除了解释的开销，但在其他方面并没有提高太多效率。如果代码只执行一两次，那么首先将其转换为机器代码可能不值得（这需要刷新至少部分 I-cache）。
虽然一些 JIT 使用上述技术而不是解释器，但它们对频繁执行的代码采用了更复杂的优化机制。这涉及将执行的字节码转换为中间表示（IR），并对其执行额外的优化。
根据源语言和 JIT 的类型，这可能非常复杂（这就是许多 JIT 将此任务委托给 LLVM 的原因）。基于方法的 JIT 需要处理连接控制流图，因此它们使用 SSA 形式并对其运行各种分析（例如，Hotspot）。
跟踪 JIT（如 LuaJIT 2）仅编译直线代码，这使得许多事情更容易实现，但您必须非常小心如何选择跟踪以及如何有效地将多个跟踪链接在一起。 Gal 和 Franz 在 this 中描述了一种方法论文（PDF）。另一种方法请参见 LuaJIT 源代码。这两种 JIT 都是用 C（或者可能是 C++）编写的。

I wouldn't recommend writing a JIT in assembly at all. There are good arguments for writing the most frequently executed bits of the interpreter in assembly. For an example of how this looks like see this comment from Mike Pall, the author of LuaJIT.

As for the JIT, there are many different levels with varying complexity:

Compile a basic block (a sequence of non-branching instructions) by simply copying the interpreter's code. For example, the implementations of a few (register-based) bytecode instructions might look like this:
```
; ebp points to virtual register 0 on the stack
instr_ADD:
    <decode instruction>
    mov eax, [ebp + ecx * 4]  ; load first operand from stack
    add eax, [ebp + edx * 4]  ; add second operand from stack
    mov [ebp + ebx * 4], eax  ; write back result
    <dispatch next instruction>
instr_SUB:
    ... ; similar
```
So, given the instruction sequence ADD R3, R1, R2, SUB R3, R3, R4 a simple JIT could copy the relevant parts of the interpreters implementation into a new machine code chunk:
```
    mov ecx, 1
    mov edx, 2
    mov ebx, 3
    mov eax, [ebp + ecx * 4]  ; load first operand from stack
    add eax, [ebp + edx * 4]  ; add second operand from stack
    mov [ebp + ebx * 4], eax  ; write back result
    mov ecx, 3
    mov edx, 4
    mov ebx, 3
    mov eax, [ebp + ecx * 4]  ; load first operand from stack
    sub eax, [ebp + edx * 4]  ; add second operand from stack
    mov [ebp + ebx * 4], eax  ; write back result
```
This simply copies the relevant code, so we need to initialise the registers used accordingly. A better solution would be to translate this directly into machine instructions mov eax, [ebp + 4], but now you already have to manually encode the requested instructions.
This technique removes the overheads of interpretation, but otherwise does not improve efficiency much. If the code is executed for only one or two times, then it may not worth it to first translate it to machine code (which requires flushing at least parts of the I-cache).
While some JITs use the above technique instead of an interpreter, they then employ a more complicated optimisation mechanism for frequently executed code. This involves translating the executed bytecode into an intermediate representation (IR) on which additional optimisations are performed.
Depending on the source language and the type of JIT, this can be very complex (which is why many JITs delegate this task to LLVM). A method-based JIT needs to deal with joining control-flow graphs, so they use SSA form and run various analyses on that (e.g., Hotspot).
A tracing JIT (like LuaJIT 2) only compiles straight line code which makes many things easier to implement, but you have to be very careful how you pick traces and how you link multiple traces together efficiently. Gal and Franz describe one method in this paper (PDF). For another method see the LuaJIT source code. Both JITs are written in C (or perhaps C++).