JIT code generation techniques

Posted on 2024-07-03 23:33:37

How does a virtual machine generate native machine code on the fly and execute it?

Assuming you can figure out which native machine opcodes you want to emit, how do you go about actually running them?

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into a char* pointer and casting it as a function and executing?

Or would you generate a temporary shared library (.dll or .so or whatever) and load it into memory using standard functions like LoadLibrary?

Comments (7)

眷恋的温暖 2024-07-10 23:33:37

Yup. You just build up a char* buffer and execute it. However, you need to note a couple of details: the buffer must be in an executable region of memory and must have proper alignment.

In addition to nanojit, you can also check out LLVM, another library that's capable of compiling various program representations down to a function pointer. Its interface is clean and the generated code tends to be efficient.

寄人书 2024-07-10 23:33:37

You can just make the program counter point to the code you want to execute. Remember that bytes in memory can be treated as either data or code. On x86 the program counter is the EIP register (RIP on x86-64); the IP part of EIP stands for instruction pointer. A JMP instruction jumps to an address, and after the jump EIP contains that address.

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into a char* pointer and casting it as a function and executing?

Yes. This is one way of doing it. The resulting code would be cast to a pointer to function in C.
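
To make the cast concrete, here is a minimal sketch, assuming a POSIX system (mmap/mprotect) and an x86 or x86-64 CPU. The helper name jit_run_constant and the jit_fn typedef are made up for illustration; they are not from any library. It hand-encodes "mov eax, imm32 ; ret", copies the bytes into a fresh mapping, makes the mapping executable, and calls it through a function pointer.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed signature of the generated code: no arguments, returns int. */
typedef int (*jit_fn)(void);

/*
 * Hypothetical helper: generates "mov eax, value ; ret", runs it, and
 * stores the result in *out. Returns 0 on success, or -1 if the CPU is
 * not x86 or the OS refuses to provide executable memory.
 */
int jit_run_constant(int32_t value, int *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    /* B8 id = mov eax, imm32 ; C3 = ret */
    unsigned char code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    size_t len = sizeof code;
    void *mem;

    /* x86 is little-endian, so a raw copy yields the right immediate. */
    memcpy(&code[1], &value, sizeof value);

    /* Writable first, executable later: never both at once. */
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* The cast from the question. ISO C leaves object-to-function pointer
     * conversion undefined, but POSIX platforms support it. */
    *out = ((jit_fn)mem)();

    munmap(mem, len);
    return 0;
#else
    (void)value;
    (void)out;
    return -1;   /* machine code above is x86-only */
#endif
}
```

Calling jit_run_constant(42, &result) should leave 42 in result when it returns 0. On architectures with separate instruction caches (ARM, for example) you would also need to flush the instruction cache before the call; x86 does not require that.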

决绝 2024-07-10 23:33:37

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into a char* pointer and casting it as a function and executing?

Yes, if you were doing it in C or C++ (or something similar), that's exactly what you'd do.

It appears hacky, but that's actually an artifact of the language design. Remember, the actual algorithm you want to use is very simple: determine what instructions you want to use, load them into a buffer in memory, and jump to the beginning of that buffer.

If you really try to do this, though, make sure you get the calling convention right when you return to your C program. I think if I wanted to generate code I'd look for a library to take care of that aspect for me. Nanojit's been in the news recently; you could look at that.
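
As a rough illustration of the "determine the instructions, load them into a buffer" step, here is a small sketch of an emitter for three x86 instructions. The code_buf type and the emit_* names are hypothetical; a real JIT library would cover far more encodings. It only produces bytes and does not execute anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size buffer that collects the encoded bytes. */
typedef struct {
    uint8_t buf[64];
    size_t  len;
} code_buf;

static void emit_u8(code_buf *cb, uint8_t b)
{
    assert(cb->len < sizeof cb->buf);   /* sketch: no growth, just a guard */
    cb->buf[cb->len++] = b;
}

/* x86 immediates are stored little-endian, lowest byte first. */
static void emit_u32le(code_buf *cb, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        emit_u8(cb, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  ->  B8 id */
void emit_mov_eax_imm32(code_buf *cb, uint32_t imm)
{
    emit_u8(cb, 0xB8);
    emit_u32le(cb, imm);
}

/* add eax, imm32  ->  05 id */
void emit_add_eax_imm32(code_buf *cb, uint32_t imm)
{
    emit_u8(cb, 0x05);
    emit_u32le(cb, imm);
}

/* ret  ->  C3 */
void emit_ret(code_buf *cb)
{
    emit_u8(cb, 0xC3);
}

/*
 * Assembles "mov eax, a ; add eax, b ; ret" and returns the byte count.
 * Leaving the result in EAX matches how the common x86 C calling
 * conventions return an int, so a C caller can read it directly.
 */
size_t assemble_add_constants(code_buf *cb, uint32_t a, uint32_t b)
{
    cb->len = 0;
    emit_mov_eax_imm32(cb, a);
    emit_add_eax_imm32(cb, b);
    emit_ret(cb);
    return cb->len;
}
```

This function takes no arguments and touches only EAX, so the calling-convention question stays trivial. As soon as generated code takes parameters or uses more registers, it has to respect argument passing and callee-saved registers, which is the part a library handles for you.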

小糖芽 2024-07-10 23:33:37

As far as I know, it compiles everything in memory because it has to run some heuristics to optimize the code (e.g. inlining over time), but you can have a look at the Shared Source Common Language Infrastructure 2.0 (Rotor) release. The whole codebase is identical to .NET except for the JIT compiler (the "jitter") and the GC.

春庭雪 2024-07-10 23:33:37

As well as Rotor 2.0, you could also take a look at the HotSpot virtual machine in OpenJDK.

撩人痒 2024-07-10 23:33:37

About generating a DLL: the additional I/O required for that, plus linking, plus the complexity of generating the DLL format, would make things much more complicated and, above all, would kill performance. Besides, in the end you still call a function pointer to the loaded code, so...
Also, JIT compilation can happen one method at a time, and if you did that with DLLs you'd generate lots of small ones.

About the "executable section" requirement: calling mprotect() on POSIX systems can fix the permissions (there's a similar API on Win32). You need to do that for one big memory segment instead of once per method, since it'd be too slow otherwise.

On plain 32-bit x86 without PAE you wouldn't notice the problem, because pages there cannot be marked non-executable. On x86 with PAE, or on 64-bit AMD64/Intel 64 machines, where the NX bit can be enforced, you'd get a segfault.
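
One way to get "one big segment, one mprotect()" is a simple bump allocator over a single mapping. The sketch below assumes a POSIX system; the jit_arena type and its functions are invented for illustration, and the 16-byte alignment is just a common choice for function entry points, not a requirement stated above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena: one large mapping shared by many compiled methods. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
    int      sealed;   /* nonzero once the arena is read+execute */
} jit_arena;

/* Maps `size` bytes read+write. Returns 0 on success, -1 on failure. */
int jit_arena_init(jit_arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    a->sealed = 0;
    return 0;
}

/* Hands out a 16-byte-aligned slice for one method's code, or NULL if
 * the arena is full or already sealed. */
void *jit_arena_alloc(jit_arena *a, size_t n)
{
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (a->sealed || start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* A single mprotect() call flips the whole arena to read+execute. */
int jit_arena_seal(jit_arena *a)
{
    assert(a->base != NULL);
    if (mprotect(a->base, a->size, PROT_READ | PROT_EXEC) != 0)
        return -1;
    a->sealed = 1;
    return 0;
}

void jit_arena_destroy(jit_arena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A typical sequence would be jit_arena_init(&a, 65536), several jit_arena_alloc() calls while copying generated code in, then one jit_arena_seal(&a) before the first call into the arena. A JIT that keeps compiling after sealing would need more than this, for example a second writable arena.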

放肆 2024-07-10 23:33:37

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into a char* pointer and casting it as a function and executing?

Yes, that works.

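To do this on Windows, the block must be allocated with PAGE_EXECUTE_READWRITE protection:

#include <windows.h>

/* Reserve and commit a block that is readable, writable, and executable. */
void (*MyFunc)(void) = (void (*)(void)) VirtualAlloc(NULL, sizeofblock, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

/* VirtualAlloc returns NULL on failure. Otherwise fill the block with executable code, then call it: */

MyFunc();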