Making a JIT compiler
I've written a Brainfuck implementation (C++) that works like this:
- Read the input Brainfuck file
- Do trivial optimizations
- Convert Brainfuck to machine code for the VM
- Execute this machine code in the VM
This is pretty fast, but the bottleneck is now the VM. It's written in C++ and repeatedly reads a token and executes the corresponding action (there aren't many of them, if you know Brainfuck).
What I want to do is strip out the VM and generate native machine code on the fly (so basically, a JIT compiler). This could easily give a 20x speedup.
This would mean step 3 is replaced by a JIT compiler and step 4 by executing the generated machine code.
I don't really know where to start, so I have a few questions:
- How does this work, and how does the generated machine code get executed?
- Are there any C++ libraries for generating native machine code?
Generated machine code is simply jmp-ed to or call-ed like an ordinary function. Sometimes it is also necessary to disable the no-execute flag (NX bit) on the memory containing the generated code. On Linux, this is done with mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC), where addr must be page-aligned.
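A minimal sketch of the mprotect route, assuming Linux on x86-64 with the System V calling convention. POSIX leaves mprotect on memory that did not come from mmap unspecified, and policies such as SELinux's execmem restriction may refuse it, so treat this as Linux-specific. The three bytes encode inc byte ptr [rdi]; ret, i.e. Brainfuck's + applied to the cell whose address is passed as the first argument; the helper name run_increment is hypothetical. Unlike the call quoted above, this sketch switches the page to PROT_READ | PROT_EXEC and drops PROT_WRITE, so the page is never writable and executable at the same time.

```cpp
// Sketch assuming Linux on x86-64 (System V ABI: first argument in rdi).
#include <sys/mman.h>  // mprotect, PROT_*
#include <unistd.h>    // sysconf
#include <stdlib.h>    // posix_memalign, free
#include <cstdint>
#include <cstring>     // std::memcpy

// Signature of the generated code: it receives a pointer to one tape cell.
using CellFn = void (*)(std::uint8_t*);

// Hypothetical helper: runs `inc byte ptr [rdi]; ret` on `cell` from a heap
// page whose NX bit is cleared with mprotect. Returns false on failure.
bool run_increment(std::uint8_t* cell) {
    // FE 07 = inc byte ptr [rdi], C3 = ret
    const std::uint8_t code[] = {0xFE, 0x07, 0xC3};

    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* buf = nullptr;
    // mprotect works on whole pages, so take exactly one page-aligned page.
    if (posix_memalign(&buf, page, page) != 0) return false;
    std::memcpy(buf, code, sizeof code);

    // Clear NX: the page becomes readable and executable, no longer writable.
    if (mprotect(buf, page, PROT_READ | PROT_EXEC) != 0) {
        free(buf);
        return false;
    }

    reinterpret_cast<CellFn>(buf)(cell);  // called like an ordinary function

    // Make the page writable again before returning it to the allocator.
    mprotect(buf, page, PROT_READ | PROT_WRITE);
    free(buf);
    return true;
}
```

Calling run_increment on a cell holding 41 leaves 42 in the cell.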
In Windows, NX is called DEP (Data Execution Prevention). As for libraries, there are some, e.g. GNU Lightning (http://www.gnu.org/software/lightning/), which supports several architectures, and Nanojit (https://developer.mozilla.org/En/Nanojit), which is used in Firefox's JavaScript JIT engines. A more powerful and modern JIT is LLVM: you just need to translate the BF code into LLVM IR, and LLVM can then optimize it and generate code for many platforms, or run the LLVM IR in an interpreter (virtual machine) with JIT capabilities. There is a post about BF and LLVM with a complete LLVM JIT compiler for BF: http://www.remcobloemen.nl/2010/02/brainfuck-using-llvm/
Another BF + LLVM compiler is in LLVM's SVN repository: https://llvm.org/svn/llvm-project/llvm/trunk/examples/BrainF/BrainF.cpp
LLVM is a complete C++ library (or set of libraries) for generating native code from an intermediate form, complete with documentation and examples, and which has been used to produce JITters.
(It also has a C/C++ compiler which uses the framework - however the framework itself can be used for other languages).
This might be late, but I'm posting this answer in case it helps anyone else.
A JIT compiler has all the steps an AOT compiler has. The main difference is that an AOT compiler writes the machine-dependent code to an executable file (such as an .exe), while a JIT compiler loads the machine-dependent code into memory at run time (hence the performance overhead, since the code has to be recompiled and loaded every time the program runs).
How does a JIT compiler load machine code into memory at runtime?
I won't teach you about machine code, since I assume you already know it. For example, each assembly instruction is translated into a sequence of machine-code bytes.
You generate these bytes dynamically and store them in a vector (a std::vector in C++).
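A sketch of that step, assuming x86-64: the hypothetical helpers below append the encoding of mov eax, imm32 (opcode 0xB8 followed by the 32-bit immediate in little-endian order) and ret (0xC3) to a std::vector<uint8_t> named machinecode, producing code equivalent to int f() { return value; }.

```cpp
// Sketch assuming x86-64; the helper names are hypothetical.
#include <cstdint>
#include <vector>

// Appends `mov eax, imm32`: opcode B8, then the immediate, low byte first.
void emit_mov_eax_imm32(std::vector<std::uint8_t>& out, std::uint32_t imm) {
    out.push_back(0xB8);
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    }
}

// Appends `ret`.
void emit_ret(std::vector<std::uint8_t>& out) { out.push_back(0xC3); }

// Builds machine code for a function equivalent to `int f() { return value; }`
// (eax holds the return value under the System V ABI).
std::vector<std::uint8_t> build_return_constant(std::uint32_t value) {
    std::vector<std::uint8_t> machinecode;
    emit_mov_eax_imm32(machinecode, value);
    emit_ret(machinecode);
    return machinecode;
}
```

For example, build_return_constant(42) yields the six bytes B8 2A 00 00 00 C3.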
Then you copy this vector into memory. For that you need to know how much memory the code requires, which you can get from machinecode.size(), keeping in mind that memory is allocated and protected in whole pages.
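Because mmap and mprotect work on whole pages, the size to request is usually machinecode.size() rounded up to a multiple of the page size. A small sketch, assuming a POSIX system where sysconf(_SC_PAGESIZE) reports the page size and that the page size is a power of two; both function names are hypothetical.

```cpp
// Sketch assuming a POSIX system with power-of-two page sizes.
#include <cstddef>
#include <unistd.h>  // sysconf

// Rounds `size` up to the next multiple of `page` (page must be a power of two).
std::size_t round_up_to_page(std::size_t size, std::size_t page) {
    return (size + page - 1) & ~(page - 1);
}

// Page size reported by the operating system.
std::size_t system_page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}
```

With 4096-byte pages, a 6-byte buffer needs one page (4096 bytes) and a 4097-byte buffer needs two (8192 bytes).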
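To get that memory, call the mmap function (a POSIX C function, declared in <sys/mman.h>), then copy the vector's bytes into it. The memory must be executable before you run it.
Finally, cast a pointer to the beginning of your code to a function pointer and call it. You're good to go.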
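Putting these steps together, a sketch assuming Linux (or another POSIX system providing MAP_ANONYMOUS) on x86-64; run_jit_code is a hypothetical name. It maps read/write pages, copies the bytes, then uses mprotect to switch them to read/execute before calling, rather than mapping them writable and executable at once, which some systems forbid.

```cpp
// Sketch assuming Linux on x86-64; run_jit_code is a hypothetical helper.
#include <sys/mman.h>  // mmap, mprotect, munmap
#include <unistd.h>    // sysconf
#include <cstdint>
#include <cstring>     // std::memcpy
#include <vector>

using JitFn = int (*)();

// Copies `machinecode` into fresh pages, makes them executable, calls the
// code as `int f()`, unmaps the pages and stores the return value in `result`.
// Returns false if the code is empty or mapping/protecting the memory fails.
bool run_jit_code(const std::vector<std::uint8_t>& machinecode, int& result) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // mmap hands out whole pages, so round the size up to a page multiple.
    const std::size_t size = (machinecode.size() + page - 1) / page * page;
    if (size == 0) return false;

    // Anonymous, private, read+write memory (not executable yet).
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    std::memcpy(mem, machinecode.data(), machinecode.size());

    // Swap write permission for execute permission before running the code.
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return false;
    }

    // Point a function pointer at the start of the code and call it.
    result = reinterpret_cast<JitFn>(mem)();
    munmap(mem, size);
    return true;
}
```

Passing the bytes B8 2A 00 00 00 C3 (mov eax, 42; ret) stores 42 in result.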
Sorry if anything is unclear; for a simple walkthrough, you can always check out these posts:
https://solarianprogrammer.com/2018/01/10/writing-minimal-x86-64-jit-compiler-cpp/
https://github.com/spencertipping/jit-tutorial
GNU Lightning is a set of macros which can generate native code for a few different architectures. You will need a solid understanding of assembly code because your step 3 will involve using Lightning macros to emit machine code directly into a buffer you will later execute.
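For a rough idea of what step 3 becomes when you emit machine code into a buffer yourself (this sketch hand-encodes x86-64 bytes and does not use GNU Lightning's macros), here is a translator for Brainfuck's + - > < commands. It assumes x86-64 and the System V convention, with the generated function shaped like void f(uint8_t* tape) so the tape pointer lives in rdi. compile_bf is a hypothetical name, and loops ([ ]) and I/O (. ,) are deliberately left out, since they need jump patching and calls into the runtime.

```cpp
// Sketch assuming x86-64, System V ABI: generated code is `void f(uint8_t* tape)`
// and the tape pointer is kept in rdi. Only + - > < are handled.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::uint8_t> compile_bf(const std::string& src) {
    std::vector<std::uint8_t> out;
    for (char c : src) {
        switch (c) {
            case '+': out.insert(out.end(), {0xFE, 0x07}); break;        // inc byte ptr [rdi]
            case '-': out.insert(out.end(), {0xFE, 0x0F}); break;        // dec byte ptr [rdi]
            case '>': out.insert(out.end(), {0x48, 0xFF, 0xC7}); break;  // inc rdi
            case '<': out.insert(out.end(), {0x48, 0xFF, 0xCF}); break;  // dec rdi
            case '[': case ']': case '.': case ',':
                throw std::runtime_error("loops and I/O are not handled in this sketch");
            default: break;  // any other character is a comment in Brainfuck
        }
    }
    out.push_back(0xC3);  // ret
    return out;
}
```

The returned bytes would then be copied into executable memory and called with a pointer into a zero-initialized tape, like any other generated function.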