Compiled code such as C consumes little memory.
Interpreted code such as Python consumes more memory, which is understandable.
With JIT, a program is (selectively) compiled into machine code at run time. So shouldn't the memory consumption of a JIT'ed program be somewhere between that of a compiled and an interpreted program?
Instead, a JIT'ed program (such as PyPy) consumes several times more memory than the equivalent interpreted program (such as Python). Why?
Comments (2)
Tracing JIT compilers take quite a bit more memory because they need to keep not only the bytecode for the VM but also the directly executable machine code. This is only half the story, however.
Most JITs will also keep a lot of metadata about the bytecode (and even the machine code) to allow them to determine what needs to be JIT'ed and what can be left alone. Tracing JITs (such as LuaJIT) also create trace snapshots, which are used to fine-tune code at run time, performing things like loop unrolling or branch reordering.
Some also keep caches of commonly used code segments or fast lookup buffers to speed up creation of JIT'ed code. LuaJIT does this via DynAsm, which can actually help reduce memory usage when done correctly.
The memory usage greatly depends on the JIT engine employed and on the nature of the language it compiles (strongly vs. weakly typed). Some JITs employ advanced techniques such as SSA-based register allocators and variable liveness analysis; these sorts of optimizations consume memory as well, along with more common things like loop variable hoisting.
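As a rough, illustrative sketch (not how any real JIT is implemented), this is the kind of per-hot-loop state a tracing JIT has to hold alongside the interpreter's own bytecode; the class and field names here are hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical illustration: a tracing JIT retains several representations
# of the same hot loop at once, so the memory cost is additive rather than
# a replacement of the bytecode.
@dataclass
class CompiledTrace:
    bytecode: bytes                                # VM bytecode, still needed for fallback/deopt
    machine_code: bytes                            # the natively executable trace
    guards: list = field(default_factory=list)     # snapshots for bailing back to the interpreter
    hit_counter: int = 0                           # profiling data used to decide what to trace

loop_bytecode = b"\x00\x01\x02\x03"                # stand-in for real VM bytecode
trace = CompiledTrace(
    bytecode=loop_bytecode,
    machine_code=b"\x90" * 64,                     # stand-in for emitted machine instructions
    guards=[{"pc": 2, "locals": {"i": "int"}}],
    hit_counter=1000,
)

# The footprint includes every representation plus the metadata:
print(len(trace.bytecode) + len(trace.machine_code), "bytes for code alone, plus guard metadata")
```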
Be careful about what kind of memory usage you're talking about.
Code compiled ahead of time, such as C, uses comparatively little memory for the compiled machine code itself.
I would expect Python bytecode for a given algorithm to actually be smaller than the compiled C code for a similar algorithm, because Python bytecode operations are much higher level, so there are often fewer of them to get a given thing done. But a Python program will also have the compiled code of the Python interpreter in memory, which is quite a large and complex program in itself. Plus, a typical Python program will have much more of the standard library in memory than a typical C program (a C program can strip out all the functions it doesn't actually use if it's statically linked, and if it's dynamically linked it shares the compiled code with any other process in memory that uses it).
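To see how compact that high-level bytecode is, you can disassemble a small function with the standard `dis` module; a simple loop comes out as a short list of bytecode instructions, far fewer than the machine instructions a C compiler would emit for the same logic:

```python
import dis

def total(n):
    s = 0
    for i in range(n):
        s += i
    return s

dis.dis(total)                                            # show the high-level bytecode for the loop
print(len(total.__code__.co_code), "bytes of bytecode")   # the raw bytecode is tiny
```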
PyPy then has, on top of this, the machine code of the JIT compiler itself, as well as the machine code generated from the Python bytecode (which doesn't go away; it has to be kept around as well). So your intuition (that a JIT'ed system "should" consume memory somewhere between that of a compiled language and a fully interpreted language) isn't correct anyway.
But on top of all of those you've got the actual memory used by the data structures the program operates on. This varies immensely, and has little to do with whether the program is compiled ahead of time, interpreted, or interpreted-and-JIT'ed. Some compiler optimisations will reduce memory usage (whether they're applied ahead of time or just in time), but many actually trade off memory usage to gain speed. For programs that manipulate any serious amount of data, the data will completely dwarf the memory used by the code itself anyway.
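As a quick illustration you can run under CPython, a modest list of integers already occupies far more memory than the bytecode of the function that builds it:

```python
import sys

def build(n):
    return list(range(n))

data = build(1_000_000)

code_size = len(build.__code__.co_code)                   # size of the function's bytecode
list_size = sys.getsizeof(data)                           # the list object's own footprint
items_size = sum(sys.getsizeof(x) for x in data)          # plus the int objects it references

print(f"bytecode: {code_size} bytes")
print(f"data:     {list_size + items_size:,} bytes")      # tens of megabytes vs. a few dozen bytes
```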
When you say that a JIT'ed program "consumes several times more memory than the equivalent interpreted program", what programs are you thinking of? If you've actually done any comparisons, I'm guessing from your question that they would be between PyPy and CPython. I know many of PyPy's data structures are actually smaller than CPython's, but again, that has nothing to do with the JIT.
If the dominant memory usage of a program is the code itself, then a JIT compiler adds huge memory overhead (for the compiler itself, and the compiled code), and can't do very much at all to "win back" memory usage through optimisation. If the dominant memory usage is program data structures, then I wouldn't be at all surprised to find PyPy using significantly less memory than CPython, whether or not the JIT was enabled.
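If you want to check this for your own workload, one rough way (assuming a Unix-like system) is to record peak resident memory at the end of a run and compare the same script under CPython and under PyPy; recent PyPy versions can also be run with the JIT disabled (e.g. via a `--jit off` option, if your build supports it). The workload below is just a placeholder:

```python
import resource
import sys

def work():
    # Placeholder workload; substitute whatever program you are comparing.
    return sum(i * i for i in range(10_000_000))

if __name__ == "__main__":
    result = work()
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(sys.implementation.name, "peak RSS:", peak)
```

Run the same file under each interpreter and compare the numbers; whether PyPy comes out higher or lower will depend on how much of the footprint is code and runtime versus program data.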
There's not really a straightforward answer to your "Why?" because the statements in your question are not straightforwardly true. Which system uses more memory depends on many factors; the presence or absence of a JIT compiler is one factor, but it isn't always significant.