As I understand it, there are two types of modules in Python (CPython):
- the .so (C extension)
- the .py
The .so files are only loaded once, even when different processes/interpreters import them.
The .py files are loaded once per process/interpreter (unless explicitly reloaded).
Is there a way .py files can be shared by multiple processes/interpreters?
One would still need some layer in which to store modifications made to the module.
I'm thinking one could embed the interpreter in a .so as a first step. Is there an already-developed solution?
I acknowledge I may be very far off in terms of feasible ideas about this. Please excuse my ignorance.
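For context, you can see the two kinds in a running CPython; a quick check (suffixes and paths vary by platform and Python version):

```python
import importlib.machinery
import json

# Suffixes CPython recognizes for C extensions vs. pure-Python sources
print(importlib.machinery.EXTENSION_SUFFIXES)  # e.g. ['.cpython-311-x86_64-linux-gnu.so', ...] on Linux, '.pyd' entries on Windows
print(importlib.machinery.SOURCE_SUFFIXES)     # ['.py']

# json is a pure-Python module; its __file__ points at a .py source
print(json.__file__)
```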
3 Answers
The reason .so (or .pyd) files take up memory space only once (except for their variables segment) is that they are recognized by the OS kernel as object code. .py files are only recognized as text files/data; it's the Python interpreter that grants them "code" status. Embedding the Python interpreter in a shared library won't resolve this.

Loading .py files only once despite their use in multiple processes would require changes deep inside CPython.

Your best option, if you want to save memory space, is to compile Python modules to .so files using Cython. That may require some changes to the modules.
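If you go the Cython route, a minimal setup.py looks something like this (a sketch, assuming Cython and setuptools are installed; "mymod.py" is a placeholder for your module):

```python
# setup.py -- sketch of compiling a pure-Python module to a C extension.
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("mymod.py", compiler_directives={"language_level": "3"}),
)
```

Running `python setup.py build_ext --inplace` then produces a `mymod.*.so` whose code segment the OS can share across processes like any other shared library.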
No, there is no way. Python is so highly dynamic that I'm not sure it would make any sense anyway; each process can monkey-patch the modules, for example. Perhaps there would be a way to share the code anyway, but the benefit would be very small for something that is likely to be a lot of work.
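For instance, nothing stops a process from rewriting a module in place, which is why a naively shared module object would be unsafe (a small illustration):

```python
# Any process can rebind a module attribute at runtime ("monkey-patching"),
# so module state is inherently per-process.
import json

_original_dumps = json.dumps

def noisy_dumps(obj, **kwargs):
    print("serializing a", type(obj).__name__)
    return _original_dumps(obj, **kwargs)

json.dumps = noisy_dumps       # visible everywhere in this process...
print(json.dumps({"a": 1}))    # ...but invisible to every other process
```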
The best answer I can give you is "not impossible, but I don't know if it happens".
You have to think about what is actually happening. When Python encounters a .py file, it has to read the file, compile it, and then execute the byte code. Compilation takes place inside the process, and so can't be shared.
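In rough outline, importing a .py file amounts to something like this (a simplified sketch; the real importlib machinery adds caching, locking, and .pyc handling):

```python
import types

def toy_import(path, name):
    """Very simplified picture of what 'import' does for a .py file."""
    with open(path) as f:
        source = f.read()
    code = compile(source, path, "exec")  # the per-process compilation step
    module = types.ModuleType(name)
    exec(code, module.__dict__)           # run the byte code to build the module
    return module
```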
When it encounters a .so file, the operating system maps in memory that has been reserved for that library. All processes share the same memory region, and so you save memory.
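On Linux you can watch this happen by listing a process's shared-object mappings (Linux-only; /proc is not portable):

```python
# Print the .so files mapped into this process; the read-only/executable
# segments of these mappings are backed by the same physical pages in
# every process that loads them.
with open("/proc/self/maps") as maps:
    for line in maps:
        if ".so" in line:
            print(line.rstrip())
```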
Python already has a third way of loading modules. If it can, upon loading a .py file, it creates a pre-compiled .pyc file that is faster to load (you avoid compilation); the next time, it loads the .pyc file instead. They could conceivably load the .pyc file by just mmapping it into memory (using MAP_PRIVATE in case other things mess with that byte code later). If they did that, then shared modules would by default wind up in shared memory.
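That speculation can at least be mimicked by hand (a sketch using Unix-only mmap flags; "mymod.py" is a placeholder, and this is not what CPython actually does):

```python
import mmap
import py_compile

# Pre-compile the module; py_compile.compile() returns the .pyc path
# (normally under __pycache__/).
pyc_path = py_compile.compile("mymod.py")

# Map the cached byte code copy-on-write, as speculated above, so later
# in-process changes to the mapping wouldn't touch the shared file pages.
with open(pyc_path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
    print(len(buf), "bytes of byte code mapped from", pyc_path)
    buf.close()
```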
I have no idea whether it has actually been implemented in this way.