How can I store a Python object in memory for use by different processes?
Here's the situation: I have a massive object that needs to be loaded into memory. So big that if it is loaded in twice it will go beyond the available memory on my machine (and no, I can't upgrade the memory). I also can't divide it up into any smaller pieces. For simplicity's sake, let's just say the object is 600 MB and I only have 1 GB of RAM. I need to use this object from a web app, which is running in multiple processes, and I don't control how they're spawned (a third-party load balancer does that), so I can't rely on just creating the object in some master thread/process and then spawning off children. This also eliminates the possibility of using something like POSH, because that relies on its own custom fork call. I also can't use something like a SQLite memory database, mmap, or the posix_ipc, sysv_ipc, and shm modules, because those act as a file in memory, and this data has to be an object for me to use it. Using one of those, I would have to read it as a file and then turn it into an object in each individual process, and BAM: segmentation fault from going over the machine's memory limit, because I just tried to load in a second copy.
There must be some way to store a Python object in memory (and not as a file/string/serialized/pickled) and have it be accessible from any process. I just don't know what it is. I've looked all over StackOverflow and Google and can't find the answer to this, so I'm hoping somebody can help me out.
Comments (3)
http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
Look for shared memory or a server process. After re-reading your post, the server-process approach sounds closer to what you want.
http://en.wikipedia.org/wiki/Shared_memory
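For example, here is a minimal sketch of the server-process idea using multiprocessing.managers.BaseManager: a standalone process builds the object once and serves it over a local socket, and each web-app worker, however the load balancer spawned it, connects and gets a proxy instead of its own copy. The class, address, authkey, and port below are illustrative, not from the original post.

    # object_server.py -- run once, before the web app workers start (illustrative)
    from multiprocessing.managers import BaseManager

    class BigObject:                          # stand-in for the real 600 MB object
        def __init__(self):
            self.data = {"example": 42}       # imagine this is huge
        def lookup(self, key):
            return self.data.get(key)

    class ObjectManager(BaseManager):
        pass

    if __name__ == "__main__":
        big = BigObject()                     # the only copy held in memory
        ObjectManager.register("get_big_object", callable=lambda: big)
        manager = ObjectManager(address=("127.0.0.1", 50000), authkey=b"change-me")
        manager.get_server().serve_forever()

    # client side -- inside each web-app process, regardless of how it was spawned
    from multiprocessing.managers import BaseManager

    class ObjectManager(BaseManager):
        pass

    ObjectManager.register("get_big_object")
    manager = ObjectManager(address=("127.0.0.1", 50000), authkey=b"change-me")
    manager.connect()
    big = manager.get_big_object()            # a proxy object, not a second copy
    print(big.lookup("example"))

Note that every method call on the proxy is an IPC round trip to the server process, so this keeps a single copy in memory at the cost of some per-call latency.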
That isn't the way it works. Python's object reference counting and an object's internal pointers do not make sense across multiple processes.
If the data doesn't have to be an actual Python object, you can try working on the raw data stored in mmap() or in a database or some such.
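As a rough illustration of that raw-data route (the file name and record layout here are invented): if the data can be flattened once into a binary file of fixed-size records, every process can mmap the same file and unpack only the records it needs, and the OS keeps a single physical copy of the pages in memory.

    import mmap
    import struct

    RECORD = struct.Struct("<qd")          # e.g. an int64 key plus a float64 value
    PATH = "big_object.bin"                # hypothetical pre-built data file

    # Build the file once (in reality this would be a one-time export step).
    with open(PATH, "wb") as f:
        for i in range(3):
            f.write(RECORD.pack(i, i * 1.5))

    def read_record(index):
        # Map the whole file read-only and unpack a single record by offset;
        # no process ever materializes the full data set as Python objects.
        with open(PATH, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return RECORD.unpack_from(m, index * RECORD.size)

    print(read_record(0))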
I would implement this as a C module that gets imported into each Python script. Then the interface to this large object would be implemented in C, or some combination of C and Python.
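A hedged sketch of what the Python side of that could look like, assuming a hypothetical shared library libbigobject.so that loads the data into C-managed (e.g. shared) memory once and exposes plain accessor functions; the library name and function names are made up for illustration, and ctypes is used here in place of a hand-written extension module.

    import ctypes

    lib = ctypes.CDLL("./libbigobject.so")         # hypothetical C implementation
    lib.bigobject_lookup.argtypes = [ctypes.c_char_p]
    lib.bigobject_lookup.restype = ctypes.c_double

    def lookup(key):
        # Each Python process calls into the same C-managed memory; no per-process
        # copy of the 600 MB object is ever built on the Python side.
        return lib.bigobject_lookup(key.encode("utf-8"))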