Python multiprocessing Queue, Pipe, shared memory
I want to run two Python processes in parallel, with each being able to send data to and receive data from the other at any time. Python's multiprocessing package seems to have multiple solutions to this, such as Queue, Pipe, and SharedMemory. What are the pros and cons of using each of these, and which one would be best for accomplishing this specific goal?
It comes down to what you want to share, who you want to share it with, how often you want to share it, and what your latency requirements are, as well as your skill set, your maintainability needs, and your preferences. Then there are the usual tradeoffs to be made between performance, legibility, upgradeability, and so on.
If you are sharing native Python objects, they are generally most simply shared via a multiprocessing Queue, because they will be pickled before transmission and unpickled on receipt.
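As a minimal sketch of the Queue approach (the worker function and the message format here are illustrative assumptions, not part of the original answer), each process gets one queue to read from and one to write to:

```python
import multiprocessing as mp

def worker(inbox, outbox):
    """Echo native Python objects back to the parent. Objects are
    pickled automatically on put() and unpickled on get()."""
    while True:
        msg = inbox.get()          # blocks until a message arrives
        if msg is None:            # None is an arbitrary shutdown signal
            break
        outbox.put({"echoed": msg})

if __name__ == "__main__":
    to_child, from_child = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(to_child, from_child))
    p.start()
    to_child.put([1, "two", 3.0])  # any picklable object works
    print(from_child.get())        # {'echoed': [1, 'two', 3.0]}
    to_child.put(None)             # ask the worker to exit
    p.join()
```

Using two queues rather than one keeps the directions separate, so neither process ever consumes a message it sent itself.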
If you are sharing large arrays, such as images, you will likely find that multiprocessing shared memory has the least overhead because there is no pickling involved. However, if you want to share such arrays with other machines across a network, shared memory will not work, so you may need to resort to Redis or some other technology. Generally, multiprocessing shared memory takes more setting up and requires you to do more to synchronise access, but it is more performant for larger datasets.
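A minimal sketch of the shared-memory approach, assuming Python 3.8+ (for multiprocessing.shared_memory) and NumPy; the array contents and function names are illustrative:

```python
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def double_in_place(shm_name, shape, dtype):
    """Attach to an existing shared-memory block and modify the array
    in place; no pickling or copying of the array data is involved."""
    shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr *= 2                      # visible to the parent immediately
    shm.close()                   # detach; the parent owns the block

if __name__ == "__main__":
    src = np.arange(8, dtype=np.int64)
    shm = SharedMemory(create=True, size=src.nbytes)
    arr = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    arr[:] = src                  # one copy in, then zero-copy sharing
    p = Process(target=double_in_place,
                args=(shm.name, arr.shape, arr.dtype))
    p.start()
    p.join()
    print(arr)                    # [ 0  2  4  6  8 10 12 14]
    shm.close()
    shm.unlink()                  # free the block once everyone is done
```

Note the extra housekeeping (close and unlink) and that, if both processes wrote concurrently, you would also need a Lock or similar to synchronise access.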
If you are sharing between Python and C/C++ or another language, you may elect to use protocol buffers and pipes, or again Redis.
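Since the question also asks about Pipe: within Python, multiprocessing.Pipe gives you exactly two connected, duplex endpoints, which maps naturally onto two processes that each send and receive. A sketch under that assumption (the messages and the shutdown convention are made up for illustration; the cross-language protocol-buffer framing mentioned above is not shown):

```python
from multiprocessing import Pipe, Process

def peer(conn):
    """One end of a duplex pipe: receive, reply, repeat."""
    while True:
        msg = conn.recv()          # blocks until the other end sends
        if msg == "bye":
            break
        conn.send(f"ack: {msg}")
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()  # duplex=True by default
    p = Process(target=peer, args=(child_end,))
    p.start()
    parent_end.send("hello")
    print(parent_end.recv())        # ack: hello
    parent_end.send("bye")
    p.join()
```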
As I said, there are many tradeoffs and opinions, far more than I have addressed here. The first step, though, is to determine your needs in terms of bandwidth, latency, and flexibility, and then think about the most appropriate technology.