How do I convert Python threading code to multiprocessing code?
I need to convert a `threading` application to a `multiprocessing` application for multiple reasons (GIL, memory leaks). Fortunately, the threads are quite isolated and only communicate via `Queue.Queue`s. This primitive is also available in `multiprocessing`, so everything looks fine. Before I enter this minefield, I'd like some advice on the problems I should expect:
- How can I ensure that my objects can be transferred via the `Queue`? Do I need to provide some `__setstate__`?
- Can I rely on `put` returning instantly (like with `threading` `Queue`s)?
- General hints/tips?
- Anything worthwhile to read apart from the Python documentation?
Answer to part 1:
Everything that has to pass through a `multiprocessing.Queue` (or `Pipe`, or whatever) has to be picklable. This includes basic types such as `tuple`s, `list`s and `dict`s. Classes are also supported if they are defined at the top level of a module and are not too complicated (check the details of what `pickle` supports). Trying to pass `lambda`s around will fail, however.

Answer to part 2:
A `put` consists of two parts: it acquires a semaphore to modify the queue, and it starts a feeder thread if one is not already running. So if no other `Process` tries to `put` to the same `Queue` at the same time (for instance, because only one `Process` writes to it), it should be fast. For me, it turned out to be fast enough for all practical purposes.

Partial answer to part 3:
- `multiprocessing.Queue` lacks a `task_done` method, so it cannot be used as a drop-in replacement directly. (A subclass, `multiprocessing.JoinableQueue`, provides the method.)
- `processing.queue.Queue`, from the older `processing` package, lacks a `qsize` method, and the newer `multiprocessing` version only returns an approximate size (just keep this in mind).
- Since open file descriptors are normally inherited on `fork`, care needs to be taken about closing them in the right processes.
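
To illustrate part 1, here is a minimal sketch of how one might check picklability before putting an object on a queue, and how `__getstate__`/`__setstate__` can help when a class holds something that cannot be pickled. It assumes Python 3; `is_picklable` and the `Job` class are hypothetical names invented for this example, not part of any library.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle can serialize obj (a multiprocessing queue needs this)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


class Job:
    """Hypothetical top-level class that carries an unpicklable lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the lock before pickling.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and create a fresh lock on the receiving side.
        self.__dict__.update(state)
        self._lock = threading.Lock()


if __name__ == "__main__":
    print(is_picklable((1, [2, 3], {"a": 4})))  # basic containers
    print(is_picklable(lambda x: x))            # a lambda
    restored = pickle.loads(pickle.dumps(Job("resize")))
    print(restored.name)
```

Classes whose attributes are all picklable usually need neither method; the two hooks are only worth adding when something like a lock, socket, or open file has to be left out and rebuilt in the receiving process.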
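
On part 2, one caveat worth sketching: `put` only returns quickly while the queue has room. With the default `maxsize` the item is handed to an internal buffer and the feeder thread writes it to the pipe, but a queue created with an explicit `maxsize` blocks when full, just like the threading queue. The helper below is a hypothetical illustration (the name `put_with_timeout` is made up here) and assumes Python 3, where the `Full` exception lives in the `queue` module.

```python
import multiprocessing
import queue


def put_with_timeout(q, item, timeout=0.1):
    """Try to put item on q; return False instead of blocking forever when q is full."""
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    bounded = multiprocessing.Queue(maxsize=1)
    print(put_with_timeout(bounded, "first"))   # there is room for one item
    print(put_with_timeout(bounded, "second"))  # queue is full, times out
    print(bounded.get(timeout=5))
    # Shut the queue down cleanly so the feeder thread is joined.
    bounded.close()
    bounded.join_thread()
```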
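
For the `task_done` point in part 3, the following is a rough sketch of how `multiprocessing.JoinableQueue` might stand in for a threading queue that relies on `task_done()` and `join()`. The `worker` function, the `None` sentinel, and the squaring task are assumptions made for the example, and it assumes Python 3.

```python
import multiprocessing


def worker(tasks, results):
    """Square numbers taken from tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        try:
            if item is None:
                break
            results.put(item * item)
        finally:
            # Every get() must be matched by task_done(), including the sentinel.
            tasks.task_done()


if __name__ == "__main__":
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(tasks, results))
    proc.start()

    for n in (1, 2, 3):
        tasks.put(n)
    tasks.put(None)  # tell the worker to stop

    tasks.join()  # returns once task_done() has been called for every item
    print([results.get(timeout=5) for _ in range(3)])
    proc.join()
```

The results are read before `proc.join()` on purpose: a child that still has unflushed items in a queue may not exit until they are consumed.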