Can I access/save Python multiprocessing results before they are deserialized, avoiding the extra round trip?
I am using the Python multiprocessing package to run several iterations of a simulation in parallel.
import multiprocessing

pool = multiprocessing.Pool()
result = pool.map(runSimulation, args)
Each simulation returns a large dictionary of data. Given the large number of iterations I have to run, the final output contained in result is huge. Ideally, I'd pickle such a large data file before saving it. However, my understanding of the multiprocessing module is that it maintains an _outqueue which holds the serialized return value of each task. Later, a bound method called _result_handler deserializes these values and returns them to pool.map().

My question is this: can I access and save the serialized version of these results before they are deserialized by _result_handler, thus avoiding multiple rounds of serializing/deserializing?
Even if you can avoid it, you would then depend on the serialization format currently used by multiprocessing. You could instead serialize your dictionary yourself before returning it (or rather, return its serialized version). The transfer then serializes and deserializes bytes instead of a dictionary with lots of objects, which is much more efficient.

Test results with a dict of a million items, comparing your current way of (re)serializing after the transfer versus serializing before the transfer: the latter is over three times faster (and probably uses a lot less memory, too), and barely takes longer than just serializing the dictionary.
Code (Try it online!):