Sharing data between processes in Python
I have a complex data structure (user-defined type) on which a large number of independent calculations are performed. The data structure is basically immutable. I say basically because, though the interface looks immutable, internally some lazy evaluation is going on. Some of the lazily calculated attributes are stored in dictionaries (return values of costly functions, keyed by input parameter).
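For concreteness, the pattern described is roughly a memoized accessor: the interface looks read-only, but an internal dictionary is filled in on first use. A minimal sketch, with all names purely illustrative:

```python
# Illustrative sketch only; ComplexStructure and expensive_value are
# hypothetical names, not taken from the question.
class ComplexStructure:
    def __init__(self, data):
        self._data = data
        self._cache = {}          # lazily computed values, keyed by input parameter

    def expensive_value(self, param):
        # Looks immutable from the outside, but the first call for a given
        # parameter writes into the internal cache.
        if param not in self._cache:
            self._cache[param] = self._costly_function(param)
        return self._cache[param]

    def _costly_function(self, param):
        ...                       # the expensive computation
```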
I would like to use Python's multiprocessing module to parallelize these calculations. There are two questions on my mind.
- How do I best share the data structure between processes?
- Is there a way to handle the lazy-evaluation problem without using locks (multiple processes write the same value)?
Thanks in advance for any answers, comments or enlightening questions!
Pipelines.
Break your program up so that each calculation is a separate process of the following form.
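One plausible shape for such a stage, with `calculate` standing in for one of the expensive, independent computations (both names are placeholders, not code from the original answer):

```python
# Minimal sketch of one pipeline stage.
def calculate(item):
    return item * item                    # stand-in for the real, costly work

def transformer(source):
    """One pipeline stage: read items, apply the calculation, emit results."""
    for item in source:
        yield calculate(item)
```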
For testing, you can use it like this.
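For instance, reusing the `transformer` sketch above, a stage can be exercised with a plain in-memory list, with no processes involved yet:

```python
# Drive a single stage from an in-memory list.
sample_items = [1, 2, 3]
assert list(transformer(sample_items)) == [1, 4, 9]
```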
For reproducing the whole calculation in a single process, you can do this.
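For example, stages of this form chain together as ordinary generators; `second_stage` below is another hypothetical stage, used only to show the chaining:

```python
# Chain stages as generators to run the whole pipeline in one process.
def second_stage(source):
    for item in source:
        yield item + 1                    # stand-in for another calculation

results = list(second_stage(transformer([1, 2, 3])))
print(results)                            # [2, 5, 10]
```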
You can wrap each transformation with a little bit of file I/O. Pickle works well for this, but other representations (like JSON or YAML) work well, too.
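One way that wrapping might look, assuming each stage is saved as its own script (the file name `stage_a.py` is illustrative) and items are streamed as consecutive pickles over stdin/stdout:

```python
# stage_a.py -- sketch of one stage wrapped in pickle-based stream I/O,
# so it can run as a stand-alone OS-level process.
import pickle
import sys

def calculate(item):
    return item * item                    # placeholder for the real computation

def main(infile=sys.stdin.buffer, outfile=sys.stdout.buffer):
    while True:
        try:
            item = pickle.load(infile)    # next pickled item from upstream
        except EOFError:
            break                         # upstream closed the pipe
        pickle.dump(calculate(item), outfile)
    outfile.flush()

if __name__ == "__main__":
    main()
```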
Each processing step becomes an independent OS-level process. They will run concurrently and will immediately consume all OS-level resources.
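For illustration, one hypothetical way to start such a pipeline of stage scripts from Python (the script names are made up; an ordinary shell pipeline would work equally well):

```python
# Equivalent to the shell pipeline:
#   python produce_items.py | python stage_a.py > results.pkl
import subprocess
import sys

with open("results.pkl", "wb") as sink:
    producer = subprocess.Popen(
        [sys.executable, "produce_items.py"], stdout=subprocess.PIPE)
    stage = subprocess.Popen(
        [sys.executable, "stage_a.py"],
        stdin=producer.stdout, stdout=sink)
    producer.stdout.close()   # let the producer see a broken pipe if the stage exits early
    stage.wait()
    producer.wait()
```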