Sharing data between processes in Python

Posted 2024-09-13 20:15:16


I have a complex data structure (a user-defined type) on which a large number of independent calculations are performed. The data structure is basically immutable. I say basically because, though the interface looks immutable, internally some lazy evaluation is going on. Some of the lazily calculated attributes are stored in dictionaries (mapping input parameters to the return values of costly functions); a sketch of such a structure follows the questions below.
I would like to use Python's multiprocessing module to parallelize these calculations. There are two questions on my mind.

  1. How do I best share the data-structure between processes?
  2. Is there a way to handle the lazy-evaluation problem without using locks (multiple processes write the same value)?
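To make the setup concrete, here is a minimal sketch of the kind of structure described above. Every name in it (Structure, costly_function, expensive) is a hypothetical illustration, not taken from the actual code:

def costly_function(raw, param):
    """Hypothetical stand-in for an expensive computation."""
    ...

class Structure:
    """Looks immutable from the outside, but caches results internally."""
    def __init__(self, raw_data):
        self._raw = raw_data  # the immutable core
        self._cache = {}      # input parameter -> lazily computed result

    def expensive(self, param):
        # Lazy evaluation: compute once per parameter, then reuse.
        # Two processes filling the same entry concurrently is exactly
        # the write conflict that question 2 asks about.
        if param not in self._cache:
            self._cache[param] = costly_function(self._raw, param)
        return self._cache[param]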

Thanks in advance for any answers, comments or enlightening questions!


Comments (1)

み零 2024-09-20 20:15:16


How do I best share the data-structure between processes?

Pipelines.

origin.py | process1.py | process2.py | process3.py

Break your program up so that each calculation is a separate process of the following form.

def transform1(piece):
    """Some transformation or calculation on one piece."""
    ...

For testing, you can use it like this.

def t1(iterable):
    for piece in iterable:
        more_data = transform1(piece)
        # NewNamedTuple is a placeholder: pair each piece with its new result.
        yield NewNamedTuple(piece, more_data)

For reproducing the whole calculation in a single process, you can do this.

for x in t1(t2(t3(the_whole_structure))):
    print(x)

You can wrap each transformation with a little bit of file I/O. Pickle works well for this, but other representations (like JSON or YAML) work well, too.

import pickle, sys
while True:
    try:
        a_piece = pickle.load(sys.stdin.buffer)
    except EOFError:
        break  # the upstream stage closed the pipe
    more_data = transform1(a_piece)
    pickle.dump(NewNamedTuple(a_piece, more_data), sys.stdout.buffer)

Each processing step becomes an independent OS-level process. The stages run concurrently and immediately put all available OS-level resources (in particular, every core) to use.
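Put together, one stage of the shell pipeline shown above might look like the following sketch. The file name process1.py is assumed, and NewNamedTuple is given a concrete (hypothetical) definition so the script is self-contained:

#!/usr/bin/env python
# process1.py: one stage of the pipeline. Read pickled pieces from stdin,
# transform them, and write the enriched pieces to stdout for the next stage.
import pickle
import sys
from collections import namedtuple

NewNamedTuple = namedtuple("NewNamedTuple", ["piece", "more_data"])

def transform1(piece):
    """Some transformation or calculation on one piece."""
    return piece  # stub: replace with the real computation

if __name__ == "__main__":
    while True:
        try:
            a_piece = pickle.load(sys.stdin.buffer)  # pickle is a binary format
        except EOFError:
            break  # the upstream stage closed its end of the pipe
        pickle.dump(NewNamedTuple(a_piece, transform1(a_piece)), sys.stdout.buffer)
    sys.stdout.buffer.flush()  # push any buffered pickles to the next stage

For unpickling to work across stages, the record type (here the namedtuple) is best defined in a small module that every stage imports under the same name.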

Is there a way to handle the lazy-evaluation problem without using locks (multiple processes write the same value)?

Pipelines.
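If you would rather stay inside a single Python program than use shell pipes, the same idea maps directly onto the multiprocessing module: each stage owns the pieces it receives, and results flow one way through queues, so no two processes ever write the same value and no locks are needed. A minimal sketch, assuming the transforms are plain module-level functions (all names here are illustrative):

import multiprocessing as mp

def stage(transform, inbox, outbox):
    # Each stage owns the pieces it receives; results flow one way,
    # so there is no shared mutable state and therefore no locking.
    for piece in iter(inbox.get, None):  # None is the end-of-stream sentinel
        outbox.put(transform(piece))
    outbox.put(None)  # propagate the sentinel to the next stage

def run_pipeline(pieces, transforms):
    queues = [mp.Queue() for _ in range(len(transforms) + 1)]
    workers = [
        mp.Process(target=stage, args=(t, q_in, q_out))
        for t, q_in, q_out in zip(transforms, queues, queues[1:])
    ]
    for w in workers:
        w.start()
    for piece in pieces:
        queues[0].put(piece)
    queues[0].put(None)  # signal end of input
    results = list(iter(queues[-1].get, None))
    for w in workers:
        w.join()
    return results

Under the spawn start method (Windows, and the default on macOS) the transform functions must be importable, so define them at module level and call run_pipeline from under an if __name__ == "__main__": guard.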
