Can you do nested parallelization with multiprocessing in Python?

Posted 2025-02-02 08:53:52


I am new to multiprocessing in Python and I am trying to do the following:

import os
from multiprocessing import Pool
from random import randint

def example_function(a):

    new_numbers = [randint(1, a) for i in range(0, 50)]

    with Pool(processes=os.cpu_count()-1) as pool:
        results = pool.map(str, new_numbers)

    return results


if __name__ == '__main__':

    numbers = [randint(1, 50) for i in range(0, 50)]

    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(example_function, numbers)

    print("Final results:", results)

However, when running this I get: "AssertionError: daemonic processes are not allowed to have children".

Replacing either pool.map call with a for loop does make it work. E.g. for the second (outer) one:

results = []
for n in numbers:
    results.append(example_function(n))

However, since both the outer and inner tasks are very intensive I would like to be able to parallelize both. How can I do this?


Answer by 青萝楚歌 (2025-02-09 08:53:52):


multiprocessing.Pool creates its worker processes with the daemon flag set to True. According to the Python documentation of the Process class, this prevents sub-processes from being created inside worker processes:

The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
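
You can observe this directly. The following small snippet (not part of the original answer; show_daemon is an illustrative helper name) has each pool worker report its own daemon flag via multiprocessing.current_process:

import os
from multiprocessing import Pool, current_process

def show_daemon(_):
    # Pool workers report daemon == True, which is why the nested
    # Pool in example_function cannot be created inside them.
    return current_process().daemon

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        print(pool.map(show_daemon, range(2)))  # prints [True, True]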

Theoretically, you can create your own pool and use a custom context that bypasses the normal process creation to spawn non-daemonic processes. However, you should not do that, because the termination of those processes would be unsafe, as stated in the documentation.
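
For completeness, a commonly circulated sketch of that workaround looks roughly like this. The names NoDaemonProcess, NoDaemonContext and NestablePool are illustrative, and the safety caveat above still applies:

import multiprocessing
import multiprocessing.pool  # Pool lives here; multiprocessing.Pool is only a factory

class NoDaemonProcess(multiprocessing.Process):
    # Force the daemon flag to always read False, so this process
    # is allowed to create children (unsafe termination, see above).
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    # A context whose Process class is never daemonic.
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super().__init__(*args, **kwargs)

Using NestablePool for the outer pool in the question's code would avoid the AssertionError, but clean worker shutdown on exit is no longer guaranteed.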

In fact, creating pools inside pools is not a good idea in practice, because each process of the outer pool will create its own pool of processes. This results in a lot of processes being created, which is very inefficient. In some cases the number of processes would be too big for the OS to create them (the limit depends on the platform). For example, on a many-core machine such as a 64-core AMD Threadripper with 128 hardware threads, the total number of processes would be 128 * 128 = 16384, which is clearly not reasonable.

The usual solution to this problem is to reason in terms of tasks rather than processes. Tasks can be added to a shared queue so that workers can compute them, and workers can then spawn new work by adding new tasks to the same shared queue. AFAIK, multiprocessing managers are useful for designing such a system.
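
Here is a minimal sketch of that idea adapted to the question's workload, using a shared JoinableQueue instead of a manager. The worker function, the ('outer', n)/('inner', n) task encoding and the counts are assumptions for illustration:

import os
from multiprocessing import JoinableQueue, Process, Queue
from random import randint

N_OUTER, N_INNER = 50, 50  # mirrors the sizes used in the question

def worker(tasks, results):
    # A fixed set of workers handles both levels of parallelism:
    # an 'outer' task fans out into 'inner' tasks on the same queue.
    while True:
        task = tasks.get()
        if task is None:            # poison pill: shut down
            tasks.task_done()
            break
        kind, value = task
        if kind == 'outer':
            for _ in range(N_INNER):
                tasks.put(('inner', randint(1, value)))
        else:
            results.put(str(value))
        tasks.task_done()

if __name__ == '__main__':
    tasks, results = JoinableQueue(), Queue()
    workers = [Process(target=worker, args=(tasks, results))
               for _ in range(os.cpu_count())]
    for w in workers:
        w.start()

    for _ in range(N_OUTER):
        tasks.put(('outer', randint(1, 50)))

    tasks.join()                    # blocks until outer AND inner tasks are done
    final = [results.get() for _ in range(N_OUTER * N_INNER)]

    for _ in workers:               # one poison pill per worker
        tasks.put(None)
    for w in workers:
        w.join()

    print("Final results:", len(final), "strings")

Because inner tasks are enqueued before the outer task calls task_done, tasks.join() cannot return until all spawned sub-tasks are finished, and the total process count stays at os.cpu_count() regardless of nesting depth.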
