Parallelism in Python

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU-bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

  1. Message passing processes, possibly distributed across a cluster, e.g. MPI.
  2. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et al.
  3. Implicit shared memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs.

In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.

In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

Comments (5)

梦毁影碎の 2024-09-11 11:26:56

Generally, you describe a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.
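
As a rough, hypothetical illustration of that point, the same CPU-bound function can be pushed through a thread pool and through the multiprocessing worker pool; busy is a made-up stand-in for real work, and the timings are only indicative:

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def busy(n):
    # Purely CPU-bound work; a thread running this holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [5_000_000] * 4

    t0 = time.time()
    with ThreadPool(4) as pool:
        pool.map(busy, work)
    print("threads:  ", time.time() - t0)    # roughly serial because of the GIL

    t0 = time.time()
    with Pool(4) as pool:
        pool.map(busy, work)
    print("processes:", time.time() - t0)    # scales with the available cores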

You can write your performance critical code in C or Cython, and use Python for the glue.

土豪我们做朋友吧 2024-09-11 11:26:56

The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The documentation I've linked above is a fair bit to chew on, but should provide a good basis to get started.
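
A minimal sketch of how the question's raster workload might map onto multiprocessing, assuming a hypothetical process_tile function standing in for the real per-chunk calculation:

from multiprocessing import Pool, cpu_count

def process_tile(tile_id):
    # Placeholder for a CPU-bound calculation on one chunk of a large raster.
    return sum(i * i for i in range(200_000)) % 997

if __name__ == "__main__":
    with Pool(processes=cpu_count()) as pool:
        # chunksize keeps inter-process communication overhead down for many small tasks.
        results = pool.map(process_tile, range(64), chunksize=8)
    print(len(results), "tiles processed")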

深海蓝天 2024-09-11 11:26:56

Ray is an elegant (and fast) library for doing this.

The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. Then it can be invoked asynchronously.

import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])

You can also parallelize stateful computation using actors, again by using the @ray.remote decorator.

# This assumes you already ran 'import ray' and 'ray.init()'.

import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)

It has a number of advantages over the multiprocessing module.

Ray is a framework I've been helping develop.

梦途 2024-09-11 11:26:56

Depending on how much data you need to process and how many CPUs/machines you intend to use, it is in some cases better to write a part of it in C (or Java/C# if you want to use Jython/IronPython).

The speedup you can get from that might do more for your performance than running things in parallel on 8 CPUs.
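
From the Python side, the standard-library ctypes module is one way to call into such C code. This is only a hypothetical sketch: libraster.so and process_row are made-up names, and it assumes the library was compiled separately (e.g. gcc -O3 -shared -fPIC raster.c -o libraster.so):

import ctypes

# Load the (hypothetical) compiled library and describe the C signature:
#     double process_row(double *row, size_t n);
lib = ctypes.CDLL("./libraster.so")
lib.process_row.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.process_row.restype = ctypes.c_double

row = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)   # tiny stand-in for one raster row
print(lib.process_row(row, len(row)))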

甜扑 2024-09-11 11:26:56

There are many packages to do that; the most appropriate, as others have said, is multiprocessing, especially with the "Pool" class.

A similar result can be obtained with Parallel Python, which in addition is designed to work with clusters.
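
A rough sketch of what that looks like with pp; it is a third-party package, "node1:35000" is a hypothetical remote worker, and the exact call signatures should be checked against the pp documentation:

import pp

def process_tile(tile_id):
    # Placeholder for a CPU-bound calculation on one chunk of a raster.
    return sum(i * i for i in range(200_000)) % 997

job_server = pp.Server(ppservers=("node1:35000",))           # local CPUs plus a remote node
jobs = [job_server.submit(process_tile, (i,)) for i in range(64)]
results = [job() for job in jobs]                             # calling a job waits for its result
print(len(results), "tiles processed")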

Anyway, I would say go with multiprocessing.
