Threads in Python

Posted on 2024-07-29 04:22:01


Comments (7)

云裳 2024-08-05 04:22:01

In order of increasing complexity:

Use the threading module

Pros:

  • It's really easy to run any function (any callable in fact) in its
    own thread.
  • Sharing data is, if not easy (locks are never easy :), at
    least simple.

Cons:

  • As mentioned by Juergen, Python threads cannot actually access interpreter state concurrently (there's one big lock, the infamous Global Interpreter Lock). What that means in practice is that threads are useful for I/O-bound tasks (networking, writing to disk, and so on), but not at all useful for speeding up CPU-bound computation in pure Python.
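Both points in the pros list can be sketched in a few lines. This is only an illustration, assuming Python 3; the names add_many, counter and counter_lock are made up for the example. Four threads run the same callable and update one shared integer, with a Lock making each read-modify-write step atomic.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # guards every update to `counter`

def add_many(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

# Any callable can be handed to Thread; here four threads run add_many.
threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Because of the GIL the four threads do not finish this CPU-bound loop any faster than one thread would; the sketch only shows the mechanics of starting threads and sharing data.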

Use the multiprocessing module

In the simple use case this looks exactly like using threading, except that each task runs in its own process rather than its own thread. (Almost literally: if you take Eli's example and replace threading with multiprocessing, Thread with Process, and Queue (the module) with multiprocessing.Queue, it should run just fine.)
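As a rough sketch of that substitution (this is not Eli's example, which is not reproduced here), the code below runs one function in a child process and gets its result back through a multiprocessing queue. The names square_all and run_demo are invented, and the explicit "fork" start method is an assumption that only holds on POSIX systems; with the default start method on other platforms the same code works as long as it is only started from under the `__main__` guard.

```python
import multiprocessing as mp

def square_all(numbers, results):
    """Runs in the child process and sends its answer back through the queue."""
    results.put([n * n for n in numbers])

def run_demo():
    ctx = mp.get_context("fork")   # assumption: POSIX-only start method
    results = ctx.Queue()          # plays the role multiprocessing.Queue has in the text
    worker = ctx.Process(target=square_all, args=([1, 2, 3], results))
    worker.start()
    answer = results.get(timeout=10)  # pickled in the child, unpickled here
    worker.join()
    return answer

if __name__ == "__main__":
    print(run_demo())
```

Note that the list crosses the process boundary by being pickled, which is the "memory is not implicitly shared" point from the cons list.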

Pros:

  • Actual concurrency for all tasks (no Global Interpreter Lock).
  • Scales to multiple processors, can even scale to multiple machines.

Cons:

  • Processes are slower to start and heavier than threads.
  • Data sharing between processes is trickier than with threads.
  • Memory is not implicitly shared. You either have to share it explicitly or pickle variables and send them back and forth. This is safer, but harder. (If it matters, the Python developers increasingly seem to be pushing people in this direction.)

Use an event model, such as Twisted

Pros:

  • You get extremely fine control over priority, over what executes when.

Cons:

  • Even with a good library, asynchronous programming is usually harder than threaded programming, both in understanding what's supposed to happen and in debugging what actually is happening.

In all cases I'm assuming you already understand many of the issues involved with multitasking, specifically the tricky issue of how to share data between tasks. If for some reason you don't know when and how to use locks and conditions, you have to start with those. Multitasking code is full of subtleties and gotchas, and it's really best to have a good understanding of the concepts before you start.

向地狱狂奔 2024-08-05 04:22:01

You've already gotten a fair variety of answers, from "fake threads" all the way to external frameworks, but I've seen nobody mention Queue.Queue (queue.Queue in Python 3) -- the "secret sauce" of CPython threading.

To expand: as long as you don't need to overlap pure-Python CPU-heavy processing (in which case you need multiprocessing -- but it comes with its own Queue implementation, too, so you can with some needed cautions apply the general advice I'm giving;-), Python's built-in threading will do... but it will do it much better if you use it advisedly, e.g., as follows.

"Forget" shared memory, supposedly the main plus of threading vs multiprocessing -- it doesn't work well, it doesn't scale well, never has, never will. Use shared memory only for data structures that are set up once before you spawn sub-threads and never changed afterwards -- for everything else, make a single thread responsible for that resource, and communicate with that thread via Queue.

Devote a specialized thread to every resource you'd normally think to protect by locks: a mutable data structure or cohesive group thereof, a connection to an external process (a DB, an XMLRPC server, etc), an external file, etc, etc. Get a small thread pool going for general purpose tasks that don't have or need a dedicated resource of that kind -- don't spawn threads as and when needed, or the thread-switching overhead will overwhelm you.

Communication between two threads is always via Queue.Queue -- a form of message passing, the only sane foundation for multiprocessing (besides transactional memory, which is promising but for which I know of no production-worthy implementations except in Haskell).

Each dedicated thread managing a single resource (or small cohesive set of resources) listens for requests on a specific Queue.Queue instance. Threads in a pool wait on a single shared Queue.Queue (Queue is solidly threadsafe and won't fail you in this).
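A minimal sketch of that layout, assuming Python 3 (where the module is named queue). Everything here is illustrative: ledger_owner is the one thread that ever touches the dict, pool_worker threads wait on a single shared task queue, and the STOP sentinel is just one possible shutdown convention.

```python
import queue
import threading

STOP = object()  # sentinel telling a thread to exit (a convention of this sketch)

def ledger_owner(requests, ledger):
    """Dedicated thread: the only code that touches `ledger`, so no lock is needed."""
    while True:
        item = requests.get()
        if item is STOP:
            break
        name, amount = item
        ledger[name] = ledger.get(name, 0) + amount

def pool_worker(tasks, ledger_requests):
    """Pool thread: takes work from the shared queue, then queues a request."""
    while True:
        task = tasks.get()
        if task is STOP:
            break
        name, amount = task
        ledger_requests.put((name, amount * 2))  # fire and forget

ledger = {}
ledger_requests = queue.Queue()   # dedicated queue for the ledger's owner
tasks = queue.Queue()             # single shared queue for the pool

owner = threading.Thread(target=ledger_owner, args=(ledger_requests, ledger))
pool = [threading.Thread(target=pool_worker, args=(tasks, ledger_requests))
        for _ in range(3)]
owner.start()
for w in pool:
    w.start()

for i in range(6):
    tasks.put(("even" if i % 2 == 0 else "odd", i))
for _ in pool:
    tasks.put(STOP)               # one sentinel per pool thread
for w in pool:
    w.join()                      # every request is queued once the pool is done
ledger_requests.put(STOP)
owner.join()

print(ledger)
```

The pool is created once up front and reused, in line with the advice not to spawn threads as and when needed.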

Threads that just need to queue up a request on some queue (shared or dedicated) do so without waiting for results, and move on. Threads that eventually DO need a result or confirmation for a request queue a pair (request, receivingqueue) with an instance of Queue.Queue they just made, and eventually, when the response or confirmation is indispensable in order to proceed, they get (waiting) from their receivingqueue. Be sure you're ready to get error-responses as well as real responses or confirmations (Twisted's deferreds are great at organizing this kind of structured response, BTW!).
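The (request, receivingqueue) pattern might look roughly like this in Python 3. The divider thread, the ask helper and the ("ok", ...) / ("error", ...) tuple format are all assumptions of the sketch rather than anything prescribed above; the point is that failures come back through the same reply queue as ordinary results.

```python
import queue
import threading

STOP = object()  # shutdown sentinel (a convention of this sketch)

def divider(request_queue):
    """Dedicated thread answering ((a, b), reply_queue) pairs."""
    while True:
        item = request_queue.get()
        if item is STOP:
            break
        (a, b), reply_queue = item
        try:
            reply_queue.put(("ok", a / b))
        except ZeroDivisionError as exc:
            reply_queue.put(("error", exc))  # errors travel back as data

request_queue = queue.Queue()
server = threading.Thread(target=divider, args=(request_queue,))
server.start()

def ask(a, b):
    """Queue a request with a fresh reply queue, then wait for the answer."""
    reply_queue = queue.Queue(maxsize=1)
    request_queue.put(((a, b), reply_queue))
    # A real caller could do other work here before it needs the result.
    return reply_queue.get()

good = ask(6, 3)
bad = ask(1, 0)

request_queue.put(STOP)
server.join()
print(good, bad[0])
```

A caller that never needs the answer would simply put the request without a reply queue and move on.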

You can also use Queue to "park" instances of resources which can be used by any one thread but never be shared among multiple threads at one time (DB connections with some DBAPI components, cursors with others, etc) -- this lets you relax the dedicated-thread requirement in favor of more pooling (a pool thread that gets from the shared queue a request needing a queueable resource will get that resource from the appropriate queue, waiting if necessary, etc etc).
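"Parking" can be sketched as follows, again assuming Python 3. FakeConnection is a made-up stand-in for a real DB connection; a thread takes a connection out of the queue, uses it exclusively, and always puts it back.

```python
import queue
import threading

class FakeConnection:
    """Hypothetical stand-in for a connection that one thread at a time may use."""
    def __init__(self, ident):
        self.ident = ident

    def run(self, statement):
        return (self.ident, statement)

parked = queue.Queue()            # idle connections wait here
for ident in range(2):
    parked.put(FakeConnection(ident))

results = queue.Queue()

def worker(statement):
    conn = parked.get()           # blocks until some connection is free
    try:
        results.put(conn.run(statement))
    finally:
        parked.put(conn)          # park it again for the next thread

threads = [threading.Thread(target=worker, args=(f"query {i}",))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

answers = [results.get() for _ in range(5)]
print(sorted(statement for _, statement in answers))
```

With two connections and five threads, at most two statements run at once and the other threads wait inside parked.get().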

Twisted is actually a good way to organize this minuet (or square dance as the case may be), not just thanks to deferreds but because of its sound, solid, highly scalable base architecture: you may arrange things to use threads or subprocesses only when truly warranted, while doing most things normally considered thread-worthy in a single event-driven thread.

But, I realize Twisted is not for everybody -- the "dedicate or pool resources, use Queue up the wazoo, never do anything needing a Lock or, Guido forbid, any synchronization procedure even more advanced, such as semaphore or condition" approach can still be used even if you just can't wrap your head around async event-driven methodologies, and will still deliver more reliability and performance than any other widely-applicable threading approach I've ever stumbled upon.

许仙没带伞 2024-08-05 04:22:01

It depends on what you're trying to do, but I'm partial to just using the threading module in the standard library because it makes it really easy to take any function and just run it in a separate thread.

from threading import Thread

def f():
    ...

def g(arg1, arg2, arg3=None):
    ...

Thread(target=f).start()
Thread(target=g, args=[5, 6], kwargs={"arg3": 12}).start()

And so on. I often have a producer/consumer setup using a synchronized queue provided by the Queue module (named queue in Python 3):

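from queue import Queue  # Python 3 name; the module was called Queue in Python 2
from threading import Thread

q = Queue()

def consumer():
    # Block until a list of ints arrives, then print its sum.
    # Note: this loop never ends, so the consumer threads keep the program alive.
    while True:
        print(sum(q.get()))

def producer(data_source):
    # Parse each line of whitespace-separated integers and queue the result.
    for line in data_source:
        q.put([int(x) for x in line.split()])

# SOME_INPUT_FILE_OR_SOMETHING is a placeholder for any iterable of lines.
Thread(target=producer, args=[SOME_INPUT_FILE_OR_SOMETHING]).start()
for i in range(10):
    Thread(target=consumer).start()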
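
For anyone who wants to run the producer/consumer idea end to end, here is a self-contained variation, assuming Python 3. The in-memory lines list, the DONE sentinel and the totals queue are additions of this sketch, not part of the code above; they let the consumers stop cleanly and let the main thread collect the sums.

```python
import queue
import threading

DONE = object()                     # sentinel: tells one consumer to stop
lines = ["1 2 3", "10 20", "7"]     # made-up input standing in for a file
q = queue.Queue()
totals = queue.Queue()

def producer(data_source, n_consumers):
    for line in data_source:
        q.put([int(x) for x in line.split()])
    for _ in range(n_consumers):
        q.put(DONE)                 # one sentinel per consumer

def consumer():
    while True:
        numbers = q.get()
        if numbers is DONE:
            break
        totals.put(sum(numbers))

consumers = [threading.Thread(target=consumer) for _ in range(3)]
prod = threading.Thread(target=producer, args=(lines, len(consumers)))
for t in consumers + [prod]:
    t.start()
for t in consumers + [prod]:
    t.join()

result = sorted(totals.get() for _ in range(len(lines)))
print(result)
```

The order in which consumers finish is not deterministic, which is why the sums are sorted before printing.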

赏烟花じ飞满天 2024-08-05 04:22:01

Kamaelia is a Python framework for building applications with lots of communicating processes.

(source: kamaelia.org) Kamaelia - Concurrency made useful, fun

In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)

What sort of systems? Network servers, clients, desktop applications, pygame based games, transcode systems and pipelines, digital TV systems, spam eradicators, teaching tools, and a fair amount more :)

Here's a video from PyCon 2009. It starts by comparing Kamaelia to Twisted and Parallel Python and then gives a hands-on demonstration of Kamaelia.

Easy Concurrency with Kamaelia - Part 1 (59:08)
Easy Concurrency with Kamaelia - Part 2 (18:15)

美人骨 2024-08-05 04:22:01

Regarding Kamaelia, the answer above doesn't really cover the benefit here. Kamaelia's approach provides a unified interface (pragmatic rather than perfect) for dealing with threads, generators and processes in a single system for concurrency.

Fundamentally it provides a metaphor of a running thing which has inboxes, and outboxes. You send messages to outboxes, and when wired together, messages flow from outboxes to inboxes. This metaphor/API remains the same whether you're using generators, threads or processes, or speaking to other systems.

The "not perfect" part is due to syntactic sugar not being added as yet for inboxes and outboxes (though this is under discussion) - there is a focus on safety/usability in the system.

Taking the producer/consumer example using bare threading above, it becomes this in Kamaelia:

Pipeline(Producer(), Consumer() )

In this example it doesn't matter whether these are threaded components or otherwise; the only difference between them from a usage perspective is the base class of the component. Generator components communicate using lists, threaded components using Queue.Queue instances, and process-based components using os.pipe.
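To make the inbox/outbox metaphor concrete, here is a toy sketch in plain Python. It is NOT Kamaelia's API: producer, doubler and run_pipeline are invented names, and the "wiring" is simply a shared list acting as one component's outbox and the next component's inbox, in the spirit of generator components communicating through lists.

```python
def producer(outbox):
    """Generator component: emits one message per scheduling step."""
    for n in range(3):
        outbox.append(n)
        yield

def doubler(inbox, outbox):
    """Generator component: forwards every inbox message, doubled."""
    while True:
        while inbox:
            outbox.append(inbox.pop(0) * 2)
        yield

def run_pipeline(steps=10):
    link = []   # producer's outbox wired to doubler's inbox
    sink = []   # doubler's outbox, collected for inspection
    components = [producer(link), doubler(link, sink)]
    for _ in range(steps):
        for comp in list(components):
            try:
                next(comp)          # give each component one turn
            except StopIteration:
                components.remove(comp)
    return sink

print(run_pipeline())
```

Neither component knows who is on the other end of its list, which is what lets the same component code be rewired or moved to a different kind of concurrency.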

The reason behind this approach, though, is to make it harder to create hard-to-debug bugs. In threading, or any shared-memory concurrency, the number one problem you face is accidentally broken updates to shared data. By using message passing you eliminate one class of bugs.

If you use bare threading and locks everywhere, you're generally working on the assumption that you won't make any mistakes when you write code. Whilst we all aspire to that, it very rarely happens. By wrapping up the locking behaviour in one place you reduce the number of places where things can go wrong. (Context managers help, but don't help with accidental updates outside the context manager.)

Obviously not every piece of code can be written as message passing and shared style, which is why Kamaelia also has a simple software transactional memory (STM), which is a really neat idea with a nasty name. It's more like version control for variables: check out some variables, update them and commit back. If you get a clash, you rinse and repeat.
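The "version control for variables" idea can be sketched with a toy store. This is an illustration only and not Kamaelia's STM API; Store, ConflictError and increment are hypothetical names. A commit succeeds only if the version checked out is still current, otherwise the caller checks out again and retries.

```python
import threading

class ConflictError(Exception):
    """Raised when someone else committed since our checkout."""

class Store:
    """Toy versioned store: the single lock lives here and nowhere else."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}                       # name -> (value, version)

    def checkout(self, name, default=0):
        with self._lock:
            return self._data.get(name, (default, 0))

    def commit(self, name, value, version):
        with self._lock:
            _, current = self._data.get(name, (None, 0))
            if current != version:
                raise ConflictError(name)
            self._data[name] = (value, version + 1)

def increment(store, name, times):
    for _ in range(times):
        while True:                           # rinse and repeat on a clash
            value, version = store.checkout(name)
            try:
                store.commit(name, value + 1, version)
                break
            except ConflictError:
                continue

store = Store()
threads = [threading.Thread(target=increment, args=(store, "hits", 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_value, final_version = store.checkout("hits")
print(final_value, final_version)
```

Every successful commit bumps both the value and the version by one, so lost updates would show up as the two numbers drifting apart or falling short of the total.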

Anyway, I hope that's a useful answer. FWIW, the core reason behind Kamaelia's setup is to make concurrency safer and easier to use in Python systems, without the tail wagging the dog. (i.e. the big bucket of components

I can understand why the other Kamaelia answer was modded down, since even to me it looks more like an ad than an answer. As the author of Kamaelia it's nice to see enthusiasm though I hope this contains a bit more relevant content :-)

And that's my way of saying, please take the caveat that this answer is by definition biased, but for me, Kamaelia's aim is to try and wrap what is IMO best practice. I'd suggest trying a few systems out, and seeing which works for you. (Also, if this is inappropriate for Stack Overflow, sorry - I'm new to this forum :-)

向地狱狂奔 2024-08-05 04:22:01

I would use the Microthreads (Tasklets) of Stackless Python, if I had to use threads at all.

A whole online game (massively multiplayer) is built around Stackless and its multithreading principle -- since the original is just too slow for the massively multiplayer property of the game.

Threads in CPython are widely discouraged. One reason is the GIL -- a global interpreter lock -- that serializes threads for many parts of the execution. My experience is that it is really difficult to create fast applications this way. My example programs were all slower with threading -- on one core (though the many waits for input should have made some performance gains possible).

With CPython, rather use separate processes if possible.

抹茶夏天i‖ 2024-08-05 04:22:01

If you really want to get your hands dirty, you can try using generators to fake coroutines. It probably isn't the most efficient in terms of work involved, but coroutines do offer you very fine control of co-operative multitasking rather than the pre-emptive multitasking you'll find elsewhere.
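A minimal sketch of the idea, assuming nothing beyond the standard library: each "thread" is a generator that yields whenever it is willing to give up control, and a small round-robin scheduler resumes them in turn. The names countdown and run are made up for the example.

```python
from collections import deque

def countdown(name, n, log):
    """A fake coroutine: does one unit of work, then yields to the scheduler."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield            # the only point where a task switch can happen

def run(tasks):
    """Round-robin scheduler: resume each task until all are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue     # task finished, drop it
        ready.append(task)

log = []
run([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
```

Since a switch can only happen at a yield, the shared log list needs no lock, but a task that blocks or forgets to yield stalls every other task.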

One advantage you'll find is that by and large, you will not need locks or mutexes when using co-operative multitasking, but the more important advantage for me was the nearly-zero switching speed between "threads". Of course, Stackless Python is said to be very good for that as well; and then there's Erlang, if it doesn't have to be Python.

Probably the biggest disadvantage in co-operative multitasking is the general lack of workaround for blocking I/O. And in the faked coroutines, you'll also encounter the issue that you can't switch "threads" from anything but the top level of the stack within a thread.

After you've made an even slightly complex application with fake coroutines, you'll really begin to appreciate the work that goes into process scheduling at the OS level.
