Twisted: using multiple threads and processes at the same time
The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, and that the reactor would handle this elegantly under the covers. Upon actually trying it, I found that my application deadlocks. Using multiple threads by themselves, or child processes by themselves, everything is fine.
Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple concurrent threads running and doing who knows what with the same file descriptors.
My question for SO is, what is the best strategy for solving this problem?
What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.
Has anyone done this before? Is there a faster way?
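The fork-once idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Twisted code: the class name SpawnHelper and its spawn() and close() methods are hypothetical, the helper starts children with subprocess.Popen rather than spawnProcess(), and it handles neither reaping of the children nor anything like childFDs. It assumes a POSIX system and that the helper is created before any threads are started.

```python
import os
import subprocess
import threading
from multiprocessing import Pipe


class SpawnHelper:
    """Hypothetical fork-once helper; create it before starting any threads."""

    def __init__(self):
        parent_end, helper_end = Pipe()  # duplex pipe that pickles objects
        self._helper_pid = os.fork()
        if self._helper_pid == 0:
            # Helper process: single-threaded, serves spawn requests.
            parent_end.close()
            try:
                self._serve(helper_end)
            finally:
                os._exit(0)  # never fall back into the parent's code
        helper_end.close()
        self._conn = parent_end
        self._lock = threading.Lock()  # one request/reply pair at a time

    @staticmethod
    def _serve(conn):
        children = []  # keep Popen objects alive while the helper runs
        while True:
            try:
                argv = conn.recv()
            except EOFError:
                return  # parent went away
            if argv is None:
                return  # explicit shutdown request
            try:
                child = subprocess.Popen(argv)
            except OSError as exc:
                conn.send(exc)  # report the failure to the parent
            else:
                children.append(child)
                conn.send(child.pid)

    def spawn(self, argv):
        """Ask the helper to start argv; return the new child's PID."""
        with self._lock:
            self._conn.send(list(argv))
            reply = self._conn.recv()
        if isinstance(reply, Exception):
            raise reply
        return reply

    def close(self):
        """Shut the helper down and wait for it to exit."""
        with self._lock:
            self._conn.send(None)
            self._conn.close()
        os.waitpid(self._helper_pid, 0)
```

Because the spawned programs are children of the helper and not of the main process, the main process only learns their PIDs; it cannot wait on them directly.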
Comments (4)
File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about what the best way (or ways, since different platforms may demand different solutions) to implement it might be.
The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.
An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common so maybe that doesn't affect your case.
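The check-interval workaround could look something like the sketch below. Note the assumptions: sys.setcheckinterval no longer exists in current Python 3 releases, so this uses sys.setswitchinterval, its time-based counterpart, instead; the function name and the 60-second value are made up for illustration; and it calls os.fork and os.execvp directly, not through Twisted. A long switch interval only discourages forced switches between Python threads; threads that release the GIL on their own (for example around blocking I/O) can still run.

```python
import os
import sys


def fork_exec_with_switching_paused(argv, pause_seconds=60.0):
    """Fork and exec argv while forced thread switches are suppressed.

    Hypothetical helper: returns the child's PID in the parent.
    """
    original = sys.getswitchinterval()
    sys.setswitchinterval(pause_seconds)  # stand-in for setcheckinterval
    try:
        pid = os.fork()
        if pid == 0:
            # Child: replace the process image as soon as possible.
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed
    finally:
        # Runs in the parent only; the child either exec'ed or exited.
        sys.setswitchinterval(original)
    return pid
```

The parent still has to reap the child itself, for example with os.waitpid(pid, 0).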
The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).
First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.
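Second, fork() does not create multiple threads in the child process. According to the standard describing fork(), the new process is created with a single thread: when a multithreaded process calls fork(), the child contains a replica of the calling thread only.
Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says that, to avoid errors, the child may only execute async-signal-safe operations until one of the exec functions is called, and I don't think there's anything to ensure that only async-signal-safe operations are used.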
So, please be more specific as to your exact problem, since it isn't a subprocess with threads being cloned.
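The single-thread claim is easy to check from Python. The sketch below is an illustration only (the function name is made up, and it assumes a POSIX system): it starts a few idle threads, forks, and has the child report how many threads Python's threading module can see.

```python
import os
import threading


def count_threads_after_fork(n_workers=3):
    """Return (threads seen in parent, threads seen in forked child)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the thread count through the pipe, then exit.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_count = int(pipe.read())
    os.waitpid(pid, 0)

    stop.set()  # let the idle workers finish
    for worker in workers:
        worker.join()
    return parent_count, child_count
```

Only the thread that called os.fork() survives in the child, so the second number is 1 however many workers the parent had. Recent Python versions may emit a DeprecationWarning when a multithreaded process forks; that does not change the count.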
Returning to this issue after some time, I found that if I do this:
reactor.callFromThread(reactor.spawnProcess, *spawnargs)
instead of this:
reactor.spawnProcess(*spawnargs)
then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Processes" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."
I suspect that the other people Jean-Paul mentioned as having this problem may be making a similar mistake. The responsibility is on the application to ensure that reactor and other API calls are made within the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.
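To make the pattern concrete, here is a small sketch of a worker function that hands the spawn request back to the reactor thread. The function names are hypothetical and the reactor is passed in as a parameter; the only Twisted calls assumed are reactor.callFromThread and reactor.spawnProcess, as used above.

```python
def spawn_in_reactor_thread(reactor, protocol, executable, args):
    """Ask the reactor thread to call spawnProcess; usable from any thread."""
    # callFromThread only queues the call. spawnProcess itself runs later,
    # inside the reactor thread, so its return value is not available here.
    reactor.callFromThread(reactor.spawnProcess, protocol, executable, args)


def blocking_work(reactor, protocol):
    """Hypothetical job intended to run in a pool thread."""
    result = sum(range(1000))  # stand-in for real blocking work
    # Do not call reactor.spawnProcess directly from this thread.
    spawn_in_reactor_thread(reactor, protocol, "/bin/echo", ["echo", str(result)])
    return result
```

In a real program the reactor would come from twisted.internet, and the job would be started with threads.deferToThread(blocking_work, reactor, protocol).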
fork() on Linux definitely leaves the child process with only one thread.
I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.