All threads hang when a busy task runs in one thread
I have a multi-threaded Python application where threads are spawned off to do various tasks. This application has been working great for months, but recently I've run into a problem.
One of the threads starts a Python subprocess.Popen object which is running an intensive data copy command.
copy = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, preexec_fn=os.setsid, shell=False, close_fds=True)
if copy.wait():
    raise Exception("Unable to copy!")
While the copy command is running, the entire application eventually bogs down, with none of my other threads running for minutes at a time. Once copy finishes, everything resumes where it left off.
I'm trying to figure out how to prevent this from happening. My best theory at the moment is that it has something to do with the way my kernel is scheduling processes. I added the call to setsid() to get the copy process scheduled separately from the main Python app, but this has no effect.
I'm assuming all the copy.wait() function does is a waitpid(). Is it possible that the call takes a long time, during which that one thread holds the GIL? If so, how do I prevent/deal with this? What can I do to debug this further?
Comments (1)
copy.wait() holding the GIL was my first suspicion too. However, this doesn't appear to be the case on my system (a wait() call isn't preventing other threads from progressing).
You are right that copy.wait() eventually ends up in os.waitpid(). The latter looks like this on my Linux system:
[source listing not preserved]
This clearly releases the GIL while it's blocked in POSIX waitpid.
I would try attaching gdb to the python process when it's hung to see what the threads are doing. Perhaps this would provide some ideas.
Edit: This is what a multi-threaded Python process looks like in gdb:
[gdb output not preserved]
Here, all threads but two are waiting for the GIL. A typical stack trace goes like this:
[stack trace not preserved]
You can figure out which thread is which by printing hex(t.ident) in your Python code, where t is a threading.Thread object. On my system, this matches up with the thread ids seen in gdb (0x7f82c6462700 et al).
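As a rough way to check both observations on your own machine, here is a minimal sketch. It is not the original poster's code: the function name run_wait_experiment, the thread names, and the sleeping child process (a stand-in for the real copy command) are all assumptions made for illustration. One thread blocks in wait() on the child while a second thread keeps doing small pieces of pure-Python work, and each thread is labelled with hex(t.ident) so it can be matched against gdb.

```python
import subprocess
import sys
import threading
import time


def run_wait_experiment(child_seconds=0.5):
    """Wait on a child process in one thread while another thread counts."""
    state = {"ticks": 0, "returncode": None}
    done = threading.Event()

    def waiter():
        # Hypothetical stand-in for the copy command: a child that just sleeps.
        child = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(%r)" % child_seconds]
        )
        # Blocks until the child exits (via waitpid() on POSIX systems).
        state["returncode"] = child.wait()
        done.set()

    def ticker():
        # Each increment needs the GIL, so the count only grows if the
        # waiting thread is not holding it.
        while not done.is_set():
            state["ticks"] += 1
            time.sleep(0.01)

    threads = [
        threading.Thread(target=waiter, name="waiter"),
        threading.Thread(target=ticker, name="ticker"),
    ]
    for t in threads:
        t.start()

    # Label each thread as the answer suggests, for matching against gdb.
    idents = {t.name: hex(t.ident) for t in threads}

    for t in threads:
        t.join()
    return state, idents


if __name__ == "__main__":
    state, idents = run_wait_experiment()
    for name, ident in sorted(idents.items()):
        print("%-8s %s" % (name, ident))
    print("ticks while waiting: %d" % state["ticks"])
```

If the tick count keeps growing while the child runs, wait() is not what stalls the other threads, and the hang is more likely caused by something else, which is where attaching gdb to the hung process comes in.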