Python threads stuck in Popen.communicate
I have a Python application that uses threads as follows:
- Create tasks queue (Queue module)
- Create 10 threads and pass to each a queue object
- Put tasks into queue (around 8500 tasks in total)
- Each thread:
- takes a task and runs some Linux commands using Popen.communicate()
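The setup above can be sketched roughly like this (Python 3 syntax for illustration; the original runs Python 2.7, where the module is named `Queue`; the placeholder command and worker exit condition are illustrative, not the asker's actual code):

```python
import queue
import subprocess
import threading

NUM_WORKERS = 10
NUM_TASKS = 50  # the real workload used ~8500 tasks

def worker(tasks):
    # Drain the shared queue; exit once no tasks remain.
    while True:
        try:
            cmd = tasks.get_nowait()
        except queue.Empty:
            return
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()  # each call forks and opens pipes
        tasks.task_done()

tasks = queue.Queue()
for _ in range(NUM_TASKS):
    tasks.put(["true"])  # placeholder for the real Linux command

threads = [threading.Thread(target=worker, args=(tasks,))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```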
As for mutexes, critical sections, and queue management: my thread pool library has already been tested in a couple of smaller projects, so there is no reason to think something is broken there...
Everything works fine when I have up to a couple thousand tasks, however when I have more (in this case over 8500), some of the threads hang. gdb shows they're stuck in Python's subprocess.py in _execute_child (line 1131), i.e. just after os.fork() is called.
gdb:
(gdb) pystack
/opt/python/current/lib/python2.7/subprocess.py (1131): _execute_child
/opt/python/current/lib/python2.7/subprocess.py (681): __init__
/home/olibsup/tools/chelo/checks/checkUtils/osutils/cmdutils.py (115): shcmd
/home/olibsup/tools/chelo/checks/liblist/libWorkers.py (204): workerFunction
/home/olibsup/tools/chelo/checks/checkUtils/pools/thpool.py (160): run
/opt/python/current/lib/python2.7/threading.py (160): __bootstrap_inner
/opt/python/current/lib/python2.7/threading.py (553): __bootstrap
My ulimit shows:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 139264
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
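Note the open-files line: 1024 descriptors is not a large budget when 10 threads each hold stdin/stdout/stderr pipes for a child. A quick Linux-specific way to watch descriptor usage from inside the process (a diagnostic sketch, not part of the original code):

```python
import os
import resource

# On Linux, /proc/self/fd lists every descriptor this process currently
# holds open; comparing that count to the soft RLIMIT_NOFILE (the
# "open files" ulimit, 1024 above) shows how much headroom remains.
open_fds = len(os.listdir("/proc/self/fd"))
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open fds: %d, soft limit: %d, hard limit: %d" % (open_fds, soft, hard))
```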
'top' also doesn't show anything suspicious (at least not to me):
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16438200k total, 15705272k used, 732928k free, 751640k buffers
Swap: 3148700k total, 44k used, 3148656k free, 11692300k cached
Do you have any ideas why threads hang there?
Not all threads hang, some (5 of 10) have finished properly (when no more tasks were available).
Thank you for your help,
Zbigniew
1 Answer
In subprocess.py at line 1131/1132, a file descriptor is duplicated via os.dup. For this reason I suspect your operating system is limiting the number of subprocesses and/or the number of file descriptors available to your application. However, I do not understand why os.dup doesn't raise an exception in that case.
Try to find out your operating system's limits and stay below them. For UNIX-based systems you can probably use Python's resource module (though I have never used it myself): http://docs.python.org/library/resource.html
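A minimal sketch of using the resource module for this (POSIX-only; the 4096 target and the idea of raising the soft limit are my suggestion, not something from the original post):

```python
import resource

# Inspect the kernel-enforced per-process limits relevant here:
# RLIMIT_NOFILE = max open file descriptors, RLIMIT_NPROC = max processes.
soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)
soft_nproc, hard_nproc = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NOFILE: soft=%s hard=%s" % (soft_nofile, hard_nofile))
print("RLIMIT_NPROC:  soft=%s hard=%s" % (soft_nproc, hard_nproc))

# An unprivileged process may raise its own soft limit up to the hard limit:
if soft_nofile < hard_nofile:
    new_soft = min(hard_nofile, max(soft_nofile, 4096))
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard_nofile))
```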