How to efficiently run lots of subprocesses in Python?
Basic setup:
I am using a Python script for automatic testing of a programming project that I am working on. In the test, I run my executable with lots of different options and compare the results with previous runs. The testing takes quite a lot of time since I have roughly 600k different tests to run.
At the moment, I have split my script into two parts: a test module that grabs tests from a job queue and places results in a result queue, and a main module that creates the job queue and then checks the results. This lets me experiment with several test processes/threads, which so far has not improved testing speed at all (I am running this on a dual-core computer; I would expect more test processes to work better on a quad-core).
In the test module, I create a command string that I then execute using
subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
I then read the results from the pipe and place them in the result queue.
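In outline, that worker loop looks something like the sketch below (a hypothetical reconstruction of the setup described above, not the actual script; it also shows a common alternative to shell=True, passing an argument list via shlex.split, which saves the extra /bin/sh process that shell=True forks for every test):

    import shlex
    import subprocess
    from multiprocessing import Process, Queue

    def worker(job_queue, result_queue):
        """Pull command strings off the job queue, run them, queue the output."""
        for cmd in iter(job_queue.get, None):  # a None in the queue means "stop"
            # An argument list with shell=False avoids the extra /bin/sh
            # process that shell=True would spawn for every single test.
            proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE)
            output, _ = proc.communicate()
            result_queue.put((cmd, proc.returncode, output))

    if __name__ == '__main__':
        jobs, results = Queue(), Queue()
        workers = [Process(target=worker, args=(jobs, results)) for _ in range(4)]
        for w in workers:
            w.start()
        jobs.put('/bin/echo hello')   # placeholder for the real test commands
        print(results.get())          # (cmd, returncode, output)
        for w in workers:
            jobs.put(None)            # one sentinel per worker
        for w in workers:
            w.join()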
Question:
Is this the most efficient way of running lots and lots of command strings on a multi-core system? Every Popen I do creates a new process, which seems like it might create quite a bit of overhead, but I can't really think of a better way to do it.
(I am currently using Python 2.7, in case this matters.)
EDIT:
OS is Linux
The subprocesses that I spawn are commandline C-executables with arguments.
Answers (3):
You could have a look at the multiprocessing module, especially the Pool part.
It will let you launch as many processes as you want (by default, as many as there are CPU cores).
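A minimal sketch of what that could look like for this question (the ./mytest command strings are placeholders, not from the question):

    import shlex
    import subprocess
    from multiprocessing import Pool

    def run_one(cmd):
        """Run a single test command and return its output and exit code."""
        proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE)
        output, _ = proc.communicate()
        return cmd, proc.returncode, output

    if __name__ == '__main__':
        cmds = ['./mytest --option %d' % i for i in range(1000)]  # placeholders
        pool = Pool()  # defaults to one worker per CPU core
        for cmd, rc, out in pool.imap_unordered(run_one, cmds):
            pass  # compare `out` against the stored result for `cmd` here
        pool.close()
        pool.join()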
First, try measuring the testing script/scheme with a null executable. That way you can see how much overhead the process spawning has relative to the actual testing time. Then we have some real data to act on.
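For example, something like this gives a ballpark figure (a sketch; /bin/true stands in for the null executable, which exists on Linux per the question's edit):

    import subprocess
    import time

    N = 1000
    start = time.time()
    for _ in range(N):
        # /bin/true does nothing and exits immediately, so this loop
        # measures pure process-spawning overhead.
        proc = subprocess.Popen(['/bin/true'], stdout=subprocess.PIPE)
        proc.communicate()
    elapsed = time.time() - start
    print('spawn overhead: %.2f ms per process' % (1000.0 * elapsed / N))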
Adding a batch mode to your exe (that reads command lines off a file and does that work) is probably a good idea if the amount of work is small compared to the time it takes to load and shut down your process. Plus, it will help you find memory leaks. :)
By moving stuff out of main(), this isn't so hard to do.
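The Python side of such a batch mode could then shrink to one spawn per batch, roughly like this (a sketch; the --batch flag and the ./mytest name are hypothetical and would have to be added to the C program):

    import os
    import subprocess
    import tempfile

    def run_batch(cmd_lines):
        """Write all argument lines to a file and run the exe once over them."""
        with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
            f.write('\n'.join(cmd_lines))
            batch_file = f.name
        try:
            # '--batch' is a hypothetical flag: the exe would read one
            # argument line per row and run each test in-process.
            proc = subprocess.Popen(['./mytest', '--batch', batch_file],
                                    stdout=subprocess.PIPE)
            output, _ = proc.communicate()
            return output
        finally:
            os.unlink(batch_file)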
In the end I created python C-bindings (with SWIG) directly to the code that I wanted to test. It turned out to be several hundred times faster than starting subprocesses.
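With bindings in place, the test loop becomes plain function calls instead of process spawns, along these lines (a sketch; mytest and run_test are hypothetical names for the SWIG-generated module and the wrapped C entry point):

    import mytest  # hypothetical name for the SWIG-generated wrapper module

    def compare_with_previous(options, result):
        """Placeholder: diff `result` against the stored result for `options`."""
        pass

    all_test_options = [('--size', str(n)) for n in range(600000)]  # illustrative

    for options in all_test_options:
        # A direct in-process C call: no fork/exec, no shell, no pipe.
        result = mytest.run_test(*options)  # run_test is a hypothetical wrapper
        compare_with_previous(options, result)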