How do I run two Python subprocesses and get their runtimes, stdout, and stderr?
I am looking into running two subprocesses at the same time on one machine so that I can get accurate runtimes. I am comparing two versions of a piece of software and running diagnostics on them, such as runtime, output variation, etc.
Originally I had one function that runs both versions of the software on the same input file, with each version writing its output to a different place. The old and new versions are selected through an argparser. The function starts a subprocess for each software command and then grabs the output through .communicate(). But I know that .communicate() waits for the process to finish. Ideally I want to use .communicate() on both processes at the same time, so that they start together and are both timed. I don't mind when the results arrive, as long as I know the runtimes.
My question, more concisely: how can I run two subprocesses, each running independently and starting at the same time, and then grab their runtimes, stdout, and stderr?
Here's a quick example of my function (just pretend I'm testing the speed of java on some file):
import subprocess
import time

def test():
    # Get start time
    before = time.time()
    cmd1 = ['java-1.0', 'blah']
    c1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    cmd2 = ['java-1.5', 'blah']
    c2 = subprocess.Popen(cmd2, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Start both processes at the same time??
    results = [c1.communicate(), c2.communicate()]
    # Get total time taken
    total = round(time.time() - before, 2)
    # Print the total time as hours:minutes:seconds
    minutes, seconds = divmod(int(total), 60)
    hours, minutes = divmod(minutes, 60)
    print("%s:%s:%s" % (hours, minutes, seconds))
    c1.stderr.close()
    c2.stderr.close()
    return results
Another point I want to make is that I need them to run at the same time. I will be running the jobs on a powerful remote machine, so I need the workload to be the same for both, so that one process doesn't finish faster just because it ran at a different time.
Comments (3)
In this case, since you have no input to send to the process, you don't need to use Popen.communicate(). The Popen call itself starts the process; we do not need Popen.communicate() for that. You can use Popen.poll() to check whether a process has finished instead of using Popen.communicate(). If you want stdout and stderr, you can spawn 2 threads, have each one call Popen.communicate(), and record the time in each thread. Each thread then handles one process, and you spawn one thread per command.
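The thread-per-process approach described above could look roughly like the following. This is a minimal sketch, not the answerer's original code: the names run_and_time and run_concurrently are hypothetical, and the demo commands are placeholder Python one-liners standing in for the two software versions.

```python
import subprocess
import sys
import threading
import time


def run_and_time(cmd, results, key):
    """Start cmd, collect its output, and store the elapsed wall-clock time."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # blocks only this thread, not the other one
    elapsed = time.perf_counter() - start
    results[key] = {
        "stdout": out,
        "stderr": err,
        "returncode": proc.returncode,
        "seconds": elapsed,
    }


def run_concurrently(commands):
    """Run each command in its own thread; commands maps a label to an argv list."""
    results = {}
    threads = [
        threading.Thread(target=run_and_time, args=(cmd, results, key))
        for key, cmd in commands.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every process has finished
    return results


if __name__ == "__main__":
    # Placeholder commands (assumption): replace with the real old/new invocations.
    demo = {
        "old": [sys.executable, "-c", "print('old')"],
        "new": [sys.executable, "-c", "print('new')"],
    }
    for label, info in run_concurrently(demo).items():
        print(label, round(info["seconds"], 3), info["stdout"], info["stderr"])
```

Each thread measures its own process from just before Popen until communicate() returns, so the two runtimes are recorded independently even though the processes overlap.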
If you are trying to get accurate runtimes, you'll need to run the test many times and look at the distribution; the median runtime should be a good indicator. Running both of them at the same time won't help you get more accurate results.
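Repeating a run and taking the median could be sketched as follows. The helper names time_command and summarize, the default of 5 repeats, and the placeholder command are assumptions made for illustration only.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times, one after another, and return the durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to its median, minimum, and maximum."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real invocation of each version.
    cmd = [sys.executable, "-c", "pass"]
    print(summarize(time_command(cmd)))
```

Looking at min and max next to the median gives a rough idea of how much the measurements scatter between runs.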
Runtime testing is usually done on an otherwise idle machine and repeated several times, with and without different input files, to get an idea of what the influence of the OS caches is.
Having two processes vying for the same resources will not result in more accurate measurements.
Speeding up Python programs is usually a case of using the right tools. For example, list comprehensions and other comprehensions are usually faster than loops, and using built-in functions probably trumps that. A nice example is "an optimization anecdote". Probably the best optimization is a different algorithm.
Using PyPy instead of CPython might also yield a significant improvement.
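To check the loop-versus-comprehension claim on your own machine, a small timeit comparison along these lines may help. The function names and the chosen sizes are hypothetical, and the measured difference will vary by interpreter and workload.

```python
import timeit


def squares_loop(n):
    """Build a list of squares with an explicit loop."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


def compare(n=1000, number=200):
    """Time both variants; returns total seconds for `number` calls of each."""
    return {
        "loop": timeit.timeit(lambda: squares_loop(n), number=number),
        "comprehension": timeit.timeit(lambda: squares_comprehension(n), number=number),
    }


if __name__ == "__main__":
    print(compare())
```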