Best way to fork multiple shell commands/processes in Python?
Most of the examples I've seen with os.fork and the subprocess/multiprocessing modules show how to fork a new instance of the calling Python script or a chunk of Python code. What would be the best way to spawn a set of arbitrary shell commands concurrently?
I suppose I could just use subprocess.call or one of the Popen commands and pipe the output to a file, which I believe will return immediately, at least to the caller. I know this is not that hard to do; I'm just trying to figure out the simplest, most Pythonic way to do it.
Thanks in advance
Comments (5)
All calls to subprocess.Popen return immediately to the caller. It's the calls to wait and communicate which block. So all you need to do is spin up a number of processes using subprocess.Popen (set stdin to /dev/null for safety), and then one by one call communicate until they're all complete.
Naturally I'm assuming you're just trying to start a bunch of unrelated (i.e. not piped together) commands.
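The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper name run_all and the placeholder commands are assumptions made for the example, and subprocess.DEVNULL stands in for "set stdin to /dev/null".

```python
import subprocess
import sys


def run_all(commands):
    """Start every command at once, then collect each one's output in turn."""
    procs = [
        subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # children can never block waiting for input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # fold stderr into stdout
            text=True,
        )
        for cmd in commands             # Popen returns immediately for each command
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()     # blocks until this particular child exits
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    # Placeholder commands (hypothetical); each one is a list of arguments.
    commands = [
        [sys.executable, "-c", "print('first')"],
        [sys.executable, "-c", "print('second')"],
    ]
    for code, out in run_all(commands):
        print(code, out.strip())
```

All children are started before the first communicate call, so they run concurrently even though their output is collected one at a time.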
I like to use PTYs instead of pipes. For a bunch of processes where I only want to capture error messages I did this.
This next part was in a loop spawning about 30 processes.
After this I had a select loop which collected any error messages and sent them to the single log file. Using PTYs meant that I never had to worry about partial lines getting mixed up because the line discipline provides simple framing.
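The answer's original snippets are not preserved here, so the following is only a hedged reconstruction of the general idea, assuming a POSIX system: each child's stderr is attached to its own PTY, and a select loop copies whatever arrives into one log file. The function name run_with_pty_stderr and the log path are hypothetical.

```python
import os
import pty
import select
import subprocess
import sys


def run_with_pty_stderr(commands, log_path):
    """Run commands concurrently, sending every child's stderr to one log file."""
    procs = []
    masters = {}                        # master fd -> Popen object
    for cmd in commands:
        master_fd, slave_fd = pty.openpty()
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=slave_fd,            # the child writes errors to the PTY slave
        )
        os.close(slave_fd)              # the parent keeps only the master end
        masters[master_fd] = proc
        procs.append(proc)

    with open(log_path, "ab") as log:
        while masters:
            ready, _, _ = select.select(list(masters), [], [])
            for fd in ready:
                try:
                    data = os.read(fd, 4096)
                except OSError:         # Linux raises EIO once the slave side is closed
                    data = b""
                if data:
                    log.write(data)
                else:                   # end of output: reap the child
                    os.close(fd)
                    masters.pop(fd).wait()
    return [proc.returncode for proc in procs]


if __name__ == "__main__":
    import tempfile
    log_path = os.path.join(tempfile.mkdtemp(), "errors.log")
    cmd = [sys.executable, "-c", "import sys; print('oops', file=sys.stderr)"]
    print(run_with_pty_stderr([cmd, cmd], log_path))
```

Note that a PTY normally translates newlines to carriage return plus newline on output, so the log may contain "\r\n" line endings.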
There is no single best approach for all possible circumstances. The best depends on the problem at hand.
Here's how to spawn a process and save its output to a file combining stdout/stderr:
To spawn multiple processes that can run in parallel with your script and each other:
Note: cmd is a list everywhere.
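Since the answer's own snippets are missing, here is one plausible sketch of both steps, under the stated convention that cmd is always a list. The helper names spawn_to_file and spawn_many, and the log file names, are assumptions for illustration only.

```python
import os
import subprocess
import sys
import tempfile


def spawn_to_file(cmd, path):
    """Start cmd (a list of arguments) with stdout and stderr both appended to path."""
    with open(path, "ab") as out:
        # The child gets its own copy of the descriptor, so closing ours is safe.
        return subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)


def spawn_many(jobs):
    """Start every (cmd, path) pair without waiting; return the Popen objects."""
    return [spawn_to_file(cmd, path) for cmd, path in jobs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        jobs = [
            ([sys.executable, "-c", "print('job one')"], os.path.join(tmp, "job1.log")),
            ([sys.executable, "-c", "print('job two')"], os.path.join(tmp, "job2.log")),
        ]
        procs = spawn_many(jobs)
        # The script is free to do other work here while the children run.
        exit_codes = [p.wait() for p in procs]
        print(exit_codes)
```

Because the output goes to files rather than pipes, the children can never stall on a full pipe buffer, and the parent only needs wait (or poll) to learn when they finish.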
That's not a good way to do it if you want to process the data.
In this case, better do
and then sp.communicate() or read directly from sp.stdout.read().
If the data is to be processed in the calling program at a later time, there are two ways to go:
1. You can retrieve the data as soon as possible, maybe via a separate thread, reading it and storing it somewhere the consumer can get it.
2. You can let the producing subprocess block and retrieve the data from it when you need it. The subprocess produces as much data as fits in the pipe buffer (usually 64 KiB) and then blocks on further writes. As soon as you need the data, you read() from the subprocess object's stdout (maybe stderr as well) and use it; or, again, you use sp.communicate() at that later time.
Way 1 is the way to go if producing the data takes a long time, so that your program would otherwise have to wait.
Way 2 is preferable if the data is very large and/or is produced so fast that buffering would make no sense.
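The two ways could look roughly like this. It is a sketch under assumptions: the function names start_with_reader and start_lazy are made up for the example, and a plain list stands in for "somewhere the consumer can get it".

```python
import subprocess
import sys
import threading


def start_with_reader(cmd):
    """Way 1: drain stdout in a background thread so the child never stalls."""
    sp = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE)
    chunks = []

    def drain():
        for line in sp.stdout:          # loop ends at EOF, when the child closes stdout
            chunks.append(line)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    return sp, reader, chunks


def start_lazy(cmd):
    """Way 2: only start the child; it blocks once the pipe buffer is full."""
    return subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('hello')"]

    # Way 1: the data is already collected by the time we ask for it.
    sp, reader, chunks = start_with_reader(cmd)
    reader.join()
    sp.wait()
    print(b"".join(chunks))

    # Way 2: nothing is read until communicate() is called.
    sp = start_lazy(cmd)
    out, _ = sp.communicate()
    print(out)
```

With way 2 the child makes no progress beyond one pipe buffer of output until the parent starts reading, which is exactly the trade-off described above.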
See an older answer of mine including code snippets to do:
For spawning multiple concurrent commands, you would need to alter the class RunCmd to instantiate multiple read output/write input queues and to spawn multiple Popen subprocesses.