Best way to fork multiple shell commands/processes in Python?

Most of the examples I've seen with os.fork and the subprocess/multiprocessing modules show how to fork a new instance of the calling Python script or a chunk of Python code. What would be the best way to spawn a set of arbitrary shell commands concurrently?

I suppose I could just use subprocess.call or one of the Popen calls and pipe the output to a file, which I believe will return immediately, at least to the caller. I know this is not that hard to do; I'm just trying to figure out the simplest, most Pythonic way to do it.

Thanks in advance

5 Answers

驱逐舰岛风号 2024-12-21 14:03:06

All calls to subprocess.Popen return immediately to the caller. It's the calls to wait and communicate which block. So all you need to do is spin up a number of processes using subprocess.Popen (set stdin to /dev/null for safety), and then one by one call communicate until they're all complete.

Naturally I'm assuming you're just trying to start a bunch of unrelated (i.e. not piped together) commands.
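
A minimal sketch of that approach; the commands shown are placeholders, and subprocess.DEVNULL is the modern way to point stdin at /dev/null:

import subprocess

# hypothetical commands; substitute your own
commands = [['ls', '-l'], ['uname', '-a'], ['df', '-h']]

# Popen returns immediately, so all children start concurrently
procs = [subprocess.Popen(cmd,
                          stdin=subprocess.DEVNULL,   # nothing reads our terminal
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
         for cmd in commands]

# communicate() is the call that blocks; collect results one by one
for cmd, proc in zip(commands, procs):
    out, err = proc.communicate()
    print(cmd, 'exited with', proc.returncode)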

故事与诗 2024-12-21 14:03:06

I like to use PTYs instead of pipes. For a bunch of processes where I only want to capture error messages, I did this.

import pty
import sys
from subprocess import Popen

RNULL = open('/dev/null', 'r')           # shared stdin for every child
WNULL = open('/dev/null', 'w')           # children's stdout is discarded
logfile = open("myprocess.log", "a", 1)  # line-buffered shared log
REALSTDERR = sys.stderr
sys.stderr = logfile

This next part was in a loop spawning about 30 processes.

sys.stderr = REALSTDERR                  # restore the real stderr while forking
master, slave = pty.openpty()            # a fresh PTY pair for each child
self.subp = Popen(self.parsed, shell=False, stdin=RNULL, stdout=WNULL, stderr=slave)
sys.stderr = logfile

After this I had a select loop which collected any error messages and sent them to the single log file. Using PTYs meant that I never had to worry about partial lines getting mixed up because the line discipline provides simple framing.
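
Since the select loop itself isn't shown above, here is a rough sketch of what it could look like, assuming the master ends from pty.openpty() were collected in a list named masters and that the parent closed its copy of each slave fd after spawning (these names are assumptions, not from the answer):

import os
import select

def drain_to_log(masters, logfile):
    open_fds = list(masters)
    while open_fds:
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            try:
                data = os.read(fd, 4096)
            except OSError:              # Linux raises EIO once the slave end closes
                data = b''
            if not data:                 # EOF: this child is done
                open_fds.remove(fd)
                os.close(fd)
            else:
                # the PTY line discipline delivers whole lines, so no
                # partial-line interleaving between processes
                logfile.write(data.decode(errors='replace'))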

儭儭莪哋寶赑 2024-12-21 14:03:06

There is no single best way for all possible circumstances. The best choice depends on the problem at hand.

Here's how to spawn a process and save its output to a file, combining stdout and stderr:

import os
import subprocess
import sys

def spawn(cmd, output_file):
    # close_fds can't be combined with redirection on Windows in older Pythons
    on_posix = 'posix' in sys.builtin_module_names
    return subprocess.Popen(cmd, close_fds=on_posix, bufsize=-1,
                            stdin=open(os.devnull, 'rb'),   # no terminal input
                            stdout=output_file,
                            stderr=subprocess.STDOUT)       # merge stderr into stdout

To spawn multiple processes that can run in parallel with your script and each other:

processes, files = [], []
try:
    for i, cmd in enumerate(commands):
        files.append(open('out%d' % i, 'wb'))
        processes.append(spawn(cmd, files[-1]))
finally:
    for p in processes:   # wait for every child, even if spawning failed midway
        p.wait()
    for f in files:
        f.close()

Note: cmd is a list everywhere.
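
For illustration, commands could be any iterable of argument lists; these particular values are made up:

commands = [
    ['grep', '-r', 'TODO', '.'],
    ['du', '-sh', '/tmp'],
    ['sleep', '5'],
]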

冷夜 2024-12-21 14:03:06

I suppose I could just use subprocess.call or one of the Popen
calls and pipe the output to a file, which I believe will return
immediately, at least to the caller.

That's not a good way to do it if you want to process the data.

In this case, it is better to do

sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)

and then call sp.communicate(), or read directly via sp.stdout.read().

If the data shall be processed in the calling program at a later time, there are two ways to go (a sketch of way 1 follows below):

  1. You can retrieve the data as soon as possible, maybe via a separate thread, reading it and storing it somewhere the consumer can get it.

  2. You can let the producing subprocess block and retrieve the data from it when you need it. The subprocess produces as much data as fits in the pipe buffer (usually 64 KiB) and then blocks on further writes. As soon as you need the data, you read() from the subprocess object's stdout (maybe stderr as well) and use it - or, again, you use sp.communicate() at that later time.

Way 1 is the way to go if producing the data takes a long time, so that your program would otherwise have to wait.

Way 2 is preferable if the amount of data is quite large and/or the data is produced so fast that buffering it all would make no sense.
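
Here is a minimal sketch of way 1 under assumed names (start_reader, the ['ls', '-l'] command, and the sentinel are illustrative, not from the answer): a helper thread drains the pipe as soon as data appears, so the child never blocks, and the consumer pulls lines from a queue whenever convenient.

import queue
import subprocess
import threading

def start_reader(cmd):
    sp = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    lines = queue.Queue()

    def pump():
        for line in sp.stdout:   # runs concurrently, keeps the pipe drained
            lines.put(line)
        lines.put(None)          # sentinel: the producer has finished

    threading.Thread(target=pump, daemon=True).start()
    return sp, lines

sp, lines = start_reader(['ls', '-l'])
while True:
    line = lines.get()           # consumer fetches data when it's ready to
    if line is None:
        break
    print(line.decode().rstrip())
sp.wait()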

初熏 2024-12-21 14:03:06

See an older answer of mine, including code snippets, that does the following:

  • Uses processes, not threads, for blocking I/O, because they can be terminated more reliably via p.terminate()
  • Implements a retriggerable timeout watchdog that restarts counting whenever some output happens
  • Implements a long-term timeout watchdog to limit overall runtime
  • Can feed in stdin (although I only need to feed in one-time short strings)
  • Can capture stdout/stderr in the usual Popen way (only stdout is coded, with stderr redirected to stdout, but they can easily be separated)
  • It's almost realtime because it only checks every 0.2 seconds for output. But you could decrease this or remove the waiting interval easily
  • Lots of debugging printouts are still enabled to see what's happening when.

For spawning multiple concurrent commands, you would need to alter the class RunCmd to instantiate multiple read-output/write-input queues and to spawn multiple Popen subprocesses.
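
Since that older answer isn't reproduced here, the following is only a rough sketch of the two-watchdog idea it describes (every name is made up; the real RunCmd class differs): a reader process feeds output into a queue, an idle timeout re-arms whenever a line arrives, and a hard deadline caps the total runtime.

import subprocess
import time
from multiprocessing import Process, Queue
from queue import Empty

def _pump(cmd, q):
    sp = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)   # merge stderr into stdout
    for line in sp.stdout:
        q.put(line)
    q.put(None)                       # sentinel: the command finished

def run_with_watchdogs(cmd, idle_timeout=5.0, hard_timeout=60.0):
    q = Queue()
    reader = Process(target=_pump, args=(cmd, q), daemon=True)
    reader.start()
    deadline = time.monotonic() + hard_timeout
    output = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:            # long-term watchdog fired
            reader.terminate()
            break
        try:
            # q.get with a timeout acts as the retriggerable watchdog:
            # the clock restarts every time a line arrives
            line = q.get(timeout=min(idle_timeout, remaining))
        except Empty:                 # no output within idle_timeout
            reader.terminate()
            break
        if line is None:              # command exited normally
            break
        output.append(line)
    # note: terminating the reader does not kill the command itself;
    # a fuller version would share the child's PID and kill it too
    return b''.join(output)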
