Better multithreaded use of Python subprocess.Popen & communicate()?

Posted on 2024-10-04 13:30:07

I'm running multiple commands which may take some time, in parallel, on a Linux machine running Python 2.6.

So, I used the subprocess.Popen class and the process.communicate() method to parallelize execution of multiple command groups and capture each group's output all at once after execution.

import shlex
import subprocess

def run_commands(commands, print_lock):
    # this part runs in parallel.
    outputs = []
    for command in commands:
        proc = subprocess.Popen(shlex.split(command),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                close_fds=True)
        output, unused_err = proc.communicate()  # buffers the output
        retcode = proc.poll()                    # ensures subprocess termination
        outputs.append(output)
    with print_lock: # print them at once (synchronized)
        for output in outputs:
            for line in output.splitlines():
                print(line)

Elsewhere, it is called like this:

from threading import Lock, Thread

processes = []
print_lock = Lock()
for ...:
    commands = ...  # a group of commands is generated, which takes some time.
    processes.append(Thread(target=run_commands, args=(commands, print_lock)))
    processes[-1].start()
for p in processes:
    p.join()
print('done.')

The expected result is that the output of each command group is displayed all at once, while the groups themselves execute in parallel.

But starting with the second output group (which thread ends up second varies, of course, because scheduling is nondeterministic), the output is printed without line breaks, each line is indented by as many spaces as there were characters on the previous line, and input echo is turned off. The terminal ends up in a "garbled" or "crashed" state. (If I issue the reset shell command, it returns to normal.)
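The symptoms above (line feeds without a return to column zero, echo disabled, fixed by reset) look like changed terminal settings rather than damaged output data. One way to check this, sketched here with the standard termios module, is to snapshot the terminal attributes before the threads start and compare them afterwards. The helper names snapshot_tty, tty_changed, and restore_tty are hypothetical, and the sketch assumes a POSIX system.

```python
import os
import termios


def snapshot_tty(fd):
    """Return the terminal attributes of fd, or None if fd is not a terminal."""
    if not os.isatty(fd):
        return None
    return termios.tcgetattr(fd)


def tty_changed(fd, saved):
    """Report whether the attributes of fd differ from an earlier snapshot."""
    return saved is not None and snapshot_tty(fd) != saved


def restore_tty(fd, saved):
    """Re-apply a snapshot taken by snapshot_tty; return True if applied."""
    if saved is None:
        return False
    termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return True


# Possible use around the threaded section (0 is the stdin descriptor):
#   saved = snapshot_tty(0)
#   ... start and join the threads ...
#   if tty_changed(0, saved):
#       restore_tty(0, saved)
```

If tty_changed reports True after the threads finish, something altered the terminal while they ran, which would point away from the captured output itself.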

At first, I suspected the handling of '\r', but that was not the cause. As you can see in my code, I handle it properly with splitlines(), and I confirmed this by applying repr() to the output.
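As a small illustration of that check: splitlines() treats '\r', '\n', and '\r\n' all as line boundaries, so no stray carriage return survives into the printed lines. The sample bytes below are made up for the example.

```python
# Made-up sample that mixes the three common line endings.
sample = b"first\r\nsecond\rthird\n"

lines = sample.splitlines()
for line in lines:
    # repr() makes any leftover control character visible.
    print(repr(line))
```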

I think the cause is the concurrent use of pipes for stdout/stderr in Popen and communicate(). I tried the check_output shortcut method from Python 2.7, but without success. Of course, the problem described above does not occur if I serialize all command executions and prints.
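One thing the code above does not control is the children's standard input. With only stdout and stderr redirected, each child inherits the terminal on stdin and is free to change its settings. Whether that is what happens here is an assumption, not something established above. The sketch below simply rules it out by giving every child /dev/null as stdin. The names run_group and run_all and the echo commands are hypothetical stand-ins.

```python
import os
import shlex
import subprocess
from threading import Lock, Thread


def run_group(commands, print_lock, results, index):
    """Run one command group sequentially, then print its output under the lock."""
    outputs = []
    with open(os.devnull, "rb") as devnull:
        for command in commands:
            proc = subprocess.Popen(
                shlex.split(command),
                stdin=devnull,              # assumption: keep children off the terminal
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                close_fds=True,
            )
            output, _ = proc.communicate()  # waits for exit and collects the output
            outputs.append(output)
    results[index] = outputs
    with print_lock:                        # print the whole group at once
        for output in outputs:
            for line in output.splitlines():
                print(line.decode("utf-8", "replace"))


def run_all(groups):
    """Start one thread per group; return the outputs in group order."""
    print_lock = Lock()
    results = [None] * len(groups)
    threads = [Thread(target=run_group, args=(group, print_lock, results, i))
               for i, group in enumerate(groups)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


collected = run_all([["echo a1", "echo a2"], ["echo b1"]])
```

The order in which groups are printed still depends on scheduling, but each group's lines stay together, and no child can read from or reconfigure the terminal through stdin.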

Is there a better way to handle Popen and communicate() in parallel?
