Python subprocess performance with multiple piped commands

Asked 2025-02-07 11:53:34


I was writing Python code using the subprocess module, and I got stuck in a situation where I need to use pipes to pass the result of one command to another to obtain the specific data I need.

However, this can also be achieved with pure Python code.

Ex)

from subprocess import Popen, PIPE
# shell pipeline: awk filters by size, grep filters lines containing the current user
cmd_result = Popen('ls -l ./ | awk -F " " \'{if ($5 > 10000) print $0}\' | grep "$USER"',
                   shell=True, stdout=PIPE, text=True).communicate()[0].split('\n')

Or

cmd_result = Popen('ls -l ./', shell=True, stdout=PIPE, text=True).communicate()[0].split('\n')
result_lst = []
for result in cmd_result:
    result_items = result.split()
    # skip the "total ..." header and blank lines produced by ls -l
    if len(result_items) < 9:
        continue
    if int(result_items[4]) > 10000 and result_items[2] == "user_name":
        result_lst.append(result)

       

I am wondering which of the two methods is better efficiency-wise.
I found that the pure Python version is slower than the one with pipes, but I am not sure whether that means using pipes is more efficient.
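
Roughly how the two could be timed against each other (run_shell_pipeline and run_pure_python are just placeholder wrappers around the two snippets above, not functions I actually have):

from subprocess import Popen, PIPE
import timeit

# placeholder wrappers around the two snippets above
def run_shell_pipeline():
    # all filtering happens inside the shell pipeline
    return Popen('ls -l ./ | awk -F " " \'{if ($5 > 10000) print $0}\' | grep "$USER"',
                 shell=True, stdout=PIPE, text=True).communicate()[0]

def run_pure_python():
    # only ls runs in a subprocess; the filtering happens in Python
    out = Popen('ls -l ./', shell=True, stdout=PIPE, text=True).communicate()[0]
    result_lst = []
    for line in out.split('\n'):
        items = line.split()
        if len(items) >= 9 and int(items[4]) > 10000 and items[2] == "user_name":
            result_lst.append(line)
    return result_lst

print(timeit.timeit(run_shell_pipeline, number=100))
print(timeit.timeit(run_pure_python, number=100))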

Thank you in advance.


Comments (1)

怂人 2025-02-14 11:53:34


The absolute best solution to this is to avoid using a subprocess at all.

import os

myuid = os.getuid()

# filter directly on the stat() results of each directory entry
for file in os.scandir("."):
    st = file.stat()
    if st.st_size > 10000 and st.st_uid == myuid:
        print(file.name)

In general, if you want to run and capture the output of a command, the simplest by far is subprocess.check_output; but really, don't parse ls output, and, of course, try to avoid superfluous subprocesses like useless greps if efficiency is important.

import subprocess

files = subprocess.check_output(
    """ls -l . | awk -v me="$USER" '$5 > 10000 && $3 == me { print $9 }'""",
    text=True, shell=True)

This still has several other problems: in ls -l output, $4 (the group) could contain spaces (it does, on my system), which shifts the field numbering, and $9 would contain only the beginning of the file name if the name itself contains spaces.
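
If you do want to keep the external commands but drop shell=True, the same pipeline can be wired up by connecting Popen objects directly; a minimal sketch (it still inherits the ls-parsing caveats above):

from subprocess import Popen, PIPE
import os

# ls | awk without a shell: feed ls's stdout straight into awk's stdin
ls_proc = Popen(["ls", "-l", "."], stdout=PIPE)
awk_proc = Popen(["awk", "-v", "me=" + os.environ.get("USER", ""),
                  "$5 > 10000 && $3 == me { print $9 }"],
                 stdin=ls_proc.stdout, stdout=PIPE, text=True)
ls_proc.stdout.close()  # so ls gets SIGPIPE if awk exits early
files = awk_proc.communicate()[0].splitlines()
ls_proc.wait()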

If you need to run a process which could produce a lot of output and fetch that output as it arrives, rather than only when the process has finished, the Stack Overflow subprocess tag info page has a couple of links to questions about how to do that; I am guessing it is not worth the effort for the simple task you are asking about, though it can be useful for more complex ones.
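
For reference, a minimal sketch of that pattern, reading lines as the child produces them rather than waiting for communicate() to return (the command here is only a stand-in for a long-running process):

from subprocess import Popen, PIPE

# the command is just an example of something that emits output over time
with Popen(["ls", "-lR", "/usr"], stdout=PIPE, text=True) as proc:
    for line in proc.stdout:
        # handle each line as soon as it arrives
        print(line.rstrip("\n"))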
