Alternatives to Python Popen.communicate() memory limitations?

Posted 2024-11-26 23:10:34


I have the following chunk of Python code (running v2.7) that results in MemoryError exceptions being thrown when I work with large (several GB) files:

import sys
from subprocess import Popen, PIPE

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
myStdout, myStderr = myProcess.communicate()
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)

Reading the documentation for Popen.communicate(), there appears to be some buffering going on:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?

What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?

I should note that I need to handle output and error streams.


Comments (3)

眼中杀气 2024-12-03 23:10:34


I think I found a solution:

import sys
from subprocess import Popen, PIPE

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
# Stream line by line instead of buffering everything in memory.
for ln in myProcess.stdout:
    sys.stdout.write(ln)
for ln in myProcess.stderr:
    sys.stderr.write(ln)

This seems to get my memory usage down enough to get through the task.

Update

I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
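A minimal sketch of what that thread-based approach could look like (my own illustration, not from the original post; the drain helper name is made up), draining both pipes concurrently so neither can fill up and block the child:

import sys
import threading
from subprocess import Popen, PIPE

def drain(pipe, sink):
    # Copy one of the child's pipes to a sink line by line,
    # without ever holding the whole stream in memory.
    for ln in iter(pipe.readline, ''):
        sink.write(ln)
    pipe.close()

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
outThread = threading.Thread(target=drain, args=(myProcess.stdout, sys.stdout))
errThread = threading.Thread(target=drain, args=(myProcess.stderr, sys.stderr))
outThread.start()
errThread.start()
outThread.join()
errThread.join()
myProcess.wait()

Unlike the two sequential for loops above, this cannot deadlock when the command writes heavily to stderr while stdout is still being drained.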

七禾 2024-12-03 23:10:34


What I would probably do instead, if I needed to read the stdout for something that large, is send it to a file on creation of the process.

from subprocess import Popen

with open(my_large_output_path, 'w') as fo:
    with open(my_large_error_path, 'w') as fe:
        myProcess = Popen(myCmd, shell=True, stdout=fo, stderr=fe)
myProcess.wait()  # safe: the child keeps its own copies of the descriptors

Edit: If you need to stream, you could try passing something file-like to stdout and stderr, though note that subprocess needs an object backed by a real file descriptor (it calls fileno()), so a purely in-memory object won't work. (I haven't tried this, though.) You could then read (query) from the file as it's being written.
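A rough sketch of that idea with one adjustment: rather than an in-memory object, keep redirecting to the file and read it back while the child is still writing. This is my own untested illustration, and process_chunk is a hypothetical consumer, not part of the original answer:

import time
from subprocess import Popen

with open(my_large_output_path, 'w') as fo, open(my_large_error_path, 'w') as fe:
    myProcess = Popen(myCmd, shell=True, stdout=fo, stderr=fe)
    with open(my_large_output_path, 'r') as reader:
        done = False
        while not done:
            if myProcess.poll() is not None:
                done = True  # child exited; drain what remains, then stop
            chunk = reader.read(65536)
            while chunk:
                process_chunk(chunk)  # hypothetical consumer
                chunk = reader.read(65536)
            if not done:
                time.sleep(0.1)  # wait for the child to write more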

云之铃。 2024-12-03 23:10:34


For those whose application hangs after a certain amount of time when using Popen, see my case below:

A rule of thumb: if you're not going to use the stderr and stdout streams, don't pass/initialize them in the parameters of Popen! They will fill up and cause you a lot of problems.

If you need them for a certain amount of time and you need to keep the process running, then you can close those streams at any time.

from subprocess import Popen, PIPE

try:
    p = Popen(COMMAND, stdout=PIPE, stderr=PIPE)
    # After using stdout and stderr, close them so the
    # pipe buffers cannot fill up and block the child.
    p.stdout.close()
    p.stderr.close()
except Exception:
    pass
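And a minimal sketch of the rule of thumb itself (my own addition, assuming the same placeholder COMMAND): if you never need to capture the output, don't create the pipes at all, and the child simply inherits the parent's streams:

from subprocess import Popen

# No stdout=/stderr= arguments: the child writes straight to the
# parent's stdout/stderr, so there is no pipe that can fill up.
p = Popen(COMMAND)
p.wait()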