Alternatives to Python Popen.communicate() memory limitations?

Posted 2024-11-26 23:10:34

I have the following chunk of Python code (running v2.7) that results in MemoryError exceptions being thrown when I work with large (several GB) files:

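import sys
from subprocess import Popen, PIPE

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
# communicate() reads all of stdout and stderr into memory before returning
myStdout, myStderr = myProcess.communicate()
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)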

From the documentation for Popen.communicate(), it appears that the output is buffered:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?

What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?

I should note that I need to handle output and error streams.

眼中杀气 2024-12-03 23:10:34

I think I found a solution:

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
for ln in myProcess.stdout:
    sys.stdout.write(ln)
for ln in myProcess.stderr:
    sys.stderr.write(ln)

This seems to get my memory usage down enough to get through the task.

Update

I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
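The update mentions threads without showing them, so here is a minimal sketch of what such an approach might look like. It is not the answerer's actual code. It assumes Python 3 (the question uses 2.7), passes the command as an argument list rather than with shell=True, and the names pump and run_streaming are hypothetical. One reason to prefer it over the two sequential loops: if the child fills the stderr pipe while the parent is still reading stdout, both sides can block forever, whereas one reader thread per pipe drains them concurrently.

```python
import subprocess
import sys
import threading


def pump(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in fixed-size chunks until EOF, then close src."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)
    dst.flush()
    src.close()


def run_streaming(cmd, out=None, err=None):
    """Run cmd, forwarding stdout and stderr concurrently; return the exit code."""
    # Default to the binary layer of the parent's own streams (Python 3 only).
    out = out if out is not None else sys.stdout.buffer
    err = err if err is not None else sys.stderr.buffer
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # One thread per pipe, so neither pipe can fill up while the other is read.
    workers = [
        threading.Thread(target=pump, args=(proc.stdout, out)),
        threading.Thread(target=pump, args=(proc.stderr, err)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return proc.wait()
```

Memory use stays bounded by the chunk size, because each chunk is written out before the next one is read. Reading in chunks rather than by line also avoids holding a very long "line" in memory when the output contains few newlines.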

七禾 2024-12-03 23:10:34

What I would probably do instead, if I needed to read the stdout for something that large, is send it to a file on creation of the process.

with open(my_large_output_path, 'w') as fo:
    with open(my_large_error_path, 'w') as fe:
        myProcess = Popen(myCmd, shell=True, stdout=fo, stderr=fe)
        # Wait for the command to finish before the files are closed
        myProcess.wait()

Edit: If you need to stream, you could try making a file-like object and passing it to stdout and stderr. (I haven't tried this, though.) You could then read (query) from the object as it's being written.
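One caveat about the file-like object idea: Popen needs an object backed by a real file descriptor (it calls fileno() on it), so a purely in-memory object such as io.BytesIO will not work. A rough sketch of the file-based route, assuming Python 3 and using temporary files instead of named paths, might look like the following. The helper names run_to_tempfiles and iter_chunks are hypothetical.

```python
import subprocess
import tempfile


def run_to_tempfiles(cmd):
    """Run cmd with stdout and stderr redirected to temporary files.

    Returns (exit_code, stdout_file, stderr_file); both files are rewound
    to the start and should be closed by the caller.
    """
    out_file = tempfile.TemporaryFile()
    err_file = tempfile.TemporaryFile()
    # The child writes straight to the files, so nothing is buffered in Python.
    exit_code = subprocess.call(cmd, stdout=out_file, stderr=err_file)
    out_file.seek(0)
    err_file.seek(0)
    return exit_code, out_file, err_file


def iter_chunks(fileobj, chunk_size=1024 * 1024):
    """Yield the file's contents in chunks so it is never fully in memory."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

This trades memory for disk space: the full output is stored on disk until the files are closed, and it can only be processed after the command has finished.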

云之铃。 2024-12-03 23:10:34

For those whose application hangs after a certain amount of time when using Popen, see my case below:

A rule of thumb: if you're not going to read the stderr and stdout streams, don't pass PIPE for them in Popen's arguments. Nothing drains the pipes, so they fill up and the child process blocks as soon as it tries to write more.

If you need them for a certain amount of time and you need to keep the process running, then you can close those streams at any time.

try:
    p = Popen(COMMAND, stdout=PIPE, stderr=PIPE)
    # After using stdout and stderr
    p.stdout.close()
    p.stderr.close()
except Exception as e:
    pass
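If the output is never needed, an alternative to closing the pipes is not to create them in the first place. Closing the parent's end while the child is still writing can make the child fail with a broken-pipe error. The sketch below assumes Python 3.3 or later, where subprocess.DEVNULL exists. On Python 2.7 the equivalent is to open os.devnull and pass that file object. The function name run_discarding_output is hypothetical.

```python
import subprocess


def run_discarding_output(cmd):
    """Run cmd, discarding stdout and stderr; return the exit code."""
    # DEVNULL sends the child's output to the OS null device, so there is
    # no pipe to fill up and nothing for the parent to read or close.
    return subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
```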
