Alternatives to Python Popen.communicate() memory limitations?
I have the following chunk of Python code (running v2.7) that results in MemoryError
exceptions being thrown when I work with large (several GB) files:
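    import sys
    from subprocess import Popen, PIPE

    myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
    # communicate() reads all of stdout and stderr into memory before returning
    myStdout, myStderr = myProcess.communicate()
    sys.stdout.write(myStdout)
    if myStderr:
        sys.stderr.write(myStderr)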
In reading the documentation for Popen.communicate(), there appears to be some buffering going on:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?
What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?
I should note that I need to handle both the output and the error streams.
Comments (3)
I think I found a solution:
This seems to get my memory usage down enough to get through the task.
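The general idea of reading the child's output incrementally, rather than through communicate(), might look like the sketch below. It is not necessarily the code this answer used. The name `stream_command` and its parameters are hypothetical, and it assumes the command is given as an argument list and that stderr can be spooled to a temporary file so that only stdout needs a pipe. It is written to run on both Python 2.7 and Python 3.

```python
import shutil
import subprocess
import tempfile


def stream_command(cmd, out, err, chunk_size=64 * 1024):
    """Run cmd, copying its stdout to `out` in fixed-size chunks.

    stderr is spooled to an anonymous temporary file and copied to `err`
    after the child exits, so neither stream is held whole in memory.
    `stream_command` is a hypothetical helper, not a subprocess API.
    """
    with tempfile.TemporaryFile() as err_file:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err_file)
        # iter() with a sentinel calls read() until it returns b"" (EOF).
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            out.write(chunk)
        proc.stdout.close()
        returncode = proc.wait()
        # Rewind the spooled stderr and copy it out in bounded chunks.
        err_file.seek(0)
        shutil.copyfileobj(err_file, err)
    return returncode
```

Both `out` and `err` need to accept bytes: on Python 2.7 that can be sys.stdout and sys.stderr directly, while on Python 3 it would be sys.stdout.buffer and sys.stderr.buffer. Because only one pipe is being read, the child cannot block on the other one filling up.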
Update
I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
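A thread-based variant can be sketched as follows, with one thread draining each pipe so that neither can fill up while the other is being read. This is an illustration of the approach rather than the answer's actual code; `run_streaming` and `_pump` are hypothetical names, and the sketch assumes the command is passed as an argument list.

```python
import subprocess
import threading


def _pump(src, dst, chunk_size):
    """Copy src to dst chunk by chunk until EOF, then close src."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)
    src.close()


def run_streaming(cmd, out, err, chunk_size=64 * 1024):
    """Run cmd, forwarding stdout to `out` and stderr to `err` as data arrives.

    `run_streaming` is a hypothetical helper, not a subprocess API.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads = [
        threading.Thread(target=_pump, args=(proc.stdout, out, chunk_size)),
        threading.Thread(target=_pump, args=(proc.stderr, err, chunk_size)),
    ]
    for thread in threads:
        thread.start()
    # Both pipes reach EOF once the child has closed them (normally at exit).
    for thread in threads:
        thread.join()
    return proc.wait()
```

Memory use stays bounded by `chunk_size` per stream, regardless of how much the child writes. As before, `out` and `err` must accept bytes.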
What I would probably do instead, if I needed to read the stdout for something that large, is send it to a file on creation of the process.
Edit: If you need to stream, you could try making a file-like object and passing it to stdout and stderr. (I haven't tried this, though.) You could then read (query) from the object as it's being written.
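A minimal sketch of the send-it-to-a-file idea is shown below, assuming temporary files are acceptable and the command is an argument list; `run_to_files` is a hypothetical name. One caveat on the file-like object suggestion: Popen needs an object with a real file descriptor (it calls fileno() on whatever is passed), so an in-memory object such as io.BytesIO will not work, whereas a real or temporary file will.

```python
import subprocess
import tempfile


def run_to_files(cmd):
    """Run cmd with stdout and stderr redirected to temporary files.

    Returns (returncode, out_file, err_file); both files are rewound and
    opened in binary mode. The caller is responsible for closing them.
    `run_to_files` is a hypothetical helper, not a subprocess API.
    """
    out_file = tempfile.TemporaryFile()
    err_file = tempfile.TemporaryFile()
    # The child writes straight to disk, so no pipe can fill up.
    returncode = subprocess.call(cmd, stdout=out_file, stderr=err_file)
    out_file.seek(0)
    err_file.seek(0)
    return returncode, out_file, err_file
```

The returned files can then be iterated line by line (`for line in out_file: ...`), which keeps only one line in memory at a time. The trade-off is disk space and the fact that output only becomes available after the command has finished.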
For anyone whose application hangs after a certain amount of time when using Popen, my case may be relevant:
Rule of thumb: if you are not going to read the stderr and stdout streams, do not pass stdout=PIPE or stderr=PIPE to Popen. A pipe that nobody reads fills up, the child then blocks on its next write, and the application appears to hang.
If you need them for a certain amount of time and you need to keep the process running, then you can close those streams at any time.
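When the output is not needed at all, one option is to leave the streams unset (the child then inherits the parent's) or to point them at the null device. The sketch below takes the second route; `run_discarding_output` is a hypothetical name, and os.devnull is used instead of subprocess.DEVNULL so that the same code also runs on Python 2.7.

```python
import os
import subprocess


def run_discarding_output(cmd):
    """Run cmd and throw away everything it writes to stdout and stderr.

    No pipes are created, so there is nothing that can fill up and block
    the child. `run_discarding_output` is a hypothetical helper.
    """
    with open(os.devnull, "wb") as devnull:
        return subprocess.call(cmd, stdout=devnull, stderr=devnull)
```

The function returns the child's exit code, so failures can still be detected even though the output itself is discarded.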