How do I read a .gz file line by line in C++?
I have a 3 terabyte .gz file and want to read its uncompressed content line by line in a C++ program. Because the file is so large, I want to avoid loading it completely into memory.
Can anyone post a simple example of how to do this?
Comments (7)
You will most probably have to use zlib's inflate (deflate is the compression side); an example is available from their site.
Alternatively, you may have a look at the Boost C++ wrapper.
The example from the Boost page (decompresses data from a file and writes it to standard output):
For something that is going to be used regularly, you probably want to use one of the previous suggestions. Alternatively, you can pipe the output of a command-line decompressor into your program and have yourprogram read from cin. This decompresses parts of the file in memory as they are needed and sends the uncompressed output to yourprogram.
Using zlib, I'm doing something along these lines:
EDIT: Removed two mis-copied * in the example above.
EDIT: Corrected an out-of-bounds read on v[pos + read - 2].
The zlib library supports decompressing files in memory in blocks, so you don't have to decompress the entire file in order to process it.
Here is some code with which you can read normal and zipped files line by line:
You can't do that, because *.gz doesn't have "lines".
If the compressed data has newlines, you'll have to decompress it first. You don't have to decompress all the data at once: you can do it in chunks and send strings back to the main program whenever you encounter a newline character. *.gz files can be decompressed using zlib.
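The chunk-to-line step described here can be sketched independently of the decompressor. The class below is a hypothetical helper (the names `LineSplitter`, `feed`, and `finish` are made up for this example). It accepts decompressed chunks of any size, for instance the output of zlib's `inflate` or `gzread`, returns the lines each chunk completes, and keeps only the unfinished tail buffered.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: turns a stream of arbitrary byte chunks into lines.
class LineSplitter {
public:
    // Appends one chunk and returns every line it completes (without '\n').
    // A partial line at the end of the chunk stays buffered for the next call.
    std::vector<std::string> feed(const char* data, std::size_t size) {
        std::vector<std::string> lines;
        std::size_t start = 0;
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] == '\n') {
                pending_.append(data + start, i - start);
                lines.push_back(std::move(pending_));
                pending_.clear();  // reset the moved-from buffer
                start = i + 1;
            }
        }
        pending_.append(data + start, size - start);
        return lines;
    }

    // Call after the last chunk: returns the final unterminated line
    // (empty if the input ended with '\n') and resets the buffer.
    std::string finish() {
        std::string rest = std::move(pending_);
        pending_.clear();
        return rest;
    }

private:
    std::string pending_;  // bytes seen since the last newline
};
```

A decompression loop would call `feed` once per decompressed chunk and `finish` at end of input, so memory use stays proportional to the chunk size plus the longest line.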
Chilkat (http://www.chilkatsoft.com/) has libraries for reading compressed files from C++, .NET, VB, and other applications.