How do I transparently compress/decompress a file as a program writes to/reads from it?
I have a program that reads and writes very large text files. However, because of the format of these files (they are ASCII representations of what should have been binary data), these files are actually very easily compressed. For example, some of these files are over 10GB in size, but gzip achieves 95% compression.
I can't modify the program but disk space is precious, so I need to set up a way that it can read and write these files while they're being transparently compressed and decompressed.
The program can only read and write files, so as far as I understand, I need to set up a named pipe for both input and output. Some people are suggesting a compressed filesystem instead, which seems like it would work, too. How do I make either work?
Technical information: I'm on a modern Linux. The program uses separate input and output files. It reads through the input file in order, though twice. It writes the output file in order.
Comments (5)
Check out zlibc: http://zlibc.linux.lu/.
Also, if FUSE is an option (i.e. the kernel is not too old), consider: compFUSEd http://www.biggerbytes.be/
Named pipes won't give you full-duplex operation, so it will be a little more complicated if you need to provide just one filename.
Do you know if your application needs to seek through the file?
Does your application work with stdin and stdout?
Maybe a solution is to create a mini compressed file system that contains only a directory with your files.
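Since you have separate input and output files, you can set up one named pipe for each: have zcat decompress the input file into the first pipe, have gzip compress whatever arrives on the second pipe into the output file, and then launch your program with the two pipes as its file names.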
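A minimal sketch of such a named-pipe setup, for a single sequential pass only. The file names `input.gz`, `output.gz`, `in.fifo`, and `out.fifo` are made up for illustration, and the `program` shell function is a hypothetical stand-in for the real program, assumed to take an input path and an output path:

```shell
#!/bin/sh
# Sketch: decompress into one FIFO, compress out of another.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Small sample input (stands in for a real multi-GB compressed file).
printf 'line one\nline two\n' | gzip > input.gz

# Hypothetical stand-in for the unmodifiable program:
# reads the file named by $1 in order, writes the file named by $2 in order.
program() { tr 'a-z' 'A-Z' < "$1" > "$2"; }

mkfifo in.fifo out.fifo

# Background helpers: each blocks until the program opens the other end.
gzip -dc input.gz > in.fifo &       # decompressed data flows into in.fifo
gzip -c < out.fifo > output.gz &    # data written to out.fifo is compressed

program in.fifo out.fifo

wait                                # let both gzip processes finish
rm in.fifo out.fifo
```

In real use, the `program` function would be replaced by the actual command, passing the two pipe paths wherever it expects file names.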
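Now, you will probably run into trouble with reading the input twice: a pipe cannot be rewound, so once zcat has finished writing the decompressed data, your program sees end-of-file and there is nothing left for the second pass. (If the program instead closes the pipe while zcat is still writing, it is zcat that receives SIGPIPE.)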
The proper solution is probably to use a compressed file system like CompFUSE, because then you don't have to worry about unsupported operations like seek.
btrfs:
https://btrfs.wiki.kernel.org/index.php/Main_Page
provides support for pretty fast "automatic transparent compression/decompression" these days, and is present (though marked experimental) in newer kernels.
FUSE options:
http://apps.sourceforge.net/mediawiki/fuse/index.php?title=CompressedFileSystems
Which language are you using?
If you are using Java, take a look at the GZIPInputStream and GZIPOutputStream classes in the API doc.
If you are using C/C++, zlibc is probably the best way to go about it.