How do I transparently compress/decompress a file as a program writes to/reads from it?

Posted on 2024-07-17 08:09:51

I have a program that reads and writes very large text files. However, because of the format of these files (they are ASCII representations of what should have been binary data), these files are actually very easily compressed. For example, some of these files are over 10GB in size, but gzip achieves 95% compression.

I can't modify the program but disk space is precious, so I need to set up a way that it can read and write these files while they're being transparently compressed and decompressed.

The program can only read and write files, so as far as I understand, I need to set up a named pipe for both input and output. Some people are suggesting a compressed filesystem instead, which seems like it would work, too. How do I make either work?

Technical information: I'm on a modern Linux. The program reads one input file and writes a separate output file. It reads through the input file in order, though twice. It writes the output file in order.

Comments (5)

爱你是孤单的心事 2024-07-24 08:09:51

Check out zlibc: http://zlibc.linux.lu/.

Also, if FUSE is an option (i.e. the kernel is not too old), consider: compFUSEd http://www.biggerbytes.be/

ぇ气 2024-07-24 08:09:51

Named pipes won't give you full-duplex operation, so it gets a little more complicated if you can only give the program a single filename.

Do you know whether your application needs to seek within the file?

Can your application work with stdin and stdout?

One possible solution is to create a mini compressed filesystem that contains only a directory with your files.

Since you have separate input and output files, you can do the following:

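mkfifo readfifo
mkfifo writefifo
# Decompress the input into the FIFO the program will read from.
zcat yourinputfile > readfifo &
# Compress whatever the program writes into the other FIFO.
gzip < writefifo > youroutputfile &

Then launch your program, giving it readfifo as the input file and writefifo as the output file.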

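Now, you will probably run into trouble with reading the input twice: as soon as zcat has finished writing the decompressed data and closes the FIFO, your program sees end-of-file. A pipe cannot be rewound with seek, and reopening the FIFO blocks until something writes to it again.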
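
For reference, here is a minimal end-to-end sketch of the two-FIFO setup for a single sequential pass. It makes several assumptions that are not in the original post: `tr` stands in for the real program, the file names `input.gz` and `output.gz` are made up, and everything runs in a throwaway temporary directory. It does not solve the read-twice problem.

```shell
#!/bin/sh
# Single-pass round trip through two FIFOs.
# `tr` is only a stand-in for the program that cannot be modified.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample input, stored compressed on disk.
printf 'deadbeef\ncafebabe\n' | gzip -c > input.gz

mkfifo readfifo writefifo

# Decompress into the input FIFO (gzip -dc is equivalent to zcat).
# The open blocks until the program opens readfifo for reading.
gzip -dc input.gz > readfifo &

# Compress whatever the program writes to the output FIFO.
gzip -c < writefifo > output.gz &

# Stand-in program: reads one "file" sequentially, writes another.
tr 'a-f' 'A-F' < readfifo > writefifo

# Wait for both background jobs so output.gz is complete.
wait

# Decompress the result to check the round trip.
result=$(gzip -dc output.gz)
printf '%s\n' "$result"
```

Only the compressed files ever touch the disk; the uncompressed data exists only in the pipes.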

The proper solution is probably to use a compressed filesystem like compFUSEd, because then you don't have to worry about unsupported operations like seek.

寒江雪… 2024-07-24 08:09:51

btrfs:

https://btrfs.wiki.kernel.org/index.php/Main_Page

provides support for pretty fast "automatic transparent compression/decompression" these days, and is present (though marked experimental) in newer kernels.
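
As a rough sketch of how one might check this in practice, the script below reports which filesystem a directory lives on and, if it is btrfs, prints the general form of a mount command with compression enabled. The `TARGET_DIR` variable and the `<device>`/`<mountpoint>` placeholders are assumptions for illustration, and the `stat -f -c %T` form assumes GNU coreutils.

```shell
#!/bin/sh
# Report the filesystem type of a directory (GNU coreutils stat).
set -eu

# TARGET_DIR is a hypothetical variable; defaults to the current directory.
target_dir=${TARGET_DIR:-.}

# %T prints the filesystem type in human-readable form, e.g. "btrfs".
fs_type=$(stat -f -c %T "$target_dir")

if [ "$fs_type" = "btrfs" ]; then
    echo "$target_dir is on btrfs; compression is set with the compress mount option, e.g.:"
    echo "  mount -o compress=zlib <device> <mountpoint>"
else
    echo "$target_dir is on $fs_type, not btrfs"
fi
```

With compression enabled at mount time, the program reads and writes ordinary files, including seeking and reading the input twice, and the filesystem handles compression underneath.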

倒带 2024-07-24 08:09:51

Which language are you using?

If you are using Java, take a look at the GZIPInputStream and GZIPOutputStream classes (in java.util.zip) in the API docs.

If you are using C/C++, zlibc is probably the best way to go about it.
