Fast concatenation of multiple gzip files

Posted on 2024-12-13 18:28:44

I have a list of gzip files:

file1.gz
file2.gz
file3.gz

Is there a way to concatenate these files, or gzip them into a single gzip file,
without having to decompress them?

In practice we will use this in a web database (CGI): the web application receives
a query from the user, lists all the files matching the query, and returns them
to the user as a single batch file.

Comments (4)

寂寞陪衬 2024-12-20 18:28:44

With gzip files, you can simply concatenate the files together, like so:

cat file1.gz file2.gz file3.gz > allfiles.gz

Per the gzip RFC (RFC 1952):

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, each member keeps its own header, so all of the original filenames are preserved. However, gunzip treats the result as equivalent to a concatenation: it decompresses every member in sequence into one output stream.

Since existing tools generally ignore the filename headers of the additional members, it is not easy to extract individual files from the result. If you need that, build a ZIP file instead. ZIP and gzip both use the DEFLATE algorithm for the actual compression (ZIP optionally supports other compression methods as well; method 8 is the one that corresponds to gzip's compression); the difference is in the metadata format. Since the metadata is uncompressed, it is simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.
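
The concatenation behaviour described above can be checked with a small experiment. This is a minimal sketch, not part of the original answer: the scratch directory and the sample contents (alpha, beta, gamma) are made up for illustration, and `gzip -dc` is used in place of gunzip because it behaves the same way on most systems.

```shell
#!/bin/sh
# Sketch: concatenated gzip members decompress to the concatenated data.
set -eu
workdir=$(mktemp -d)            # scratch directory (hypothetical sample data below)
cd "$workdir"

printf 'alpha\n' > file1
printf 'beta\n'  > file2
printf 'gamma\n' > file3
cat file1 file2 file3 > expected.txt   # what the plain concatenation looks like

gzip file1 file2 file3          # creates file1.gz file2.gz file3.gz, removes the originals
cat file1.gz file2.gz file3.gz > allfiles.gz   # no decompression involved

gzip -dc allfiles.gz > actual.txt      # decompress all members into one stream
cmp expected.txt actual.txt && echo "identical"
```

If cmp reports no difference, the multi-member file decompressed to exactly the bytes of the three inputs in order.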

横笛休吹塞上声 2024-12-20 18:28:44

Here is what the gzip man page (man 1 gzip) says about this:

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo

is equivalent to

cat file1 file2

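The same applies to files that are already compressed: append file1.gz as it is (for example with cat) rather than running it through gzip -c again, which would compress it a second time.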

Note this part in particular:

gunzip will extract all members at once

So to extract the members individually, you will have to use an additional tool or write one yourself.

However, the man page addresses this as well:

If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
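
The difference between the two approaches can be seen side by side. The following is a rough sketch under stated assumptions: the file contents and the archive name bundle.tar.gz are invented for the example, and the -z and -O (write to stdout) options are those of GNU tar and bsdtar, so other tar implementations may differ.

```shell
#!/bin/sh
# Sketch: multi-member gzip vs. a tar archive compressed with gzip.
set -eu
workdir=$(mktemp -d)            # scratch directory with made-up sample files
cd "$workdir"
printf 'first\n'  > file1
printf 'second\n' > file2

# Multi-member gzip, as in the man page example.
gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz
gzip -dc foo.gz                 # both members come out as one stream

# tar + gzip: member boundaries and names are kept by tar.
tar -czf bundle.tar.gz file1 file2
tar -tzf bundle.tar.gz          # list the members
tar -xzOf bundle.tar.gz file2   # print only file2 to stdout
```

With the tar archive a single member can be pulled out by name, which the multi-member gzip file does not allow with standard tools.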

¢蛋碎的人ぎ生 2024-12-20 18:28:44

Just use cat. It is very fast (0.2 seconds for 500 MB for me):

cat *gz > final
mv final final.gz

You can then read the output with zcat to check that it looks right:

zcat final.gz

I tried the 'gzip -c' approach from the other answer, but I ended up with garbage when using already gzipped files as input (I guess it compressed them a second time).

PV:

Better yet, if you have it installed, use pv instead of cat:

pv *gz > final
mv final final.gz

This gives you a progress bar as it works, but does the same thing as cat.
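
Both observations in this answer (cat works, running gzip over already compressed input does not) can be reproduced with a short script. This is only a sketch: the sample files a.txt and b.txt and the name twice.gz are hypothetical, `gzip -t` is used as an integrity check, and `gzip -dc` stands in for zcat since zcat behaves differently on some systems.

```shell
#!/bin/sh
# Sketch: verify a concatenated file and show the double-compression pitfall.
set -eu
workdir=$(mktemp -d)            # scratch directory with made-up sample files
cd "$workdir"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
gzip a.txt b.txt                # creates a.txt.gz and b.txt.gz

cat ./*.gz > final              # output name has no .gz suffix, so the glob cannot match it
mv final final.gz

gzip -t final.gz && echo "final.gz: all members OK"
gzip -dc final.gz | wc -c       # size of the decompressed data in bytes

# Pitfall: compressing an already compressed file wraps it a second time.
gzip -c < a.txt.gz > twice.gz
gzip -dc twice.gz | cmp - a.txt.gz && echo "twice.gz unpacks to a.txt.gz, not to the text"
```

Decompressing twice.gz once yields the bytes of a.txt.gz rather than readable text, which would explain the garbage output mentioned above.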

左耳近心 2024-12-20 18:28:44

You can create a tar file of these files and then gzip the tar file to create the new gzip file:

tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar
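
For completeness, here is a sketch of how a recipient might get one of the original files back out of such an archive. The sample contents are invented, and the -O (write to stdout) option is assumed to be available, as it is in GNU tar and bsdtar. Since the members are already compressed, the outer gzip pass typically saves little space, so it is mostly there to produce a .gz file.

```shell
#!/bin/sh
# Sketch: round trip through a gzipped tar of already gzipped files.
set -eu
workdir=$(mktemp -d)            # scratch directory with made-up sample files
cd "$workdir"
printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3
gzip file1 file2 file3          # creates file1.gz file2.gz file3.gz

tar -cf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar            # replaces it with newcombined.tar.gz

# List the members without unpacking to disk.
gzip -dc newcombined.tar.gz | tar -tf -

# Extract one member to stdout and decompress it.
gzip -dc newcombined.tar.gz | tar -xOf - file2.gz | gzip -dc
```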