Fast concatenation of multiple GZip files
I have a list of gzip files:
file1.gz
file2.gz
file3.gz
Is there a way to concatenate these files, or gzip them, into one gzip file without having to decompress them?
In practice, we will use this in a web database (CGI): the web application will receive a query from the user, list all the files matching the query, and return them to the user in a single batch file.
Answers (4)
With gzip files, you can simply concatenate the files together with cat, redirecting the output to a new .gz file.
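Per the gzip RFC (RFC 1952), a gzip file consists of a series of "members" (compressed data sets) that simply appear one after another in the file, with no additional information before, between, or after them.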
Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip treats it as equivalent: decompressing the result yields the concatenation of the original contents.
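To check that plain concatenation really yields a valid gzip file, here is a small Python sketch (the helper names and paths are hypothetical). The first function does what cat does, copying raw bytes without decompressing anything; the second reads the result back with the standard gzip module, which, like gunzip and zcat, decompresses all members in sequence:

```python
import gzip
import shutil


def concat_gz_files(paths, out_path):
    """Append the raw bytes of each .gz file to out_path, like `cat`; nothing is decompressed."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def read_all_members(path):
    """Decompress every member of a (possibly multi-member) gzip file, like `zcat`."""
    with gzip.open(path, "rb") as f:
        return f.read()
```

Because the copy is byte-for-byte, its cost is dominated by disk I/O rather than by compression work.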
Since existing tools generally ignore the filename headers of the additional members, it's not easy to extract individual files from the result. If you want that to be possible, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP also supports some other compression methods; method 8 is the one that corresponds to GZIP's compression); the difference is in the metadata format. Since the metadata is uncompressed, it's simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. The CRC-32 and uncompressed size (modulo 2^32) that the ZIP headers need are already stored in each gzip member's 8-byte trailer. Refer to the gzip format specification and the ZIP format specification.
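A minimal Python sketch of that header swap follows. It assumes each input is a single-member gzip file smaller than 4 GiB (no ZIP64), uses a fixed placeholder timestamp of 1980-01-01, and marks entry names as UTF-8; the function names split_gzip_member and gzips_to_zip are hypothetical. The DEFLATE payload is copied byte-for-byte and never recompressed:

```python
import gzip
import io
import struct
import zipfile

# gzip FLG bits (RFC 1952, section 2.3.1)
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10
UTF8_NAMES = 0x0800             # ZIP general-purpose flag bit 11: names are UTF-8
DOS_TIME, DOS_DATE = 0, 0x0021  # placeholder timestamp (assumption): 1980-01-01 00:00


def split_gzip_member(blob):
    """Return (raw DEFLATE bytes, CRC-32, uncompressed size) of a single-member gzip file."""
    if blob[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a DEFLATE-compressed gzip file")
    flags = blob[3]
    pos = 10  # fixed header: ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", blob, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = blob.index(b"\x00", pos) + 1  # skip zero-terminated original filename
    if flags & FCOMMENT:
        pos = blob.index(b"\x00", pos) + 1  # skip zero-terminated comment
    if flags & FHCRC:
        pos += 2
    # The 8-byte trailer holds CRC32 and ISIZE (size mod 2**32), which ZIP also records.
    crc, isize = struct.unpack_from("<II", blob, len(blob) - 8)
    # Assumes exactly one member: everything between header and trailer is DEFLATE data.
    return blob[pos:-8], crc, isize


def gzips_to_zip(members):
    """Build a ZIP archive from [(name, gzip_bytes), ...] without recompressing anything."""
    out = io.BytesIO()
    central = []
    for name, blob in members:
        deflate, crc, isize = split_gzip_member(blob)
        fname = name.encode("utf-8")
        offset = out.tell()
        # Local file header (30 bytes) + name + the untouched DEFLATE stream (method 8)
        out.write(struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, UTF8_NAMES, 8,
                              DOS_TIME, DOS_DATE, crc, len(deflate), isize, len(fname), 0))
        out.write(fname)
        out.write(deflate)
        # Central directory header (46 bytes) pointing back at the local header
        central.append(struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, UTF8_NAMES, 8,
                                   DOS_TIME, DOS_DATE, crc, len(deflate), isize, len(fname),
                                   0, 0, 0, 0, 0, offset) + fname)
    cd_offset = out.tell()
    out.write(b"".join(central))
    cd_size = out.tell() - cd_offset
    # End-of-central-directory record (22 bytes, no archive comment)
    out.write(struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(central), len(central),
                          cd_size, cd_offset, 0))
    return out.getvalue()


if __name__ == "__main__":
    blobs = [("file1", gzip.compress(b"alpha\n")), ("file2", gzip.compress(b"beta\n"))]
    with zipfile.ZipFile(io.BytesIO(gzips_to_zip(blobs))) as zf:
        print(zf.namelist(), zf.read("file2"))
```

Because the CRC is taken straight from the gzip trailer, zipfile can still verify each entry on read. A production version would likely also want to carry over real modification times and handle large files with ZIP64.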
Here is what man 1 gzip says about your requirement. Needless to say, file1 can be replaced by file1.gz. You must notice this:
So, if you want to get all members individually, you will have to use an additional tool or write one yourself.
However, this is also addressed in the man page.
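If you do want the members back one by one, a little code is enough. This Python sketch (the helper name split_members is hypothetical) relies on a zlib decompressor in gzip mode stopping at the end of one member and leaving the remaining bytes in unused_data. It recovers each member's contents, not its stored filename, and assumes there is no trailing padding after the last member:

```python
import zlib


def split_members(data):
    """Decompress each member of a multi-member gzip stream separately."""
    parts = []
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        parts.append(d.decompress(data))
        if not d.eof:
            raise ValueError("truncated gzip member")
        data = d.unused_data  # bytes after this member's trailer: the next member
    return parts
```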
Just use cat. It is very fast (0.2 seconds for 500 MB in my case).
You can then read the output with zcat to make sure it decompresses correctly.
I tried the other answer's gzip -c approach, but when using already gzipped files as input I ended up with garbage (I guess it compressed them a second time).
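The garbage is consistent with double compression: gzip -c wraps its input in a new gzip member even when that input is already a .gz file, so one round of decompression only peels off the outer layer. A small Python illustration (the helper count_gzip_layers is hypothetical and only checks for the gzip magic bytes 1f 8b, so data that happens to start with them would fool it):

```python
import gzip


def count_gzip_layers(data):
    """Peel off gzip layers while the data starts with the gzip magic bytes."""
    layers = 0
    while data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
        layers += 1
    return layers, data


once = gzip.compress(b"hello\n")   # what file1.gz contains
twice = gzip.compress(once)        # the equivalent of gzip-compressing file1.gz again
print(count_gzip_layers(once))     # (1, b'hello\n')
print(count_gzip_layers(twice))    # (2, b'hello\n')
```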
PV:
Better yet, if you have it installed, run the same command with pv in place of cat.
It shows a progress bar while it works but otherwise does the same thing as cat.
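You can create a tar file of these files and then gzip the tar file to create the new gzip file. The .gz files are stored unchanged inside the archive, so nothing needs to be decompressed and each file can still be extracted separately, although gzipping data that is already compressed saves little extra space.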
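A sketch of this approach with Python's standard tarfile module, whose "w:gz" mode writes the tar archive and gzips it in one pass (the helper name bundle_gz_files and all paths are hypothetical). The .gz inputs are added as-is, so each one can later be extracted on its own:

```python
import os
import tarfile


def bundle_gz_files(paths, out_path):
    """Store each .gz file unchanged in a gzip-compressed tar archive."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))  # drop directory prefixes
```

tarfile.open also accepts a fileobj argument, and the streaming mode "w|gz" works with non-seekable outputs, which may suit a CGI script that sends the archive straight to the client.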