How can I find how many zlib files are inside a single file?
I would like to know how to determine how many zlib files are contained in a single file.
For example, suppose I have 5 different files and compress each of them separately with zlib. Then I concatenate them, so I have one file that contains 5 separate zlib files. How can I find out how many zlib files are in that single file? I only need the count. I guess I need to dump its hex and grep for some magic number, but I could not figure out how to do that.
Could you help me out?
The length of a block is not stored in zlib-encoded data (with the exception of non-compressed, i.e. stored, blocks). Instead, the end of a block is signified by a token with value 256 in the stream. But this token is Huffman-encoded, and the Huffman code is usually generated dynamically, so it can be different for each block. Furthermore, the encoded token can start at any bit of a byte, so there is no way to "grep" for it. The only way to find the end-of-block token is to decode the entire block and check when you hit it.
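Python's standard `zlib` module illustrates the point: the decoder can tell you where a stream ends, but only after it has inflated all of it. A minimal sketch, where `first_stream_length` is a made-up name for illustration, that finds how many bytes the first zlib stream in a buffer occupies:

```python
import zlib


def first_stream_length(data: bytes) -> int:
    """Return how many bytes the first zlib stream in `data` occupies.

    There is no length field to read, so the stream has to be inflated
    until the decoder reports end-of-stream.
    """
    d = zlib.decompressobj()  # default wbits=15: expects a zlib header
    d.decompress(data)        # inflate everything; the output is discarded
    if not d.eof:
        raise ValueError("data ends before the zlib stream is complete")
    # Bytes the decoder did not consume belong to whatever follows the stream.
    return len(data) - len(d.unused_data)


a = zlib.compress(b"hello " * 100)
b = zlib.compress(b"world")
print(first_stream_length(a + b) == len(a))  # True
```

The decoder consumes the final block and the 4-byte Adler-32 trailer before setting `eof`, so `unused_data` starts exactly where the next stream would begin.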
I think you should instead check whether your container includes any length information and use that to find out how long the compressed data is.
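What that length information looks like depends entirely on the container, which the question does not describe. As a purely hypothetical example, if each compressed member were written with a 4-byte big-endian length prefix (a made-up framing, not part of zlib), counting members would need no decompression at all:

```python
import struct
import zlib

# HYPOTHETICAL container format (not defined by zlib or by the question):
# each member is a 4-byte big-endian length followed by that many bytes
# of zlib-compressed data.


def pack_members(payloads):
    """Compress each payload separately and frame it with a length prefix."""
    out = bytearray()
    for payload in payloads:
        compressed = zlib.compress(payload)
        out += struct.pack(">I", len(compressed))
        out += compressed
    return bytes(out)


def count_members(blob: bytes) -> int:
    """Count members by hopping from one length prefix to the next."""
    count, pos = 0, 0
    while pos < len(blob):
        if pos + 4 > len(blob):
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4 + size
        if pos > len(blob):
            raise ValueError("truncated member")
        count += 1
    return count
```

Here the writer records each member's size, so the reader only hops from prefix to prefix instead of inflating anything.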
For details of the zlib format see RFC 1950, and the related DEFLATE specification which is RFC 1951.
If your single file is a concatenation of multiple gzip files, then you can find an upper bound on the number of files. The gzip format starts with the magic bytes 0x1f8b. Count the occurrences of the magic in the single file; the count tells you that you have at most that many files. Unfortunately, it is an upper bound and not the exact number of files, because 0x1f8b may also occur by chance in the data section, roughly once every 64K (2^16) bytes. To reduce false matches to about 1 in 16.8 million (2^24) bytes, you can scan for 0x1f8b08 instead. The trailing 0x08 is the "compression method" field, which is always 8 (deflate). Further refinements of this "filter" are possible; see the FLG field in RFC 1952.
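The scan described above can be sketched in a few lines of Python. This is still only the upper-bound heuristic; as one illustrative FLG refinement, it relies on RFC 1952 requiring FLG bits 5 to 7 to be zero and discards candidates where any of them are set. The function name `gzip_header_candidates` is hypothetical.

```python
import gzip


def gzip_header_candidates(data: bytes) -> int:
    """Upper bound on the number of gzip members in `data`.

    Counts offsets where the bytes 1f 8b 08 (ID1, ID2, CM=deflate) appear
    and the following FLG byte has its reserved bits 5..7 clear.
    """
    count = 0
    start = 0
    while True:
        i = data.find(b"\x1f\x8b\x08", start)
        if i == -1:
            return count
        # FLG is the 4th header byte; a match without one cannot be a header.
        if i + 3 < len(data) and data[i + 3] & 0xE0 == 0:
            count += 1
        start = i + 1  # keep scanning; matches may appear inside data too


# Usage: three gzip members concatenated (mtime=0 keeps headers deterministic).
sample = b"".join(gzip.compress(p, mtime=0) for p in (b"one", b"two", b"three"))
candidates = gzip_header_candidates(sample)  # an upper bound, at least 3
```

A result of N means "at most N members"; a chance occurrence of the pattern inside compressed data or a trailer would still be counted.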
If the members of the single file are not gzip-formatted but use the zlib or raw deflate formats, then you are out of luck: you must decompress to count the number of files, which I would do regardless.
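Counting by decompression is a short loop with Python's `zlib` module: inflate one stream, take whatever bytes the decoder did not consume, and repeat. A minimal sketch, where `count_streams` is a hypothetical helper name; the `wbits` values are the standard `zlib` ones for each container format:

```python
import zlib


def count_streams(data: bytes, wbits: int = 15) -> int:
    """Count concatenated compressed streams by fully inflating each one.

    wbits=15  -> zlib streams (RFC 1950)
    wbits=31  -> gzip members (RFC 1952)
    wbits=-15 -> raw deflate streams (RFC 1951)
    Raises zlib.error if a stream is invalid, ValueError if one is truncated.
    """
    count = 0
    while data:
        d = zlib.decompressobj(wbits)
        d.decompress(data)  # the decompressed output itself is not needed
        if not d.eof:
            raise ValueError(f"stream {count + 1} is truncated")
        count += 1
        data = d.unused_data  # bytes after this stream: the next one, if any
    return count
```

This gives an exact count rather than an upper bound, at the cost of inflating everything; each call to `decompress` also holds one stream's decompressed output in memory, so very large inputs would call for feeding the data in chunks instead.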