Wikipedia has an excellent description of the ZIP file format, but the "central directory" structure confuses me. Specifically this:
"This ordering allows a ZIP file to be created in one pass, but it is usually decompressed by first reading the central directory at the end."
The problem is that even the trailing header for the central directory is variable length. How, then, can a reader find the start of the central directory in order to parse it?
(Oh, and I did spend some time looking at APPNOTE.TXT in vain before coming here and asking :P)
My condolences. Reading the Wikipedia description gives me the very strong impression that you need to do a fair amount of guess-and-check work:
Hunt backwards from the end for the 0x06054b50 end-of-directory tag, look 16 bytes past the start of that tag to find the offset of the first start-of-directory tag 0x02014b50, and hope that is it. You could do some sanity checks, like verifying that the comment length field after the end-of-directory tag matches the length of the comment string that follows it, but it sure feels like Zip decoders work because people don't put funny characters into their zip comments, filenames, and so forth. Based entirely on the Wikipedia page, anyhow.
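A minimal Python sketch of that guess-and-check search, assuming the whole archive is already in memory as `bytes` and is not a ZIP64 or multi-disk archive. The function names `find_eocd` and `central_directory_offset` are hypothetical helpers made up for this illustration. The sanity check rejects any candidate whose comment-length field does not make the record end exactly at the end of the file, which filters out signature bytes that merely appear inside a comment.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as stored on disk (little-endian)
CD_SIG = b"PK\x01\x02"    # 0x02014b50 as stored on disk (little-endian)
EOCD_MIN = 22             # size of the end record when the comment is empty


def find_eocd(data: bytes) -> int:
    """Return the index of the end-of-central-directory record, or -1."""
    pos = data.rfind(EOCD_SIG)
    while pos != -1:
        if pos + EOCD_MIN <= len(data):
            # The 2-byte comment length sits at offset 20 of the record.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # Sanity check: record plus comment must end exactly at EOF.
            if pos + EOCD_MIN + comment_len == len(data):
                return pos
        # Keep hunting backwards for an earlier candidate.
        pos = data.rfind(EOCD_SIG, 0, pos + 3)
    return -1


def central_directory_offset(data: bytes) -> int:
    """Return the offset of the first central directory header."""
    eocd = find_eocd(data)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # The 4-byte central directory offset sits at offset 16 of the record.
    (cd_offset,) = struct.unpack_from("<I", data, eocd + 16)
    # An empty archive has no headers, so the offset points at the end record.
    if cd_offset != eocd and data[cd_offset:cd_offset + 4] != CD_SIG:
        raise ValueError("offset does not point at a central directory header")
    return cd_offset
```

The check is still a heuristic: a comment could be crafted to contain a byte sequence that passes it, so a stricter reader might also verify the entry count and directory size fields.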
I implemented zip archive support some time ago, and I searched the last few kilobytes for the end of central directory signature (4 bytes). That works pretty well until somebody puts 50 KB of text into the comment, which is unlikely to happen. To be absolutely sure, you can search the last 64 KB plus a few bytes, since the comment size is a 16-bit field (so the end record is at most 22 + 65,535 bytes long).
After that, I look for the zip64 end of central directory locator, which is easier since it has a fixed structure.
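As a rough illustration of that bounded search, the sketch below reads only the tail of the file that could possibly hold the end record, then checks the 20 bytes immediately before it for the zip64 locator (signature 0x07064b50). The names `read_tail` and `locate_records` are made up for this example, and `f` is assumed to be any seekable binary file object. No comment-length sanity check is done here, so a comment containing the signature bytes would fool it.

```python
import struct

EOCD_SIG = b"PK\x05\x06"       # end of central directory, 0x06054b50
ZIP64_LOC_SIG = b"PK\x06\x07"  # zip64 end of central directory locator
EOCD_MIN = 22                  # end record size without a comment
MAX_COMMENT = 0xFFFF           # comment length is a 16-bit field
ZIP64_LOC_SIZE = 20            # the locator has a fixed size


def read_tail(f, size):
    """Read the part of the file that can contain the end record."""
    max_tail = EOCD_MIN + MAX_COMMENT + ZIP64_LOC_SIZE
    start = max(0, size - max_tail)
    f.seek(start)
    return f.read(size - start), start


def locate_records(f):
    """Return (eocd_offset, zip64_eocd_offset or None) for a binary file."""
    f.seek(0, 2)  # seek to the end to learn the file size
    size = f.tell()
    tail, base = read_tail(f, size)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory signature in the tail")
    zip64_eocd_offset = None
    loc = pos - ZIP64_LOC_SIZE
    if loc >= 0 and tail[loc:loc + 4] == ZIP64_LOC_SIG:
        # Locator layout: signature (4), disk number (4),
        # offset of the zip64 end record (8), total disks (4).
        (zip64_eocd_offset,) = struct.unpack_from("<Q", tail, loc + 8)
    return base + pos, zip64_eocd_offset
```

Reading at most about 64 KiB keeps file access small regardless of archive size, which is the point of bounding the search by the maximum comment length.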
Here is a solution I just had to roll out, in case anybody needs it. It involves grabbing the central directory.
In my case I did not want any of the compression features offered by the existing zip solutions. I just wanted to know about the contents. The following code returns a ZipArchive containing a listing of every entry in the zip.
It also uses a minimal amount of file access and memory allocation.
TinyZip.cpp
TinyZip.h
Usage:
In case someone out there is still struggling with this problem - have a look at the repository I hosted on GitHub containing my project that could answer your questions.
Zip file reader
Basically, what it does is download the central directory part of the .zip file, which resides at the end of the file. It then reads every file and folder name, with its path, from the bytes and prints it to the console.
I have made comments about the more complicated steps in my source code.
The program only works for .zip files up to about 4 GB. Beyond that you will have to make some changes to the VM size, and maybe more.
Enjoy :)
I recently encountered a similar use-case and figured I would share my solution for posterity since this post helped send me in the right direction.
Using the ZIP file central directory offsets detailed on Wikipedia, we can take the following approach to parse the central directory and retrieve a list of the contained files:
STEPS:
1. Find the end of central directory record (EOCDR) signature (0x06054b50), beginning at the end of the file (i.e. read the file in reverse, using std::ios::ate if using an ifstream)
2. Use the offset stored in the EOCDR (16 bytes from the EOCDR) to position the stream reader at the beginning of the central directory
3. Use the offset (46 bytes from the CD start) to position the stream reader at the file name, and track its start position
4. Scan until either another central directory header (0x02014b50) or the EOCDR is found, and track the position

The key point here is that the EOCDR is identified by a signature (0x06054b50) that normally occurs only one time. Using the 16 byte offset, we can position ourselves at the first occurrence of the central directory header (0x02014b50). Each record has the same 0x02014b50 header signature, so you just need to loop through occurrences of the header signature until you hit the EOCDR signature (0x06054b50) again.
SUMMARY:
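The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ZipReader code: `list_names` is a hypothetical function, the archive is assumed to fit in memory, to have no archive comment containing the signature bytes, and not to be ZIP64. It also swaps in a different technique for one step: instead of scanning for the next header signature to find where a file name ends, it reads the name, extra-field, and comment lengths stored at offset 28 of each header.

```python
import struct

CD_SIG = b"PK\x01\x02"    # central directory header, 0x02014b50
EOCD_SIG = b"PK\x05\x06"  # end of central directory record, 0x06054b50
CD_FIXED = 46             # fixed part of each central directory header


def list_names(data: bytes) -> list:
    """Return the file names recorded in the central directory."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Central directory start: 4-byte offset, 16 bytes into the end record.
    (pos,) = struct.unpack_from("<I", data, eocd + 16)
    names = []
    # Every entry starts with the same signature; stop at anything else,
    # which for a well-formed archive is the end record itself.
    while data[pos:pos + 4] == CD_SIG:
        (flags,) = struct.unpack_from("<H", data, pos + 8)
        name_len, extra_len, comment_len = struct.unpack_from(
            "<HHH", data, pos + 28)
        raw = data[pos + CD_FIXED:pos + CD_FIXED + name_len]
        # Bit 11 of the general purpose flags marks UTF-8 names;
        # otherwise the historical default is code page 437.
        names.append(raw.decode("utf-8" if flags & 0x800 else "cp437"))
        # Skip the variable-length fields to reach the next header.
        pos += CD_FIXED + name_len + extra_len + comment_len
    return names
```

Using the stored lengths avoids a false stop when a file name or extra field happens to contain the bytes of a header signature.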
If you want to see a working example of the above steps, you can check out my minimal implementation (ZipReader) on GitHub. The implementation can be used like this: