Hopefully you've heard of the neat hack that lets you combine a JPG and a ZIP file into a single file that is valid (or at least readable) in both formats. Well, I realized that since JPG allows arbitrary data at the end, and ZIP allows it at the beginning, you could stick one more format in there, in the middle. For the purposes of this question, assume the middle data is arbitrary binary data guaranteed not to conflict with the JPG or ZIP formats (meaning it doesn't contain the magic ZIP header 0x04034b50). Illustration:
0xFFD8 <- start jpg data end -> 0xFFD9 ... ARBITRARY BINARY DATA ... 0x04034b50 <- start zip file ... EOF
I am catting like this:
cat "mss_1600.jpg" \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb \
    "null.bytes" "randomzipfile.zip" > temp.zip
This produces a 6,318 KB file. It does not open in 7-Zip. However, when I cat one fewer pair (so 12 filea/fileb pairs instead of 13):
cat "mss_1600.jpg" \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb \
    "null.bytes" "randomzipfile.zip" > temp.zip
It produces a 5,996 KB file that does open in 7-Zip.
So I know my arbitrary binary data doesn't contain the magic ZIP file header to screw it up. I have reference files of the working jpg+data+zip and the non-working jpg+data+zip (use save-as, because the browser thinks they're images, and add the .zip extension yourself).
I want to know why it fails with 13 combinations and doesn't with 12. For bonus points, I need to get around this somehow.
I downloaded the source for 7-Zip and figured out what is causing this to happen.
In CPP/7zip/UI/Common/OpenArchive.cpp, the limit on how far into the file 7-Zip looks for an archive header is set to 1 << 22. That means only the first 4194304 bytes (4 MiB) of the file are searched for the header. If it isn't found there, 7-Zip considers the file invalid.
You can double that limit by changing 1 << 22 to 1 << 23. I tested that change by rebuilding 7-Zip and it works.
EDIT: To get around this issue, you can download the source, make the above change, and build it. I built it using VS 2008. Open the VS command prompt, navigate to extracted-source-location\CPP\7zip\Bundles and type 'nmake'. Then in the Alone directory run '7za t nonworking.jpg' and you should see 'Everything is Ok'.
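To see which side of the 4194304-byte limit a given combined file falls on, a minimal Python sketch can locate the first ZIP local file header. It assumes the check amounts to "the first 50 4B 03 04 byte sequence (0x04034b50 stored little-endian) must start within the first 1 << 22 bytes", which approximates the behaviour described above rather than reproducing 7-Zip's code. The names SEARCH_LIMIT, zip_signature_offset and within_search_window are made up for illustration.

```python
SEARCH_LIMIT = 1 << 22           # 4194304 bytes, the limit quoted above
ZIP_LOCAL_SIG = b"PK\x03\x04"    # 0x04034b50 as it appears on disk


def zip_signature_offset(data: bytes) -> int:
    """Offset of the first local file header signature, or -1 if absent."""
    return data.find(ZIP_LOCAL_SIG)


def within_search_window(data: bytes, limit: int = SEARCH_LIMIT) -> bool:
    """True if the first signature starts inside the first `limit` bytes."""
    pos = zip_signature_offset(data)
    return 0 <= pos < limit


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        blob = f.read()
    print("first signature at", zip_signature_offset(blob))
    print("inside search window:", within_search_window(blob))
```

Run against the two temp.zip variants from the question, this should show where the ZIP data starts relative to the limit.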
Actually it is a two part answer really :)
Firstly, no matter what people say, ZIP files cannot technically be put verbatim at the end of other files. The end of central directory record holds a value giving the byte offset of the central directory from the start of the current disk (if you have only one .zip file, that means the current file). A lot of ZIP processors ignore this, but Windows' zip folders do not, so you need to correct that value to make the file work in Windows Explorer (not that you might care ;P). See the Zip APPNOTE for details of the file format. Basically, use a hex editor (or write a tool) to find the "offset of start of central directory with respect to the starting disk number" value. Then find the first "central file header signature" (hex 504b0102) and set the value to the offset of that signature.
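The hex-editor procedure can be sketched as a small Python tool. This is a sketch under stated assumptions, not a general repair utility: it assumes a single-disk, non-ZIP64 archive with at least one entry, whose central directory sits directly before the end of central directory record. It also goes one step further than the text above by shifting the per-entry "relative offset of local header" fields by the same amount, since readers that trust the corrected directory offset will also trust those. The function name rebase_zip_offsets is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDFH_SIG = b"PK\x01\x02"  # central file header (hex 504b0102)


def rebase_zip_offsets(data: bytes) -> bytes:
    """Return a copy of `data` whose ZIP offsets are measured from byte 0."""
    buf = bytearray(data)
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")

    # EOCD layout (little-endian): total entries at +10 (2 bytes),
    # directory size at +12 and directory offset at +16 (4 bytes each).
    entries, cd_size, cd_offset = struct.unpack_from("<HII", buf, eocd + 10)
    cd_start = eocd - cd_size
    if cd_start < 0 or buf[cd_start:cd_start + 4] != CDFH_SIG:
        raise ValueError("central directory is not where this sketch expects")

    shift = cd_start - cd_offset  # bytes that were placed before the archive
    struct.pack_into("<I", buf, eocd + 16, cd_start)

    # Addition beyond the text above: move each entry's local header offset.
    pos = cd_start
    for _ in range(entries):
        if buf[pos:pos + 4] != CDFH_SIG:
            raise ValueError("malformed central file header")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<I", buf, pos + 42)
        struct.pack_into("<I", buf, pos + 42, local_offset + shift)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)
```

Typical use would be to read the combined file, pass its bytes through rebase_zip_offsets, and write the result to a new file, keeping the original until the output has been checked.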
Alas, that doesn't fix 7-Zip, and that is due to the way 7-Zip tries to guess the file format. Basically, it only searches the first ~4 MiB for the byte sequence 504b0304; if it doesn't find it, it assumes the file isn't a ZIP and tries its other archive formats. This is why adding one more pair of files breaks things: it pushes the ZIP data past the search limit.
To fix it, you need to add that byte sequence to the JPEG without breaking it. One way is to insert the hex data FFEF0005504B030400 immediately after the FFD8 JPEG SOI marker. That adds a custom segment containing the sequence, which JPEG readers should just ignore.
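Inserting such a segment can be sketched in Python as follows. The function name add_zip_marker is hypothetical. One deliberate difference from the hex string quoted above: the sketch derives the segment length from the payload, and because a JPEG segment length counts its own two bytes, a 5-byte payload gives 0007 rather than 0005. Whether a particular decoder tolerates the shorter value is not something this sketch checks.

```python
SOI = b"\xff\xd8"              # JPEG start-of-image marker
APP15 = b"\xff\xef"            # application segment marker (FFEF)
ZIP_LOCAL_SIG = b"PK\x03\x04"  # byte sequence 504b0304


def add_zip_marker(jpeg: bytes, payload: bytes = ZIP_LOCAL_SIG + b"\x00") -> bytes:
    """Insert an APP15 segment carrying `payload` right after the SOI marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("input does not start with a JPEG SOI marker")
    # The length field is big-endian and includes its own two bytes.
    length = len(payload) + 2
    segment = APP15 + length.to_bytes(2, "big") + payload
    return jpeg[:2] + segment + jpeg[2:]
```

The returned bytes would then take the place of "mss_1600.jpg" at the front of the concatenation, so the signature sits a few bytes into the file regardless of how much data follows.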
So for anyone else finding this question, here's the story:
Yes, Andy is literally correct as to why 7-Zip is failing on the file, but it doesn't help my problem since I can't exactly get people to use MY version of 7-Zip.
tyranid however got me the solution.
You can produce hybrid JPG+ZIP files using DotNetZip. DotNetZip can save to a stream, and it is intelligent enough to recognize the original offset of a pre-existing stream before it begins writing zip content into it. In outline, you get a JPG+ZIP by opening a stream on the JPG, seeking to the end, and saving the ZIP content into that stream.
All the offsets are then correctly calculated. The same technique is used to produce a self-extracting archive: open the stream on the EXE, seek to the end, and write the ZIP content to that stream.
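As a rough analogue of the same idea in Python (a swapped-in technique, not DotNetZip), the standard library's zipfile module can open an existing non-ZIP file in append mode and write a new archive after the existing bytes, with offsets measured from the start of the file. The helper name append_zip and its parameters are made up for illustration.

```python
import shutil
import zipfile


def append_zip(carrier_path: str, out_path: str, members: dict) -> None:
    """Copy the carrier file (e.g. a JPG) and append a new ZIP archive to it.

    `members` maps archive member names to their contents as bytes.
    """
    shutil.copyfile(carrier_path, out_path)
    # Mode "a" on a file that is not a ZIP appends a new archive at its end.
    with zipfile.ZipFile(out_path, "a") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
```

For example, append_zip("mss_1600.jpg", "temp.jpg", {"readme.txt": b"hi"}) would leave the image bytes untouched at the front of temp.jpg and place the archive after them.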
Another thing - regarding one of the comments in another post... ZIP can have arbitrary data in the beginning and at the end of the file. There's no requirement as far as I know that the zip central directory needs to be at the end of the file, though that is typical.