JPG+Zip File Combination Problem with Zip Format

Posted on 2024-08-13 03:55:20

Hopefully you've heard of the neat hack that lets you combine a JPG and a Zip file into a single file that is valid (or at least readable) in both formats. Well, I realized that since JPG allows arbitrary data at the end and ZIP allows it at the beginning, you could stick one more format in there, in the middle. For the purposes of this question, assume the middle data is arbitrary binary data guaranteed not to conflict with the JPG or ZIP formats (meaning it doesn't contain the magic zip header 0x04034b50). Illustration:

0xFFD8 <- start jpg data end -> 0xFFD9 ... ARBITRARY BINARY DATA ... 0x04034b50 <- start zip file ... EOF

I am catting like this:

cat "mss_1600.jpg" \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb \
    "null.bytes" "randomzipfile.zip" > temp.zip

This produces a 6,318 KB file. It does not open in 7-Zip. However, when I cat one fewer filea/fileb pair (12 pairs instead of 13):

cat "mss_1600.jpg" \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb filea fileb filea fileb filea fileb \
    filea fileb filea fileb \
    "null.bytes" "randomzipfile.zip" > temp.zip

It produces a 5,996 KB file that does open in 7-Zip.

So I know my arbitrary binary data doesn't contain the magic Zip file header to screw it up. I have reference files of the working jpg+data+zip and the non-working jpg+data+zip (use save-as, because the browser thinks they're images, and add the .zip extension yourself).

I want to know why it fails with 13 combinations and doesn't with 12. For bonus points, I need to get around this somehow.

贵在坚持 2024-08-20 03:55:20

I downloaded the source for 7-Zip and figured out what is causing this to happen.

In CPP/7zip/UI/Common/OpenArchive.cpp, you'll see the following:

// Static-SFX (for Linux) can be big.
const UInt64 kMaxCheckStartPosition = 1 << 22;

That means only the first 4,194,304 bytes (4 MiB) of the file are searched for the archive header. If the header isn't found there, 7-Zip treats the file as invalid.
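To see which side of that limit a combined file falls on, a minimal Python sketch follows. It assumes the whole file fits in memory; the function names are hypothetical, and the exact boundary handling inside 7-Zip is not modeled, only the idea of a fixed search window.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # the bytes 50 4B 03 04
SEARCH_LIMIT = 1 << 22            # 4194304, the kMaxCheckStartPosition value quoted above


def first_zip_header_offset(data: bytes) -> int:
    """Return the offset of the first local file header signature, or -1 if absent."""
    return data.find(ZIP_LOCAL_HEADER)


def within_search_window(data: bytes, limit: int = SEARCH_LIMIT) -> bool:
    """Return True if a zip signature starts inside the first `limit` bytes."""
    offset = first_zip_header_offset(data)
    return 0 <= offset < limit
```

Reading temp.zip with open(..., "rb") and passing the bytes to within_search_window should show whether the 13-pair file pushes the first zip header past the window.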

You can double that limit by changing 1 << 22 to 1 << 23. I tested that change by rebuilding 7-Zip and it works.

EDIT: To get around this issue, you can download the source, make the above change, and build it. I built it using VS 2008. Open the VS command prompt, navigate to extracted-source-location\CPP\7zip\Bundles and type 'nmake'. Then in the Alone directory run '7za t nonworking.jpg' and you should see 'Everything is Ok'.


半仙 2024-08-20 03:55:20

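Actually, this is really a two-part answer :)

First, no matter what people say, zip files technically cannot be put verbatim at the end of another file. The end of central directory record holds a value giving the byte offset from the start of the current disk (if you have only one .zip file, that means the current file). A lot of zip processors ignore this, but Windows' zip folders do not, so you need to correct that value to make the file work in Windows Explorer (not that you might care ;P). See the Zip APPNOTE for details of the file format. Basically, use a hex editor (or write a tool) to find the "offset of start of central directory with respect to the starting disk number" value. Then find the first "central file header signature" (504b0102 in hex) and set the value to that offset.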
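The manual hex-editor step can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the end of central directory record is located by its signature 504b0506, the offset field is taken to sit 16 bytes into that record, and the first 504b0102 match is assumed to be the real central directory rather than a coincidence in the prepended data.

```python
import struct

EOCD_SIG = b"PK\x05\x06"     # end of central directory signature (504b0506)
CENTRAL_SIG = b"PK\x01\x02"  # central file header signature (504b0102)


def central_directory_offsets(data: bytes):
    """Return (stored_offset, actual_offset) of the central directory."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end of central directory record found")
    # 4-byte little-endian offset field, 16 bytes into the record.
    (stored,) = struct.unpack_from("<I", data, eocd + 16)
    actual = data.find(CENTRAL_SIG)
    return stored, actual


def fix_central_directory_offset(data: bytes) -> bytes:
    """Return a copy of data whose stored offset points at the first central header."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end of central directory record found")
    actual = data.find(CENTRAL_SIG)
    return data[:eocd + 16] + struct.pack("<I", actual) + data[eocd + 20:]
```

For a zip that was simply concatenated after other data, the two values returned by central_directory_offsets differ by the length of the prepended data.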

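Now, alas, that doesn't fix 7zip, but that is due to the way 7zip tries to guess the file format. Basically, it only searches the first ~4 MiB for the binary sequence 504b0304; if it doesn't find it, it assumes the file isn't a Zip and tries its other archive formats. This is obviously why adding one more file breaks things: it pushes the zip header past the search limit.

To fix it, you need to add that hex string to the JPEG without breaking it. One way to do this is to add the following hex data just after the FFD8 JPEG SOI header: FFEF0005504B030400. That adds a custom block containing your sequence, and the block is well-formed, so JPEG readers should just ignore it.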
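A minimal Python sketch of inserting such a block is shown here. The function name is hypothetical, and the length is computed rather than hard-coded: a JPEG segment's 2-byte big-endian length field counts its own two bytes plus the payload, so for this 5-byte payload the sketch emits 0007.

```python
import struct

SOI = b"\xff\xd8"             # JPEG start-of-image marker
APP15 = b"\xff\xef"           # the FFEF application segment marker used above
ZIP_HINT = b"PK\x03\x04\x00"  # 50 4B 03 04 00, the payload used above


def add_zip_hint_segment(jpeg: bytes, payload: bytes = ZIP_HINT) -> bytes:
    """Insert an FFEF segment carrying `payload` right after the SOI marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing FFD8 start-of-image marker")
    # Length field = 2 bytes for the field itself + payload length.
    segment = APP15 + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:2] + segment + jpeg[2:]
```

After the insertion the 504b0304 sequence sits at byte offset 6 of the file, well inside the search window, although it does not point at a real zip entry.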


欢烬 2024-08-20 03:55:20

So for anyone else finding this question, here's the story:

Yes, Andy is literally correct as to why 7-Zip is failing on the file, but it doesn't help my problem since I can't exactly get people to use MY version of 7-Zip.

tyranid however got me the solution.

First off, adding a small byte string to the JPG as he suggests will let 7-Zip open it. However, his fragment is slightly off from a valid JPG segment: it needs to be FFEF00 07 504B030400, because the length was off by 2 bytes.
This lets 7-Zip open the file but not extract from it; extraction fails silently. That is because the entries in the central directory hold internal pointers/offsets to each file's entry. Since you put a bunch of stuff before the zip data, you need to correct all those pointers!
To have the zip open with Windows' built-in zip support, you also need to, as tyranid says, correct the "offset of start of central directory with respect to the starting disk number". Here is a Python script to do the last two, although it's a fragment, not copypasta-ready-to-use:


# Now we need to read the file and rewrite all the zip headers.  Fun!
from struct import pack, unpack

magicfilename = 'temp.zip'  # assumption: path of the combined jpg+data+zip file

CENTRAL_SIG = b'\x50\x4B\x01\x02'  # central directory file header signature
LOCAL_SIG = b'\x50\x4B\x03\x04'    # local file header signature

with open(magicfilename, 'rb') as torewrite:
    magicdata = torewrite.read()

# Change the central directory's offset
offsetOfCentralRepro = magicdata.find(CENTRAL_SIG)  # this is the beginning of the central directory
# On my files the 4-byte offset sits 6 bytes from the end: OFFSET, then the two
# bytes 00 00 (zip comment length), then EOF. This only holds for zips without a comment.
start = len(magicdata) - 6
magicdata = magicdata[:start] + pack('<I', offsetOfCentralRepro) + magicdata[start+4:]

# Now change the individual offsets in the central directory entries
startOfCentralDirectoryEntry = magicdata.find(CENTRAL_SIG, 0)  # first central directory entry
# First file entry; start at 10 to skip past the fake entry in the jpg
startOfFileDirectoryEntry = magicdata.find(LOCAL_SIG, 10)
while startOfCentralDirectoryEntry >= 0:  # find() returns -1 when there are no more entries
    # The local header offset field is 42 bytes into the entry (really! It's 42!)
    startOfCentralDirectoryEntry = startOfCentralDirectoryEntry + 42

    # Get the current offset just to output something to the terminal
    (oldoffset,) = unpack('<I', magicdata[startOfCentralDirectoryEntry : startOfCentralDirectoryEntry+4])
    print("Old Offset:", oldoffset, "New Offset:", startOfFileDirectoryEntry, "at", startOfCentralDirectoryEntry)
    # Now replace it (little-endian, as the zip format requires)
    magicdata = magicdata[:startOfCentralDirectoryEntry] + pack('<I', startOfFileDirectoryEntry) + magicdata[startOfCentralDirectoryEntry+4:]

    # Move to the next central directory entry and the next file entry
    startOfCentralDirectoryEntry = magicdata.find(CENTRAL_SIG, startOfCentralDirectoryEntry)
    startOfFileDirectoryEntry = magicdata.find(LOCAL_SIG, startOfFileDirectoryEntry+1)

# Finally write the rewritten headers' data
with open(magicfilename, 'wb') as towrite:
    towrite.write(magicdata)


月朦胧 2024-08-20 03:55:20

You can produce hybrid JPG+ZIP files using DotNetZip. DotNetZip can save to a stream, and it is intelligent enough to recognize the original offset of a pre-existing stream before it begins writing zip content into it. Therefore in pseudo code, you can get a JPG+ZIP this way:

 open stream on an existing JPG file for update
 seek to the end of that stream
 open or create a zip file
 call ZipFile.Save to write zip content to the JPG stream
 close

All the offsets are correctly figured. The same technique is used to produce a self-extracting archive. You can open the stream on the EXE, then seek to the end, and write the ZIP content to that stream. All the offsets are correctly calculated if you do it this way.
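The same append-in-place idea can be sketched outside .NET. The example below swaps in Python's standard zipfile module (not DotNetZip): opening an existing non-ZIP file in mode "a" appends a new archive to it. The function name and the name-to-bytes mapping are hypothetical, chosen only for illustration.

```python
import zipfile


def append_zip_to_file(host_path: str, members: dict) -> None:
    """Append a new ZIP archive holding `members` (name -> bytes) to host_path."""
    # Mode "a" on an existing file that is not a ZIP appends a new archive to it,
    # so the writer knows where in the host file the zip data begins.
    with zipfile.ZipFile(host_path, "a") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
```

Called on a copy of the JPG, for example append_zip_to_file("copy.jpg", {"a.txt": b"hello"}), this leaves the image bytes untouched at the front of the file and writes the archive after them.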

Another thing - regarding one of the comments in another post... ZIP can have arbitrary data in the beginning and at the end of the file. There's no requirement as far as I know that the zip central directory needs to be at the end of the file, though that is typical.
