Why do my java.util.zip functions behave inconsistently?

Posted on 2024-10-15 08:22:23

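I have a Java application that uses the java.util.zip library to compress and decompress files. I have a zip file on the server (created by my application). The client zips some of his files and uploads the archive to the server, but if the underlying files haven't changed, I don't want to waste time uploading. I figured I could compute an MD5 hash on both the client and the server side and see whether they match. What happens, though, is this: I unzip the zip file with my application, then, without changing any of the underlying files, re-zip it with my application, and the old and new zip files have different MD5 hashes. Does anyone know why this happens, and whether there is a better way to compare two zip files? Thanks.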
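One way to get the comparison asked for here, without depending on archive metadata such as timestamps, is to compare what is inside the archives rather than the archive bytes: entry names plus a hash of each entry's uncompressed content. The sketch below assumes that "no difference" means "same file names with identical contents" (directory entries are ignored). `ZipContentCompare`, `contentDigest`, `sameContents` and the demo helper `zipOf` are hypothetical names, not part of java.util.zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipContentCompare {

    /** Maps each file entry name to the MD5 of its uncompressed bytes; timestamps are ignored. */
    static Map<String, String> contentDigest(InputStream zipData)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> digests = new TreeMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only file contents matter for this comparison
                }
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                int n;
                while ((n = zin.read(buf)) != -1) { // -1 marks the end of the current entry
                    md5.update(buf, 0, n);
                }
                digests.put(entry.getName(), toHex(md5.digest()));
            }
        }
        return digests;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** True if both archives hold the same file names with identical contents. */
    static boolean sameContents(byte[] zipA, byte[] zipB)
            throws IOException, NoSuchAlgorithmException {
        return contentDigest(new ByteArrayInputStream(zipA))
                .equals(contentDigest(new ByteArrayInputStream(zipB)));
    }

    /** Hypothetical demo helper: builds a one-entry zip with the given entry time (epoch millis). */
    static byte[] zipOf(String name, String text, long timeMillis) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(timeMillis);
            zout.putNextEntry(entry);
            zout.write(text.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Same name and content, entry times ten years apart.
        byte[] a = zipOf("some.txt", "abc\n", 1_000_000_000_000L);
        byte[] b = zipOf("some.txt", "abc\n", 1_300_000_000_000L);
        System.out.println("bytes equal:    " + Arrays.equals(a, b)); // false
        System.out.println("contents equal: " + sameContents(a, b));  // true
    }
}
```

In the upload scenario, the server could compute this name-to-hash map for its copy and the client could compare it with its own before deciding to upload; that split of work is an assumption about the setup, not something the question specifies.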

那片花海 2024-10-22 08:22:23

It's even worse, I think:

Running the same zip operation twice can produce two different zip archives:

> zip some.zip some.txt 
  adding: some.txt (stored 0%)
> zip other.zip some.txt
  adding: some.txt (stored 0%)
> ll
total 24
-rw-r--r--  1 cthies  staff  170 12 Dez 18:01 other.zip
-rw-r--r--  1 cthies  staff    4 12 Dez 18:01 some.txt
-rw-r--r--  1 cthies  staff  170 12 Dez 18:01 some.zip
> md5 *.zip
MD5 (other.zip) = f56d7753c5af78427274d930b9fb8c90
MD5 (some.zip) = e2f0382c4ad31871f62fb559157df8e8

Comparing the hex dumps, the two archives differ in just one place:

> xxd some.zip > some.xxd
> xxd other.zip > other.xxd
> colordiff *.xxd
3c3
< 0000020: 6d65 2e74 7874 5554 0900 0363 33e6 4e78  me.txtUT...c3.Nx
---
> 0000020: 6d65 2e74 7874 5554 0900 0363 33e6 4e64  me.txtUT...c3.Nd
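The bytes in this diff can be decoded by hand. A zip local file header is 30 bytes long and is followed by the file name, so `some.txt` occupies offsets 0x1E to 0x25, and the extra field starts at 0x26 with `55 54` ("UT"), the ID of Info-ZIP's extended timestamp field. Its flags byte `03` says that a modification time and an access time follow, each a 4-byte little-endian Unix time, which matches the data size of 9 (1 + 4 + 4). The single differing byte at 0x2F is therefore the lowest byte of the access time, which is presumably why re-zipping an untouched file still changes the archive. The sketch below decodes only the modification time from the nine bytes that both dumps share; the class and field names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class ExtendedTimestampDecode {

    // The nine bytes at offsets 0x26..0x2E of the dump: the start of the "UT" extra field.
    static final byte[] UT_FIELD = {
        0x55, 0x54,                    // header ID 0x5455 ("UT"), stored little-endian
        0x09, 0x00,                    // data size: 9 = 1 flag byte + two 4-byte times
        0x03,                          // flags: bit 0 = modification time, bit 1 = access time
        0x63, 0x33, (byte) 0xe6, 0x4e  // modification time, Unix seconds, little-endian
    };

    /** Decodes the modification time from an Info-ZIP extended timestamp extra field. */
    static Instant modificationTime(byte[] field) {
        ByteBuffer buf = ByteBuffer.wrap(field).order(ByteOrder.LITTLE_ENDIAN);
        int headerId = buf.getShort() & 0xFFFF;
        int dataSize = buf.getShort() & 0xFFFF;
        int flags = buf.get() & 0xFF;
        if (headerId != 0x5455 || dataSize < 5 || (flags & 0x01) == 0) {
            throw new IllegalArgumentException("not an extended timestamp field with a modification time");
        }
        long seconds = buf.getInt() & 0xFFFFFFFFL; // read as unsigned 32-bit
        return Instant.ofEpochSecond(seconds);
    }

    public static void main(String[] args) {
        System.out.println(modificationTime(UT_FIELD)); // 2011-12-12T17:01:23Z
    }
}
```

The decoded 17:01:23 UTC fits the 18:01 shown by `ll` on a machine running at UTC+1.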

I think that, depending on the zip application itself, time information (the current system time or the file's own timestamps) can and will end up in the archive. Thus any zip operation on exactly the same sources can(!) produce a unique archive, and therefore the checksums can't be assumed equal.

Time-independent tools I found: tar and 7z (both command-line). That is, tar and 7z reproduced archives with equal (MD5) checksums.

(Tested on OS X 10.6.8 with the command-line zip utility.)
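For the Java application in the question, a similar effect to tar and 7z can plausibly be approximated with java.util.zip itself: give every entry a fixed timestamp and write entries in a fixed (sorted) order, so identical inputs should yield byte-identical archives and an MD5 comparison becomes meaningful. This matters because, per the `ZipOutputStream.putNextEntry` documentation, the current time is used for an entry with no modification time set. The sketch assumes Java 9+ (for `ZipEntry.setTimeLocal` and `Map.of`), the same JDK on both sides (a different Deflater/zlib build could compress differently), and an arbitrary fixed date; `ReproducibleZip`, `FIXED_TIME`, `zip` and `md5Hex` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ReproducibleZip {

    // Hypothetical fixed timestamp written into every entry instead of "now" or the file's own times.
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(2000, 1, 1, 0, 0);

    /** Zips name -> content pairs in sorted name order, all with the same fixed entry time. */
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            // TreeMap sorts by name, so the input map's iteration order does not leak into the archive.
            for (Map.Entry<String, byte[]> f : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(f.getKey());
                entry.setTimeLocal(FIXED_TIME); // without a set time, putNextEntry uses the current time
                zout.putNextEntry(entry);
                zout.write(f.getValue());
                zout.closeEntry();
            }
        }
        return bos.toByteArray();
    }

    /** Hex-encoded MD5 of a byte array, the checksum compared in the answer above. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> files = Map.of("some.txt", "abc\n".getBytes(StandardCharsets.UTF_8));
        String first = md5Hex(zip(files));
        Thread.sleep(2100); // let the clock move past the 2-second DOS time resolution
        String second = md5Hex(zip(files));
        System.out.println(first.equals(second)); // true: no wall-clock time ends up in the archive
    }
}
```

Sorting by name is a design choice rather than a requirement of the zip format; any fixed order works as long as client and server use the same one.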
