Is there any external tool that produces a checksum matching gzip -lv?

Posted 2024-10-07 05:23:11

gzip will give a checksum of each file when given -l and -v options, like so:

$ echo foo > foo
$ gzip foo
$ gzip -lv foo.gz
method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
defla 7e3265a8 Dec 10 17:37                  28                   4 150.0% foo

Is there any external tool with which I can derive the same checksum?

md5sum, cksum and sum fill a similar role, but none of them gives a matching value (3915528286 in hex is e962385e, not 7e3265a8).

$ echo foo > foo
$ md5sum foo
d3b07384d113edec49eaa6238ad5ff00  foo
$ cksum foo
3915528286 4 foo
$ sum foo
00106     1
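
For illustration, here is a minimal Python sketch of why the two numbers differ. It assumes the value printed by gzip -lv is the standard CRC-32 that zlib also exposes, and that cksum follows the POSIX definition: same polynomial, but processed most-significant-bit first, with the file length fed in after the data. The function names are made up for this example.

```python
import zlib


def gzip_style_crc32(data: bytes) -> int:
    """CRC-32 as stored in a gzip trailer; zlib.crc32 computes the same function."""
    return zlib.crc32(data) & 0xFFFFFFFF


def posix_cksum(data: bytes) -> int:
    """Bitwise (slow) sketch of the POSIX cksum CRC, for small inputs only."""
    poly = 0x04C11DB7

    def feed(crc: int, byte: int) -> int:
        # Process one byte, most significant bit first.
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        return crc

    crc = 0
    for byte in data:
        crc = feed(crc, byte)
    # cksum then feeds the length, least significant byte first.
    length = len(data)
    while length:
        crc = feed(crc, length & 0xFF)
        length >>= 8
    return ~crc & 0xFFFFFFFF


if __name__ == "__main__":
    data = b"foo\n"  # what `echo foo > foo` writes
    print(format(gzip_style_crc32(data), "08x"))  # 7e3265a8
    print(posix_cksum(data))  # 3915528286
```

So any tool that computes the plain zlib-style CRC-32 of the uncompressed bytes should match the gzip -lv column, while cksum never will.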


Extra detail on the application:


We have a filesystem with many large files, and new files are copied in continually. Some of the incoming files match files that already exist, in which case we'd like to simply hard-link to the pre-existing file to save disk space. For uncompressed files, md5sums let us make this comparison quickly and efficiently. On the other hand, gzip'd files often have different md5sums for identical data (due to timestamp or owner, which is irrelevant in this application). I notice that gzip provides a checksum of the internal data, so for two gzip'd files I can simply compare the checksums plus the sizes.

I'd also like to support comparing a gzip file to a 'normal' file, in which case I need a utility which will generate the same checksum externally from gzip. I guess the simple solution is to always gzip the plain file before comparing, but this is overhead that I'd like to avoid, since our system is currently bottlenecked by CPU time.
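
One way to avoid compressing anything is sketched below in Python. It assumes each .gz file is a single-member gzip stream with nothing appended after it, so that the last 8 bytes are the little-endian CRC-32 and the uncompressed size modulo 2**32. The helper names `plain_file_key` and `gzip_trailer_key` are hypothetical.

```python
import gzip
import os
import struct
import tempfile
import zlib


def plain_file_key(path: str, chunk_size: int = 1 << 20) -> tuple:
    """Return (crc32, size mod 2**32) of an uncompressed file, read in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC over the stream
            size += len(chunk)
    return crc & 0xFFFFFFFF, size & 0xFFFFFFFF


def gzip_trailer_key(path: str) -> tuple:
    """Return (crc32, isize) from the last 8 bytes of a gzip file.

    Assumption: single-member gzip file with no trailing data.
    Nothing is decompressed.
    """
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        crc, isize = struct.unpack("<II", f.read(8))
    return crc, isize


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "foo")
        packed = os.path.join(tmp, "foo.gz")
        with open(plain, "wb") as f:
            f.write(b"foo\n")
        with gzip.open(packed, "wb") as g:
            g.write(b"foo\n")
        print(plain_file_key(plain) == gzip_trailer_key(packed))  # True
```

Since a CRC-32 is only 32 bits wide, a matching key is best treated as a candidate match and confirmed by comparing the actual data before hard-linking.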

Comments (2)

时光沙漏 2024-10-14 05:23:11

Long story short, I went through the source of gzip, compared it to the source of cksum, made some modifications and then found out that jacksum uses the same implementation as gzip.

So use jacksum. :)

invocation: jacksum -a crc32 filename

泪之魂 2024-10-14 05:23:11

I just ran some benchmarks, and while jacksum is pretty good, it takes slightly longer and uses much more memory than cksfv.

This benchmark was performed in a VirtualBox Ubuntu VM on a 4 GB file generated by cat /dev/urandom. You will probably get much better speeds on a "real" machine, but they should be in the same ratio.

The gzip/tempfile method ran out of disk space, but I don't care because it had already used more than twice as much time.

$ cksum random.dat
1591530146 4388388864 random.dat
5.78user 7.42system 2:53.62elapsed 7%CPU (0avgtext+0avgdata 2896maxresident)k
8480936inputs+0outputs (0major+225minor)pagefaults 0swaps

$ md5sum random.dat
3d6f60f84b2289992abd66428e8a73c4  random.dat
5.57user 8.25system 2:25.97elapsed 9%CPU (0avgtext+0avgdata 2656maxresident)k
8480960inputs+0outputs (1major+209minor)pagefaults 0swaps

$ jacksum -x -a crc32 random.dat
c93b4e20        4388388864      random.dat
3.65user 10.82system 2:19.69elapsed 10%CPU (0avgtext+0avgdata 52224maxresident)k
8490688inputs+152outputs (60major+3936minor)pagefaults 0swaps

$ cksfv random.dat
; Generated by cksfv v1.3.14 on 2010-12-11 at 12:06.31
; Project web site: http://www.iki.fi/shd/foss/cksfv/
;
;     93421568  11:16.12 2010-12-11 random.dat
random.dat C93B4E20
4.42user 8.65system 2:14.42elapsed 9%CPU (0avgtext+0avgdata 2048maxresident)k
8480944inputs+0outputs (1major+171minor)pagefaults 0swaps

$ bash -c 'gzip -c random.dat > temp.gz && gzip -lv temp.gz'

gzip: stdout: No space left on device
Command exited with non-zero status 1
55.54user 6.68system 4:31.56elapsed 22%CPU (0avgtext+0avgdata 4992maxresident)k
2596536inputs+2689840outputs (3major+695minor)pagefaults 0swaps

I think cksfv is my answer.
