Is there any external tool that can produce a checksum matching gzip -lv?
gzip will give a checksum of each file when given -l and -v options, like so:
$ echo foo > foo
$ gzip foo
$ gzip -lv foo.gz
method crc date time compressed uncompressed ratio uncompressed_name
defla 7e3265a8 Dec 10 17:37 28 4 150.0% foo
Is there any external tool with which I can derive the same checksum?
md5sum, cksum and sum fill a similar role, but do not give the matching code (the hex of 3915528286 is e962385e).
$ echo foo > foo
$ md5sum foo
d3b07384d113edec49eaa6238ad5ff00 foo
$ cksum foo
3915528286 4 foo
$ sum foo
00106 1
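The mismatch with cksum is expected. cksum implements the POSIX CRC (polynomial 0x04C11DB7 processed most-significant-bit first, starting from 0, with the file length appended before a final complement), while gzip stores the reflected CRC-32 defined in RFC 1952, which is what Python's zlib.crc32 computes. A minimal Python sketch comparing the two on the same 4 bytes that echo foo writes; posix_cksum is a hypothetical helper written for illustration, and its bit-at-a-time loop is far too slow for large files.

```python
import zlib


def posix_cksum(data: bytes) -> int:
    """Hypothetical helper: the POSIX cksum CRC, computed bit by bit."""

    def feed(crc: int, byte: int) -> int:
        # MSB-first (non-reflected) CRC step with polynomial 0x04C11DB7.
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        return crc

    crc = 0
    for byte in data:
        crc = feed(crc, byte)
    # cksum then feeds the length, least significant byte first,
    # using only as many bytes as the length needs.
    length = len(data)
    while length:
        crc = feed(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF  # final complement


data = b"foo\n"  # the contents written by `echo foo > foo`
print(posix_cksum(data))                # 3915528286, as printed by cksum foo
print(format(zlib.crc32(data), "08x"))  # 7e3265a8, the crc column of gzip -lv
```

Different bit order, a different initial register value and the appended length each change the result, so no amount of reformatting cksum's output will turn it into gzip's value.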
Extra detail on the application:
We have a filesystem with many large files, and new files are copied in continually. Some of the incoming files match files that already exist, in which case we'd like to simply hard-link the pre-existing file to save disk space. For uncompressed files, md5sums let us make this comparison quickly and efficiently. On the other hand, gzip'd files often have different md5sums for identical data (due to the timestamp or owner, which are irrelevant in this application). I notice that gzip provides a checksum of the internal data, so for two gzip'd files I can simply compare the lists of checksums plus sizes.
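Comparing two gzip'd files this way does not require decompressing them. Per RFC 1952, every gzip member ends with an 8-byte trailer holding CRC32 and ISIZE (the uncompressed length modulo 2^32), both little-endian, and these are the values behind the crc and uncompressed columns of gzip -lv. A minimal Python sketch, assuming single-member files: for a concatenated multi-member file the trailer describes only the last member, and ISIZE wraps for inputs of 4 GiB or more. The names gzip_trailer and same_gzip_payload are hypothetical.

```python
import os
import struct
from typing import Tuple


def gzip_trailer(path: str) -> Tuple[int, int]:
    """Hypothetical helper: (CRC32, ISIZE) read from the 8-byte gzip trailer."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # trailer = last 8 bytes of the (last) member
        crc, isize = struct.unpack("<II", f.read(8))  # two little-endian uint32
    return crc, isize


def same_gzip_payload(path_a: str, path_b: str) -> bool:
    """Equal CRC-32 and size: a strong hint of identical data, not a proof."""
    return gzip_trailer(path_a) == gzip_trailer(path_b)
```

The header fields that differ between otherwise identical archives (modification time, stored file name) are never read, so two files compressed at different times still compare equal. Because CRC-32 is only 32 bits wide, a match is best treated as a candidate to confirm byte for byte before hard-linking.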
I'd also like to support comparing a gzip file to a 'normal' file, in which case I need a utility that generates the same checksum outside of gzip. I guess the simple solution is to always gzip the plain file before comparing, but that is overhead I'd like to avoid, since our system is currently bottlenecked on CPU time.
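For the plain-file side, the checksum can be computed directly without compressing anything: gzip's crc column is the standard CRC-32 that Python's zlib.crc32 implements, and that function accepts a running value, so a file can be streamed in fixed-size chunks with constant memory. A minimal sketch, assuming Python 3 and a single-member .gz file; plain_crc32_and_size, plain_matches_gzip and the 1 MiB chunk size are hypothetical choices.

```python
import os
import struct
import zlib

CHUNK = 1 << 20  # 1 MiB per read (an arbitrary, hypothetical choice)


def plain_crc32_and_size(path):
    """Hypothetical helper: gzip-compatible CRC-32 and byte count of a plain file."""
    crc, size = 0, 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            crc = zlib.crc32(block, crc)  # carry the running CRC into the next chunk
            size += len(block)
    return crc, size


def plain_matches_gzip(plain_path, gz_path):
    """Hypothetical helper: compare a plain file with a single-member .gz trailer."""
    crc, size = plain_crc32_and_size(plain_path)
    with open(gz_path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # CRC32 and ISIZE, both little-endian
        gz_crc, gz_isize = struct.unpack("<II", f.read(8))
    # ISIZE holds the uncompressed length modulo 2**32, so reduce ours the same way.
    return crc == gz_crc and size % (1 << 32) == gz_isize
```

format(crc, "08x") renders the value the way gzip -lv prints it, so a file containing foo followed by a newline gives 7e3265a8. The cost is one read of the file plus a CRC pass, with no compression step.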
Comments (2)
Long story short, I went through the source of gzip, compared it to the source of cksum, made some modifications and then found out that jacksum uses the same implementation as gzip.
So use jacksum. :)
Invocation: jacksum -a crc32 filename
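For reference, the checksum gzip stores is the reflected CRC-32 of RFC 1952: polynomial 0xEDB88320 in least-significant-bit-first form, register preset to 0xFFFFFFFF and complemented at the end, with no length appended. A minimal bit-at-a-time Python sketch, checked against zlib.crc32; crc32_gzip is a hypothetical name, and the loop is far slower than a table-driven implementation, so it is for illustration only.

```python
import zlib


def crc32_gzip(data: bytes, crc: int = 0) -> int:
    """Hypothetical helper: bit-at-a-time CRC-32 as stored in the gzip trailer."""
    crc ^= 0xFFFFFFFF  # preset the register (also undoes the final xor when chaining)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # LSB-first: shift right, apply the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF  # final complement


print(format(crc32_gzip(b"foo\n"), "08x"))  # 7e3265a8
print(crc32_gzip(b"foo\n") == zlib.crc32(b"foo\n"))  # True
```

Passing a previous result back in as crc continues the computation across chunks, mirroring the optional second argument of zlib.crc32.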
I just ran some benchmarks, and while jacksum is pretty good, it takes slightly longer and uses much more memory than cksfv.
This benchmark was performed in a VirtualBox Ubuntu VM on a four-gigabyte file generated by cat /dev/urandom. You will probably get much better speeds on a "real" machine, but they should be in the same ratio. The gzip/tempfile method ran out of disk space, but I don't care because it had already used more than twice as much time.
I think cksfv is my answer.
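To get a rough feel for the CPU cost of the checksum itself on your own hardware, an in-process timing can be sketched with Python's standard library. This is only a sketch under stated assumptions: it times zlib.crc32 against hashlib.md5 over a random in-memory buffer (the 64 MiB default and the helper name time_checksums are hypothetical choices), so it leaves out disk I/O and measures neither jacksum nor cksfv.

```python
import hashlib
import os
import time
import zlib


def time_checksums(size_mib=64, chunk=1 << 20):
    """Hypothetical helper: seconds spent hashing one random buffer with each algorithm."""
    data = os.urandom(size_mib << 20)  # random payload, like reading /dev/urandom
    view = memoryview(data)  # slice without copying
    results = {}

    start = time.perf_counter()
    crc = 0
    for off in range(0, len(data), chunk):
        crc = zlib.crc32(view[off:off + chunk], crc)
    results["crc32"] = time.perf_counter() - start

    start = time.perf_counter()
    md5 = hashlib.md5()
    for off in range(0, len(data), chunk):
        md5.update(view[off:off + chunk])
    results["md5"] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    for name, secs in time_checksums().items():
        print(f"{name}: {secs:.3f} s")
```

Both loops walk the same chunks, so any difference in the printed times reflects the algorithms rather than how the data is fed to them.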