How reliable is the adler32 checksum?
I wonder how reliable the adler32 checksum is compared to, e.g., MD5 checksums. Wikipedia says that adler32 is "much less reliable" than MD5, so I wonder how much less reliable it is, and in which way.
More specifically, I'm wondering whether it is reliable enough as a consistency check for long-term archiving of (tar) files of 20 GB or more.
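For context, here is a minimal sketch of how such an Adler-32 consistency check over a large tar file might be computed in Python. It assumes the standard-library `zlib.adler32`, which accepts a running value as its second argument, so the file can be processed in chunks instead of being loaded into memory. The helper names and the 1 MiB chunk size are hypothetical choices for illustration.

```python
import zlib
from typing import Iterable


def adler32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Fold zlib.adler32 over a sequence of byte chunks (hypothetical helper)."""
    value = 1  # Adler-32 initial value: A = 1, B = 0
    for chunk in chunks:
        # Passing the running value continues the checksum across chunk boundaries.
        value = zlib.adler32(chunk, value)
    return value


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Checksum a (possibly 20 GB+) file without reading it into memory at once."""
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        return adler32_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

For example, `print(f"{adler32_of_file('backup.tar'):08x}")` would print the checksum as eight hex digits; `backup.tar` is a placeholder path.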
Answers (5)
For details on the error-checking capabilities of the Adler-32 checksum, see, for example, "Revisiting Fletcher and Adler Checksums" (Maxino, 2006).
This paper analyzes the Hamming distance provided by these two checksums and gives an indication of the residual error rate for data words of up to about 2^11 bits, which is obviously much less than your requirement of about 2^38 bits.
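As a quick sanity check on the sizes involved, here is the arithmetic in Python, assuming "20GB" means 20 GiB (20 × 2^30 bytes). The archive comes to about 2^37.3 bits, which rounds up to the 2^38 quoted above, and it is roughly 2^26 times longer than the largest data words the paper covers.

```python
import math

archive_bytes = 20 * 2**30          # 20 GiB, assuming binary gigabytes
archive_bits = archive_bytes * 8    # 8 bits per byte
studied_bits = 2**11                # data-word size covered by the paper, per the answer

print(f"archive size: 2^{math.log2(archive_bits):.1f} bits")                     # 2^37.3
print(f"ratio to studied size: 2^{math.log2(archive_bits / studied_bits):.1f}")  # 2^26.3
```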
Adler32 has an entirely different purpose from MD5. Adler32 is a checksum; MD5 is a secure message digest. Adler32 is meant for quick hashes, with a small bit space and a simple algorithm. Its collision rate is low, but not low enough to be secure. MD5, SHA, and other cryptographic/secure hashes (or message digests) have much larger bit spaces and more complex algorithms, and thus far fewer collisions. Compare SHA2-256, for example: 256 bits versus Adler32's measly 32 bits.
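To make "small bit space and simple algorithm" concrete, here is a plain-Python sketch of Adler-32 as it is usually described: two running sums reduced modulo 65521 (the largest prime below 2^16), packed into one 32-bit value. It is meant for reading, not for speed, and is checked against the standard library's `zlib.adler32`.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Plain-Python Adler-32, for illustration only (zlib.adler32 is the fast version)."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # A: 1 plus the running sum of the bytes
        b = (b + a) % MOD_ADLER     # B: running sum of the A values
    return (b << 16) | a            # two 16-bit halves packed into 32 bits


print(adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia"))  # True
```

All of the "state" is those two 16-bit sums, which is why it is so cheap to compute and also why its output space is only 32 bits.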
Adler does have its uses, for instance in hash tables or for rapid data integrity checks. Still, it was not designed for the same purpose as MD5 or other secure digests.
BTW, if a simple but somewhat reliable checksum is what you need, then it seems Fletcher out-performs Adler. I'd speculate that both out-perform CRC, though perhaps not a simple addition-based checksum (which, however, is very prone to collisions). If you want BOTH performance AND security, then use BOTH algorithms: use the checksum for quick calculation and lookup, then use the larger digest for a more thorough confirmation when a match is found.
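The remark that a simple addition-based checksum is "very prone to collisions" is easy to see with a two-byte example. The `additive_checksum` function below is a hypothetical stand-in for such a checksum (the byte sum truncated to 32 bits). Adler-32's second sum weights each byte by its position, so it detects a transposition that the plain sum cannot.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Hypothetical 'simple addition' checksum: the byte sum truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


original = b"ab"
swapped = b"ba"  # same bytes, transposed

# The plain sum ignores byte order, so it misses the swap...
print(additive_checksum(original) == additive_checksum(swapped))  # True
# ...while Adler-32's position-weighted second sum notices it.
print(zlib.adler32(original) == zlib.adler32(swapped))  # False
```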
To answer your question on ensuring the validity of archives: I would say it would probably suffice. Best choice? Questionable. Possibility of error? Very low.
This is an ancient algorithm, one that, as the Wikipedia page says, "trades accuracy for speed". In short: no, you shouldn't rely on it.
The point is that with multiple corruptions, a damaged file might still pass this checksum as "okay". Thanks to the avalanche effect, this is significantly less likely with modern algorithms (even the old MD5).
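To illustrate the multiple-corruption point, here is a sketch of a deliberately chosen three-byte change that leaves Adler-32 unchanged. Adding 1, subtracting 2, and adding 1 to three consecutive bytes keeps both of Adler-32's running sums the same, while MD5 changes completely. Random corruption rarely lines up this neatly, but it shows that Adler-32 has no avalanche effect. `compensating_corruption` is a hypothetical helper name.

```python
import hashlib
import zlib


def compensating_corruption(data: bytes, pos: int) -> bytes:
    """Apply +1, -2, +1 to the three consecutive bytes starting at pos.

    The byte sum (Adler-32's A) changes by 1 - 2 + 1 = 0. In the
    position-weighted sum (B) the three bytes have weights w, w - 1, w - 2,
    so B changes by w - 2(w - 1) + (w - 2) = 0. The checksum is unchanged.
    """
    out = bytearray(data)
    for offset, delta in enumerate((1, -2, 1)):
        out[pos + offset] += delta  # raises ValueError if a byte leaves 0..255
    return bytes(out)


original = b"abc"
corrupted = compensating_corruption(original, 0)  # b"b`d"

print(zlib.adler32(original) == zlib.adler32(corrupted))  # True: Adler-32 misses it
print(hashlib.md5(original).digest() == hashlib.md5(corrupted).digest())  # False
```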
With today's machines, speed is not much of a concern, so I'd suggest using a modern algorithm (whichever is current), even for files in the TB range. The insignificant time savings you'd get from an old checksum are, IMHO, not enough to offset the significantly increased risk of undetected data corruption. And honestly, 20 GB of files is not so much data these days that you'd need to resort to weak (and, I daresay, broken) algorithms.
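Following that advice, here is a minimal sketch of a modern-digest consistency check in Python. It re-hashes an archived file in 1 MiB chunks, so memory use stays flat even for TB-sized files, and compares the result with a hex digest recorded when the archive was made. SHA-256 is used only as an example of a current algorithm; the helper names are hypothetical.

```python
import hashlib
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Hash a stream of chunks with any hashlib algorithm; return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_matches_digest(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Re-hash an archived file and compare it with the digest recorded at archive time."""
    with open(path, "rb") as f:
        actual = digest_chunks(iter(lambda: f.read(chunk_size), b""))
    return actual == expected_hex.strip().lower()
```

The recorded digest could come from, e.g., the output of `sha256sum` at archive time; the comparison only needs the hex string.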
It is less reliable than, say, MD5 or CRC (actually about the same as CRC). The advantage is speed; the disadvantage shows up more with short data (a few hundred bytes), meaning that the checksum values do not cover the available 32-bit output space very well. For big files it is a good choice.
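One way to see the short-data weakness: Adler-32's low 16 bits are A = 1 + (sum of the bytes) mod 65521, and a byte is at most 255, so until a message is longer than 256 bytes, A can only take values from 1 to 1 + 255n. The sketch below computes what share of A's range a message of a given length can reach at all. It looks at A alone and assumes arbitrary byte values; ASCII text, whose bytes stay below 128, covers even less.

```python
MOD_ADLER = 65521  # Adler-32 reduces both sums modulo this prime


def reachable_fraction_of_a(length: int) -> float:
    """Share of the MOD_ADLER possible A values that a `length`-byte message can produce.

    Without wrap-around, A = 1 + sum(bytes) lies in 1 .. 1 + 255 * length,
    i.e. 255 * length + 1 distinct values; once that exceeds MOD_ADLER,
    every residue is reachable.
    """
    return min(255 * length + 1, MOD_ADLER) / MOD_ADLER


for n in (16, 64, 128, 256, 1024):
    print(f"{n:5d} bytes: {reachable_fraction_of_a(n):6.1%} of A's range")
```

For a 128-byte message, for instance, only about half of A's possible values can occur.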
Adler-32 and MD5 are not comparable in this way. MD5 is actually intended to be a cryptographic checksum, for when you want to make sure that a file hasn't been tampered with by an adversary, while Adler-32 (and also CRC, which is comparable to Adler-32) is intended to make sure that a file hasn't been altered by accident (an integrity checksum).
MD5 is actually considered broken for its cryptographic purposes, and is now only useful as an integrity check when you want more bits for certainty. The only way Adler-32 can be "less reliable" is that it potentially allows more bits to be altered while still producing the same output, which means there is more room for collisions.
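To put "more room for collisions" into numbers, here is the standard birthday-bound approximation, assuming checksum values behave like uniformly random numbers (optimistic for Adler-32 on short inputs). It matters when the checksum is used as a key, e.g. in hash tables or for spotting duplicate files. With 32 bits, a collision among about 77,000 items is already a coin flip; with a 128-bit digest such as MD5 it is negligible. It does not describe the chance that a single corrupted file goes unnoticed, which for an ideal 32-bit checksum is on the order of 2^-32.

```python
import math


def birthday_collision_probability(items: int, bits: int) -> float:
    """Approximate chance that at least two of `items` values collide,
    assuming each value is uniformly random over 2**bits outputs."""
    pairs = items * (items - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-pairs / 2**bits)


n = 77_163  # roughly where a 32-bit output space reaches even odds
print(f"32-bit:  {birthday_collision_probability(n, 32):.3f}")
print(f"128-bit: {birthday_collision_probability(n, 128):.1e}")
```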
This link gives a good discussion of how using Adler-32 can provide performance benefits for code that needs cryptographic sums for added certainty: you can use the small, cheap checksum to decide whether computing the more expensive MD5/SHA/Whirlpool digest is worthwhile when files may have changed.
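A minimal sketch of that two-stage idea, assuming a hypothetical stored `Fingerprint` record per file that holds both an Adler-32 value and a SHA-256 digest (SHA-256 stands in for the MD5/SHA/Whirlpool family mentioned above). For brevity it works on in-memory bytes. A mismatch in the cheap checksum already proves the data changed; only when it matches is the expensive digest computed to confirm.

```python
import hashlib
import zlib
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Hypothetical record stored per file: cheap checksum plus strong digest."""
    adler32: int
    sha256: str


def fingerprint(data: bytes) -> Fingerprint:
    return Fingerprint(zlib.adler32(data), hashlib.sha256(data).hexdigest())


def unchanged(data: bytes, stored: Fingerprint) -> bool:
    """Two-stage comparison: cheap Adler-32 first, SHA-256 only if it matches."""
    if zlib.adler32(data) != stored.adler32:
        return False  # definitely changed; no need to pay for SHA-256
    # Adler-32 matched, but it may be a collision, so confirm with the digest.
    return hashlib.sha256(data).hexdigest() == stored.sha256
```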