CRC32 + size vs. MD5/SHA1
We have a file store that uniquely identifies each file by its size appended to its CRC32.
I wanted to know whether this checksum (CRC32 + size) is good enough for identifying files, or whether we should consider some other hashing technique such as MD5/SHA1.
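For reference, the scheme described in the question can be sketched in Python as below. The question does not say how the size is encoded, so this assumes (hypothetically) a 4-byte big-endian CRC-32 followed by an 8-byte big-endian size; `crc32_size_key` is an illustrative name, not part of any existing system.

```python
import struct
import zlib


def crc32_size_key(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Return a 12-byte key: CRC-32 (4 bytes) followed by the file size (8 bytes).

    Hypothetical layout: big-endian CRC-32, then big-endian 64-bit size.
    """
    crc = 0
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC-32 over the stream
            size += len(chunk)
    return struct.pack(">IQ", crc, size)
```

`zlib.crc32` accepts the running value as its second argument, which is what lets the file be processed chunk by chunk instead of being loaded into memory at once.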
CRC is more an error-detection method than a serious hash function. It helps identify corrupted files rather than uniquely identify them.
So your choice should be between MD5 and SHA1.
If you don't have strong security needs, you can choose MD5, which should be faster
(remember that MD5 is vulnerable to collision attacks).
If you need more security, you had better use SHA1 or even SHA2.
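A minimal sketch of computing such a digest for a file with Python's standard `hashlib`, assuming the algorithm is selected by name; `file_digest_hex` is a hypothetical helper. Reading in fixed-size chunks keeps memory use flat even for very large files.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks with any hashlib algorithm name ("md5", "sha1", "sha256", ...)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

On Python 3.11 and later, `hashlib.file_digest` performs the same chunked reading for you.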
CRC-32 is not good enough; it is trivial to build collisions, i.e. two files (of the same length, if you wish) that have the same CRC-32. Even in the absence of a malicious attacker, collisions will happen randomly once you have about 65,000 distinct files of the same length (roughly 2^16, the square root of the 2^32 possible CRC-32 values, as the birthday bound predicts).
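To illustrate the birthday bound behind the 65,000 figure, the sketch below estimates the collision probability for n random 32-bit values and then generates random 16-byte inputs until two of them share a CRC-32. Both functions are illustrative helpers, and the random inputs stand in for real files of equal length.

```python
import math
import random
import zlib


def birthday_collision_probability(n: int, bits: int = 32) -> float:
    """Approximate P(at least one collision) among n uniformly random `bits`-bit values."""
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))  # 1 - exp(-n(n-1) / 2N)


def find_random_crc32_collision(length: int = 16, seed: int = 1) -> tuple[bytes, bytes, int]:
    """Generate random `length`-byte inputs until two distinct ones share a CRC-32.

    Returns (first_input, second_input, number_of_inputs_tried).
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    seen: dict[int, bytes] = {}
    tried = 0
    while True:
        data = rng.randbytes(length)
        tried += 1
        crc = zlib.crc32(data)
        other = seen.get(crc)
        if other is not None and other != data:
            return other, data, tried
        seen[crc] = data
```

With 65,536 values the estimate is about 39%, and the expected number of inputs before the first collision is roughly sqrt(pi/2 * 2^32), about 82,000, so the search finishes quickly.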
A hash function is designed to avoid collisions. With MD5 or SHA-1, you will not get random collisions. If your setup is security-related (i.e. there is someone, somewhere, who may actively try to create collisions), then you need a secure hash function. MD5 is no longer secure (creating collisions with MD5 is easy), and SHA-1 is somewhat weak in that respect (at the time of writing no actual collision had been computed, but a method for creating one was known and, while expensive, it is much less expensive than it ought to be). The usual recommendation is to use SHA-256 or SHA-512 (SHA-256 is enough for security; SHA-512 may be a tad faster on big 64-bit systems, but file-reading bandwidth will be more limiting than hashing speed).
Note: when using a cryptographic hash function, there is no need to store and compare the file lengths; the hash is sufficient to disambiguate files.
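Following this note, a content-addressed store can use the SHA-256 digest alone as the key. The in-memory `ContentStore` class below is a hypothetical sketch, not a production design; a real store would write blobs to disk or a database.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store keyed only by the SHA-256 digest of the content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # no separate length is stored
        # Identical content yields the same key, so duplicates are kept once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```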
In a non-security setup (i.e. you only fear random collisions), MD4 can be used. It is thoroughly "broken" as a cryptographic hash function, but it is still a very good checksum, and it is really fast (on some ARM-based platforms it is even faster than CRC-32, while offering much better resistance to random collisions). Basically, you should not use MD5: if you have security issues, then MD5 must not be used (it is broken; use SHA-256); and if you do not have security issues, then MD4 is faster than MD5.
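To check these speed claims on your own hardware, a rough throughput comparison can be sketched with `hashlib` and `zlib`. This is a crude micro-benchmark under stated assumptions: results depend on the CPU and on how Python's OpenSSL was built, and MD4 is often unavailable (for example under OpenSSL 3 without its legacy provider), in which case it is skipped. `hash_throughput` is a hypothetical helper.

```python
import hashlib
import os
import time
import zlib


def _mb_per_s(fn, data: bytes) -> float:
    """Time one call of fn(data) and return throughput in MB/s."""
    start = time.perf_counter()
    fn(data)
    elapsed = time.perf_counter() - start
    return len(data) / elapsed / 1e6 if elapsed > 0 else float("inf")


def hash_throughput(names: tuple[str, ...] = ("md4", "md5", "sha1", "sha256"),
                    size: int = 16 << 20) -> dict[str, float]:
    """Return rough MB/s for CRC-32 and for each hashlib algorithm this build supports."""
    data = os.urandom(size)
    results = {"crc32": _mb_per_s(zlib.crc32, data)}
    for name in names:
        try:
            hashlib.new(name)  # probe: raises ValueError if unsupported here
        except ValueError:
            continue
        results[name] = _mb_per_s(lambda d, n=name: hashlib.new(n, d).digest(), data)
    return results
```

Running `hash_throughput()` a few times and comparing the numbers gives a feel for the relative costs; single runs are noisy.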
The space that would be used by CRC32 + size gives you enough room for a bigger CRC, which would be a much better choice. That holds if you are not worried about malicious collisions; if you are, Thomas' answer applies.
You didn't specify a language, but in C++, for example, Boost CRC gives you a CRC of whatever size you want (or can afford to store).
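Python has no Boost CRC, but the idea of a wider CRC can be sketched directly. The example below assumes the CRC-64/XZ parameters (ECMA-182 polynomial, reflected, initial value and final XOR all ones) and uses the usual 256-entry lookup table; the pure-Python loop is slow and only meant to show the technique, whereas in C++ `boost::crc_optimal` or another native implementation would do the work.

```python
_POLY64_REFLECTED = 0xC96C5795D7870F42  # ECMA-182 polynomial, bit-reversed
_MASK64 = 0xFFFFFFFFFFFFFFFF


def _make_crc64_table(poly: int = _POLY64_REFLECTED) -> list[int]:
    """Build the lookup table for a reflected (LSB-first) 64-bit CRC."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC64_TABLE = _make_crc64_table()


def crc64_xz(data: bytes, crc: int = 0) -> int:
    """CRC-64/XZ of data; pass a previous result as `crc` to continue over chunks."""
    crc ^= _MASK64  # undo the final XOR (or apply the all-ones initial value)
    for b in data:
        crc = _CRC64_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ _MASK64
```

A 64-bit CRC makes random collisions far rarer than CRC-32 does, but CRCs are linear, so it still offers no protection against deliberately constructed collisions.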
As others have said, CRC doesn't guarantee the absence of collisions. However, your problem can be solved simply by giving the files incrementing 64-bit numbers. These are guaranteed never to collide (unless you want to keep gazillions of files in one directory, which is not a good idea anyway).
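A minimal sketch of the incrementing-ID approach, assuming a single process; `IdAllocator` is hypothetical, and a real store would persist the counter (for example in its database) so IDs are never reused after a restart.

```python
import itertools
import threading


class IdAllocator:
    """Hypothetical in-memory allocator of increasing 64-bit file IDs."""

    _MAX = 2**64 - 1

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # safe to call from several threads

    def next_id(self) -> int:
        with self._lock:
            value = next(self._counter)
        if value > self._MAX:
            raise OverflowError("64-bit ID space exhausted")
        return value

    @staticmethod
    def to_filename(file_id: int) -> str:
        # Fixed-width hex keeps names sortable and exactly 16 characters long.
        return f"{file_id:016x}"
```

Unlike a hash, an ID says nothing about the content, so storing the same file twice yields two different IDs; deduplication would still need a hash.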
Old question, but still highly ranked on Google, so it deserves a modern answer:
If you want a non-cryptographic hash that's suitable for identifying files, better than CRC32 but faster than MD5, I truly recommend the xxHash family.
Besides the command-line tools, it also has libraries for several languages, including C, Java, Python, etc.
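A minimal sketch of hashing a file with the third-party `xxhash` Python package (`pip install xxhash`), using its streaming `xxh64` object. So that the sketch still runs without the package, it falls back to `hashlib.blake2b` with an 8-byte digest; that fallback is a stand-in, not xxHash, and produces different values. `file_xxh64_hex` is a hypothetical helper.

```python
import hashlib

try:
    import xxhash  # third-party package
except ImportError:  # keep the sketch runnable without it
    xxhash = None


def file_xxh64_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a 64-bit hex digest of a file, streaming it in chunks.

    Uses xxhash.xxh64 when installed; otherwise blake2b(digest_size=8) as a stand-in.
    """
    h = xxhash.xxh64() if xxhash is not None else hashlib.blake2b(digest_size=8)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Recent versions of the package also provide `xxh3_64` and `xxh128`, which can be swapped in the same way.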