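Can the first 32 bits of a 160-bit SHA1 hash stand in for a CRC32 hash?
I'm working on a .NET 3.5 project and I need a 32-bit hash value. None of the methods in the .NET Cryptography classes seem to return a 32-bit hash (MD5 is 128 bits, SHA1 is 160 bits, etc.). I implemented a CRC32 class, but I'm finding that the existing SHA1 and MD5 hash functions are much faster.
Would there be any problem (i.e. an increased chance of collisions) if I used the SHA1 hash function and kept only the first 32 bits as my hash value?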
Unless you need the extra properties of CRC32 (it is a linear code), you should be fine truncating the output to 32 bits.
Whether truncating the output of a cryptographic hash function hurts its collision resistance is an open research problem (artificially constructed counterexamples exist, if I remember correctly). But NIST (probably with the approval of the NSA) used truncation anyway to derive SHA-224 from SHA-256: SHA-224 is SHA-256 with different initial values and the output cut to 224 bits (see the Wikipedia article on SHA).
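For illustration, here is a minimal sketch of the truncation being discussed, written in Python rather than .NET purely for brevity. The function name `sha1_first32` is made up for this example, and treating the leading four bytes as a big-endian integer is an assumption (any fixed byte order works as long as you stay consistent).

```python
import hashlib

def sha1_first32(data: bytes) -> int:
    """Return the first 32 bits of the SHA-1 digest as an unsigned integer."""
    digest = hashlib.sha1(data).digest()      # 20 bytes = 160 bits
    return int.from_bytes(digest[:4], "big")  # keep only the leading 4 bytes

print(hex(sha1_first32(b"abc")))  # 0xa9993e36
```

In .NET the same idea would presumably be `SHA1.ComputeHash` followed by `BitConverter.ToInt32` on the first four bytes, though `BitConverter` uses the machine's byte order, so the numeric value may differ from the one above.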
EDIT: CRC32 lets you detect (and maybe correct) single-bit errors, whereas a cryptographic hash function should have the property that you can't find two inputs with the same hash value.
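To make the single-bit claim concrete, here is a small sketch (in Python, using the standard `zlib.crc32`) that flips every bit of a message in turn and counts how many corrupted copies slip past the checksum. The helper `single_bit_flips` and the sample message are just for this example.

```python
import zlib

def single_bit_flips(data: bytes):
    """Yield every copy of data that differs from it in exactly one bit."""
    for i in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[i // 8] ^= 1 << (i % 8)  # flip bit i
        yield bytes(corrupted)

message = b"hello, world"
original_crc = zlib.crc32(message)

# Count the single-bit corruptions whose CRC32 still matches the original.
undetected = sum(1 for m in single_bit_flips(message) if zlib.crc32(m) == original_crc)
print(undetected)  # 0
```

A truncated SHA1 would almost certainly catch these flips too, but only with overwhelming probability; with CRC32 it is a structural guarantee.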
Are you aware of the "birthday paradox" (again, see Wikipedia)? With a 32-bit checksum you should expect a collision (i.e., two inputs with the same hash value) once you have about 2^16 inputs, and you may want to hash many more inputs than that. (Rereading your comment, this might not be a problem for you.)
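A rough way to put numbers on the birthday bound is the usual approximation p ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n inputs and a b-bit hash. The sketch below assumes the hash values are uniformly distributed, which is an idealization; the function name is hypothetical.

```python
import math

def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate chance that n uniformly random `bits`-bit hashes contain a collision."""
    space = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

for n in (1_000, 2 ** 16, 100_000, 1_000_000):
    print(n, round(collision_probability(n), 4))
```

At 2^16 inputs the estimate is already a bit under 40%, and it climbs quickly from there, so a 32-bit hash only makes sense if occasional collisions are acceptable.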
Given the assumption that a hash function distributes its inputs uniformly over its codomain, it seems logical to assume that any fixed subset of its output bits is uniformly distributed as well.
However, using a "native" 32-bit hash function will probably still be the better choice. Maybe someone more into the matter can provide us with a better reason than just my gut feeling :)
Why don't you just use string.GetHashCode()? It is designed to compute a 32-bit hash value and produce few collisions given real-world data. Of course, it's not secure, but your question doesn't include that as a requirement.
If you don't intend to use the 32 bits for a cryptographic purpose, then you should be OK. Otherwise, I wouldn't rely on the first 32 bits having the same distribution as the whole hash.
Why can't you just use the wider hash that's available?
CRC32 is probably reasonable for your needs. This has been discussed in this question.
In terms of truncating a hash primitive, the only heavily used application of this is the SSL/TLS Pseudo Random Function (PRF), which is used to generate keys. It uses HMACs, seeds, and labels to generate as many bytes as you need by hashing several times and then truncating to the number of bytes you need.
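As a rough sketch of that hash-then-truncate pattern, here is the P_hash expansion in the style of the TLS 1.2 PRF, written in Python. Using SHA-256 as the default is an assumption on my part (older TLS versions combine MD5 and SHA1 instead), and the function names are only illustrative.

```python
import hashlib
import hmac

def p_hash(secret: bytes, seed: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """Expand secret and seed into `length` bytes by chaining HMACs, then truncating."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hash_name).digest()               # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hash_name).digest()  # HMAC(secret, A(i) + seed)
    return output[:length]  # drop the surplus bytes of the last block

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """PRF(secret, label, seed) = P_hash(secret, label + seed)."""
    return p_hash(secret, label + seed, length)

key_material = prf(b"secret", b"key expansion", b"seed", 48)
print(len(key_material))  # 48
```

Note that the truncation here simply discards the tail of the last HMAC block; nothing special is done to the bytes that are kept.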
As to your specific question though, you could read the output of the hash into Int32s and then XOR them together if you're paranoid.
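Something along these lines, sketched in Python for brevity rather than in .NET: the 160-bit SHA1 digest is split into five 32-bit words that are XORed together, so every digest bit contributes to the result. The function name and the big-endian word order are assumptions made for this example.

```python
import hashlib

def sha1_xor_fold32(data: bytes) -> int:
    """Fold the 160-bit SHA-1 digest into 32 bits by XORing its five 32-bit words."""
    digest = hashlib.sha1(data).digest()  # 20 bytes
    folded = 0
    for offset in range(0, len(digest), 4):
        folded ^= int.from_bytes(digest[offset:offset + 4], "big")
    return folded

print(hex(sha1_xor_fold32(b"abc")))
```

In .NET the equivalent would presumably loop over the byte array from `ComputeHash` with `BitConverter.ToInt32(hash, offset)`. Whether folding is actually better than plain truncation is not established here; it just avoids depending on any particular 32 bits of the digest.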