哈希函数组合——碰撞风险是否显着降低?
有谁知道通过组合哈希函数来降低冲突概率是否有真正的好处?我特别需要了解 32 位哈希,即结合 Adler32 和 CRC32。 基本上,adler32(crc32(data)) 产生的碰撞概率会比 crc32(data) 更小吗? 最后一条评论此处给出了一些支持合并的测试结果,但没有提及来源。 就我的目的而言,碰撞并不重要(即任务不涉及安全),但如果可能的话,我宁愿尽量减少概率。 PS:我刚刚开始进入奇妙的哈希世界,并阅读了大量相关内容。抱歉,如果我问了一个愚蠢的问题,我什至还没有获得正确的“哈希方言”,可能我对此的谷歌搜索也很糟糕。 谢谢。
Does anyone know if there's a real benefit regarding decreasing collision probability by combining hash functions? I especially need to know this regarding 32 bit hashing, namely combining Adler32 and CRC32.
Basically, will adler32(crc32(data)) yield a smaller collision probability than crc32(data)?
The last comment here gives some test results in favor of combining, but no source is mentioned.
For my purpose, collision is not critical (i.e. the task does not involve security), but I'd rather minimize the probability anyway, if possible.
PS: I'm just starting in the wonderful world of hashing, doing a lot of reading about it. Sorry if I asked a silly question, I haven't even acquired the proper "hash dialect" yet, probably my Google searches regarding this were also poorly formed.
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
像这样将它们串联起来是没有意义的。您正在将一个 32 位空间散列到另一个 32 位空间。
在第一步发生crc32碰撞的情况下,最终的结果仍然是碰撞。然后在 adler32 步骤中添加任何潜在的冲突。所以它不可能变得更好,只能是一样或更糟。
为了减少冲突,您可以尝试独立使用两个哈希值来创建 64 位输出空间:
adler32(data) << 32 | 32 crc32(data)
这样做是否有显着的好处,我不确定。
请注意,您提到的原始评论是独立存储哈希值:
This doesn't make sense combining them in series like that. You are hashing one 32-bit space to another 32-bit space.
In the case of a crc32 collision in the first step, the final result is still a collision. Then you add on any potential collisions in the adler32 step. So it can not get any better, and can only be the same or worse.
To reduce collisions, you might try something like using the two hashes independently to create a 64-bit output space:
adler32(data) << 32 | crc32(data)
Whether there is significant benefit in doing that, I'm not sure.
Note that the original comment you referred to was storing the hashes independently: