MD5 hashing of 4-byte and 8-byte keys into 16-byte values: what's the chance of a collision?
I have 2^32 4-byte keys that I'm hashing; what's the chance of a collision?
What if I have 2^64 8-byte keys (I'm not really storing every key, but I want to know the worst case)?
Per the Wikipedia page on the Birthday Problem, a good first-order approximation of the collision probability for n keys hashed into d possible values (d = 2^128 for a 16-byte MD5 digest) is 1 - e^(-(n^2)/(2d)). Graphing this for your values gives this graph (logarithmic horizontal axis; I've zoomed in on where the probability starts to spike). Note that this is only an approximation and should be treated conservatively (i.e., the real probability may be somewhat higher, but it should be in the right ballpark).
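As a sketch of the arithmetic, the snippet below evaluates the birthday-bound approximation p ≈ 1 - e^(-n^2/(2d)) from the Wikipedia Birthday Problem page, with d = 2^128 possible 16-byte MD5 digests. It assumes MD5 behaves like a uniformly random function; for a fixed set of distinct inputs the true answer is deterministic, so treat the result as a model estimate. The function name collision_probability is hypothetical.

```python
import math

def collision_probability(n, d):
    """Birthday-bound approximation p ~= 1 - exp(-n^2 / (2d)).

    n: number of distinct keys hashed
    d: number of possible hash values (2**128 for a 16-byte MD5 digest)
    Assumes (hypothetically) that MD5 acts like a uniformly random function.
    """
    x = n * n / (2 * d)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-x)

D = 2 ** 128  # number of possible 16-byte digests

print(collision_probability(2 ** 32, D))  # every possible 4-byte key
print(collision_probability(2 ** 64, D))  # every possible 8-byte key
```

This gives roughly 2.7e-20 for all 2^32 four-byte keys and about 0.39 for all 2^64 eight-byte keys: a collision is practically impossible in the first case and has a substantial chance in the second. math.expm1 is used because evaluating 1 - exp(-x) directly would round to 0.0 for an x as small as 2^-65.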
What are you doing with the hash codes? If you're using them to work out whether two pieces of data are the same, an MD5 hash is pretty good, though only if you are working with data that is not being created by malicious entities. (Cryptographic purposes need better hash algorithms precisely in order to deal with the "malicious attacker" problem.)
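A minimal sketch of the "are these two pieces of data the same?" use, with Python's standard hashlib module. The helper names md5_key_digest and probably_same are hypothetical, and encoding integer keys as big-endian bytes is an assumption; any fixed encoding works as long as it is used consistently. Where inputs may be adversarial, hashlib.sha256 is the usual drop-in replacement for hashlib.md5.

```python
import hashlib

def md5_key_digest(key: int, width: int) -> bytes:
    """Hypothetical helper: MD5 of an unsigned int key encoded as `width` big-endian bytes."""
    return hashlib.md5(key.to_bytes(width, "big")).digest()  # always 16 bytes

def probably_same(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: compare two blobs by MD5 digest.

    Sound only for non-malicious data; an attacker can craft MD5 collisions.
    """
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()

print(md5_key_digest(42, 4).hex())           # a 4-byte key
print(md5_key_digest(2 ** 64 - 1, 8).hex())  # the largest 8-byte key
print(probably_same(b"abc", b"abc"))
```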
If you're using them to build a map (i.e., you're building a hash table), it's usually better to use a cheap hash and find a way to mitigate the cost of collisions (e.g., by hanging a linked list off each bucket of the hash table and resizing/rebuilding when the average chain length gets too large).
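The cheap-hash-plus-chaining approach can be sketched as a small separate-chaining table. This is an illustrative sketch, not a production design: the class name ChainedHashTable and the 0.75 load-factor threshold are assumptions, Python lists stand in for the linked list in each bucket, and the built-in hash() plays the role of the cheap hash.

```python
class ChainedHashTable:
    """Hypothetical separate-chaining table: each bucket is a list of (key, value) pairs."""

    def __init__(self, initial_buckets=8, max_load=0.75):
        self._buckets = [[] for _ in range(initial_buckets)]
        self._count = 0
        self._max_load = max_load  # assumed threshold on the average chain length

    def _index(self, key, nbuckets):
        # Cheap, non-cryptographic hash reduced modulo the bucket count
        return hash(key) % nbuckets

    def put(self, key, value):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # a collision just lengthens this chain
        self._count += 1
        # Rebuild once the average chain length exceeds the threshold
        if self._count / len(self._buckets) > self._max_load:
            self._rebuild(2 * len(self._buckets))

    def get(self, key, default=None):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for k, v in bucket:
            if k == key:
                return v
        return default

    def _rebuild(self, nbuckets):
        # Re-distribute every entry into a larger bucket array
        new_buckets = [[] for _ in range(nbuckets)]
        for bucket in self._buckets:
            for k, v in bucket:
                new_buckets[self._index(k, nbuckets)].append((k, v))
        self._buckets = new_buckets

    def __len__(self):
        return self._count


table = ChainedHashTable()
for i in range(100):
    table.put(i, i * i)
print(table.get(7), len(table))
```

Doubling the bucket count on each rebuild keeps the average chain length bounded, so lookups stay cheap on average even though individual collisions are never prevented.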