How high do I have to count before I hit an MD5 hash collision?
Never mind why I'm doing this -- this is mainly theoretical.
If I were MD5 hashing string representations of integers, how high would I have to count before two of the hashes collide?
This problem, in the general case, is known as the birthday paradox.
The probability of a collision in the general case can be computed easily. However, in your particular case, finding the actual colliding pair means you have to compute (and store!) each MD5.
EDIT @Scott: not really. The pigeonhole principle (the limiting case of the birthday problem) says that with 2^128 possible MD5 values, a collision is certain after 1 + 2^128 tries. The birthday paradox says that the probability of a collision exceeds 0.5 at about 2^64 MD5 values, roughly the square root of 2^128.
Given the storage those estimates imply, it's up to you to decide whether the problem is worth it. For me, it is not.
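As a rough check of the birthday estimate, here is a minimal sketch using the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n values drawn from N equally likely outcomes. It assumes MD5 output behaves like a uniformly random 128-bit value, which is an idealization; the function names and the 16-bytes-per-digest storage figure are illustrative choices, not something from the original answer.

```python
import math

def collision_probability(n, bits=128):
    """Approximate P(at least one collision) among n uniform `bits`-bit values."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

def count_for_probability(p, bits=128):
    """Approximate number of values needed for collision probability p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))

n_half = count_for_probability(0.5)
print(f"values needed for p = 0.5: {n_half:.3e} (2^{math.log2(n_half):.2f})")

# Assumption: storing only the raw 16-byte digest for each value hashed.
print(f"storage for the digests alone: {n_half * 16:.3e} bytes")
```

The count lands a little above 2^64, and the storage line gives a sense of why actually running the search is impractical.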
Apparently, one can base a thesis on this very thing (or similar problems, anyway). I haven't read it, but maybe something in Stevens' thesis will help you (it's apparently linked from the Wikipedia article).
In a perfect world, to 1 + 2^128. But I doubt MD5 is perfect. I can't give you a number, but it is guaranteed to be <= 1 + 2^128.
Here is an empirical way to estimate how high you would have to count.
Cut the MD5 hash down to, say, 4 bits, and count until two values collide. Repeat until you have seen, say, 100 collisions so that you get a good average.
Then do the same thing at 8 bits (again, wait for many collisions so you can calculate an average).
Do it again and again until you have averages for 4, 8, 12, and 16 bits, then see if you can find a trend. Follow that trend up to 128 bits.
You may want to XOR all 128 bits together in chunks to come up with your shorter version. Taking just the first or last part may not be the best test.
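The experiment described in this answer could be sketched roughly as follows. The helper names, the XOR-folding scheme, and the choice to continue counting from where the previous trial stopped are all assumptions made for illustration; the answer does not specify them.

```python
import hashlib

def folded_md5(n, bits):
    """MD5 of the decimal string of n, XOR-folded down to `bits` bits."""
    digest = int.from_bytes(hashlib.md5(str(n).encode("ascii")).digest(), "big")
    mask = (1 << bits) - 1
    folded = 0
    while digest:
        folded ^= digest & mask  # fold in the lowest `bits` bits
        digest >>= bits
    return folded

def count_until_collision(start, bits):
    """Hash start, start+1, ... and return how many values were hashed
    when the first repeated folded hash appeared (inclusive)."""
    seen = set()
    n = start
    while True:
        h = folded_md5(n, bits)
        if h in seen:
            return n - start + 1
        seen.add(h)
        n += 1

def average_count(bits, trials=100):
    """Average count to first collision over `trials` consecutive runs."""
    total = 0
    start = 0
    for _ in range(trials):
        used = count_until_collision(start, bits)
        total += used
        start += used  # the next trial continues with fresh integers
    return total / trials

for bits in (4, 8, 12, 16):
    print(bits, average_count(bits))
```

If the folded hash behaves like a random function, the averages should roughly double for every 2 extra bits, that is, grow in proportion to 2^(bits/2). Extrapolating that trend to 128 bits points at a count on the order of 2^64.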