Substring md5 collision
I need a 4-character hash. At the moment I am taking the first 4 characters of an md5() hash. I am hashing strings that are 80 characters long or less. Will this lead to collisions? Or, what is the chance of a collision, assuming I'll hash fewer than 65,536 (16^4) different elements?
Well, each character of an md5 hash is a hex digit (a "hex-bit"), so it can have one of 16 possible values. If you're only using the first 4 hex-bits, you can have 16 * 16 * 16 * 16, or 16^4, or 65,536, or 2^16 possibilities. That means the total available "space" for results is only 16 bits wide. Now, according to the Birthday Attack/Problem, there are the following chances for collision:
50% chance -> 300 entries
1% chance -> 36 entries
0.0000001% chance -> fewer than 2 entries (two entries already collide with probability 1/65,536, about 0.0015%)
So there is quite a high chance for collisions.
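To make the birthday figures above easier to check, here is a minimal Python sketch that computes the exact collision probability for a 16-bit space. It assumes the truncated hash behaves like a uniformly random 16-bit value; the function names `collision_probability` and `entries_for` are made up for this illustration. The exact calculation should land within a couple of entries of the rounded figures quoted above.

```python
SPACE = 16 ** 4  # 65,536 possible 4-hex-digit prefixes


def collision_probability(n, space=SPACE):
    """Exact chance that n uniformly random values contain at least one repeat."""
    if n > space:
        return 1.0  # pigeonhole: more items than slots
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def entries_for(target, space=SPACE):
    """Smallest number of entries whose collision probability reaches target."""
    p_unique = 1.0
    n = 0
    while 1.0 - p_unique < target:
        # Adding entry number n+1: it must avoid the n values already used.
        p_unique *= (space - n) / space
        n += 1
    return n


for target in (0.5, 0.01):
    print(f"{target:.0%} chance -> {entries_for(target)} entries")
print(f"2 entries -> {collision_probability(2):.4%} chance")
```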
Now, you say you need a 4-character hash. Depending on the exact requirements, you can do:
4 hex-bits: 16^4 (65,536) possible values
4 alpha bits: 26^4 (456,976) possible values
4 alphanumeric bits: 36^4 (1,679,616) possible values
4 printable ASCII bits: 93^4 (74,805,201) possible values (assuming ASCII 33 -> 126; counting both endpoints, that range has 94 characters, which would give 94^4 = 78,074,896)
4 full bytes: 256^4 (4,294,967,296) possible values
Now, which you choose will depend on the actual use case. Does the hash need to be transmitted to a browser? How are you storing it? And so on.
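As a rough check on the sizes listed above, the following Python sketch recomputes each 4-symbol space and estimates how many entries it takes to reach a 50% collision chance. The estimate uses the standard birthday approximation n ≈ sqrt(2 · N · ln 2) and assumes uniformly distributed hash values; the labels and helper names are invented for this example.

```python
import math

# Alphabet sizes taken from the list above; the labels are only descriptive.
ALPHABETS = {
    "hex": 16,
    "alpha": 26,
    "alphanumeric": 36,
    "printable ASCII": 93,
    "full byte": 256,
}


def space_size(alphabet_size, length=4):
    """Number of distinct strings of `length` symbols over the alphabet."""
    return alphabet_size ** length


def approx_half_collision_point(space):
    """Birthday approximation for a 50% collision chance: sqrt(2 * N * ln 2)."""
    return math.sqrt(2 * space * math.log(2))


for name, size in ALPHABETS.items():
    space = space_size(size)
    half = approx_half_collision_point(space)
    print(f"{name:>16}: {space:>13,} values, roughly {half:,.0f} entries for 50%")
```

Even the full 4-byte variant reaches even odds of a collision after some tens of thousands of entries, so a wider alphabet delays the problem rather than removing it.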
I'll give an example of each (In PHP, but should be easy to translate / see what's going on):
4 Hex-Bits:
4 Alpha bits:
4 Alpha Numeric bits:
4 Printable ASCII Bits:
4 full bytes:
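The five variants listed above can be sketched as follows. This is an illustration in Python rather than the answer's PHP, and it is not the original code: the helper names (`md5_int`, `encode`, `hash4_*`) are hypothetical, inputs are assumed to be UTF-8 text, and the printable variant uses ASCII 33 to 125 so that the alphabet has exactly 93 symbols.

```python
import hashlib
import string


def md5_int(text):
    """MD5 digest of the UTF-8 encoded text, as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def encode(value, alphabet, length=4):
    """Write value modulo len(alphabet)**length as `length` symbols."""
    base = len(alphabet)
    value %= base ** length  # modulo bias is negligible for a 128-bit input
    symbols = []
    for _ in range(length):
        value, digit = divmod(value, base)
        symbols.append(alphabet[digit])
    return "".join(reversed(symbols))


ALPHA = string.ascii_lowercase                        # 26 symbols
ALNUM = string.digits + string.ascii_lowercase        # 36 symbols
PRINTABLE = "".join(chr(c) for c in range(33, 126))   # 93 symbols, ASCII 33..125


def hash4_hex(text):
    """First 4 hex digits of the MD5 digest (16^4 values)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:4]


def hash4_alpha(text):
    return encode(md5_int(text), ALPHA)       # 26^4 values


def hash4_alnum(text):
    return encode(md5_int(text), ALNUM)       # 36^4 values


def hash4_printable(text):
    return encode(md5_int(text), PRINTABLE)   # 93^4 values


def hash4_bytes(text):
    """First 4 raw bytes of the MD5 digest (256^4 values)."""
    return hashlib.md5(text.encode("utf-8")).digest()[:4]


for fn in (hash4_hex, hash4_alpha, hash4_alnum, hash4_printable, hash4_bytes):
    print(fn.__name__, repr(fn("hello world")))
```

The raw-byte variant gives the largest space but is not safe to print or embed in a URL without further encoding, which is why the choice depends on where the hash has to travel.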
Surprisingly high indeed. As you can see from this graph of an approximate collision probability (formula from the Wikipedia page), with just a few hundred elements your probability of having a collision is over 50%.
Note, of course, if you're facing the possibility of an attacker providing the string, you can probably assume that it's 100% - scanning to find a collision in a 16-bit search space can be done almost instantaneously on any modern PC. Or even any modern cell phone, for that matter.
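To illustrate how little work the 16-bit search takes, here is a small Python sketch of both attacks: finding any two strings that share a 4-hex-digit prefix, and finding a second string that matches the prefix of a given one. The candidate strings (`input-0`, `forged-0`, ...) and the function names are arbitrary choices for this example.

```python
import hashlib
from itertools import count


def prefix4(text):
    """First 4 hex digits of the MD5 digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:4]


def find_collision():
    """Return two distinct strings with the same 4-hex-digit prefix.

    Needs at most 65,537 candidates (pigeonhole), and typically only a
    few hundred because of the birthday effect.
    """
    seen = {}
    for i in count():
        candidate = f"input-{i}"
        key = prefix4(candidate)
        if key in seen:
            return seen[key], candidate
        seen[key] = candidate


def find_second_input(target_text):
    """Return a different string whose prefix matches target_text's prefix.

    Expected to need on the order of 65,536 tries.
    """
    target = prefix4(target_text)
    for i in count():
        candidate = f"forged-{i}"
        if candidate != target_text and prefix4(candidate) == target:
            return candidate


a, b = find_collision()
print(f"{a!r} and {b!r} share prefix {prefix4(a)}")
forged = find_second_input("hello world")
print(f"{forged!r} matches the prefix of 'hello world': {prefix4(forged)}")
```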
The first 4 characters contain 4*4 = 16 bits of data, so a collision is guaranteed once you hash more than 65,536 distinct elements, and, due to the birthday attack, one will be found much sooner. You should use more bits of the hash.