What is the shortest pair of strings that causes an MD5 collision?

Posted on 2024-08-16 22:16:44

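Up to what maximum string length can MD5 be used as a hash without having to worry about the possibility of a collision?

This would presumably be calculated by generating the MD5 hash of every possible string in a particular character set, in order of increasing length, until a hash appears for the second time (a collision). The maximum length of a string that is guaranteed collision-free would then be one character less than the longer string of the colliding pair.

Has this already been tested for MD5, SHA1, etc.?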
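
The exhaustive search described in the question can be sketched on a deliberately weakened hash. The snippet below is only an illustration, not a practical attack: it keeps just the first 2 bytes of each MD5 digest, so a repeat shows up after a few hundred strings. The helper names `truncated_md5` and `first_collision`, the lowercase alphabet, and the 2-byte truncation are all assumptions made for the example. For the full 128-bit digest, the birthday bound puts the expected work on the order of 2^64 hash evaluations, which is why known MD5 collisions come from cryptanalysis rather than from enumeration like this.

```python
from hashlib import md5
from itertools import count, product
from string import ascii_lowercase


def truncated_md5(data, nbytes):
    """Return only the first nbytes of the MD5 digest (a weakened toy hash)."""
    return md5(data).digest()[:nbytes]


def first_collision(alphabet=ascii_lowercase, nbytes=2):
    """Enumerate strings by increasing length until a truncated digest repeats."""
    seen = {}  # truncated digest -> first string that produced it
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars).encode("ascii")
            tag = truncated_md5(candidate, nbytes)
            if tag in seen:
                # Two different strings share the same truncated digest.
                return seen[tag], candidate
            seen[tag] = candidate


a, b = first_collision()
print(a, b, truncated_md5(a, 2).hex())
```

Raising `nbytes` makes the search grow quickly in both time and memory, since every digest seen so far has to be stored.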

芯好空 2024-08-23 22:16:44

Update

Ironically, a few weeks after I posted the previous answer, two Chinese researchers, Tao Xie and Dengguo Feng, published a new single-block collision for MD5. I was unaware of that paper until now. A single MD5 block means that the input size is 64 bytes or 512 bits. Note that the inputs are mostly the same, differing only in 2 bits.

Their methodology won't be published until January 2013, but their collision can be verified now, using numbers from the paper:

>>> from array import array
>>> from hashlib import md5
>>> input1 = array('I',  [0x6165300e,0x87a79a55,0xf7c60bd0,0x34febd0b,0x6503cf04,
    0x854f709e,0xfb0fc034,0x874c9c65,0x2f94cc40,0x15a12deb,0x5c15f4a3,0x490786bb,
    0x6d658673,0xa4341f7d,0x8fd75920,0xefd18d5a])
>>> input2 = array('I', [x^y for x,y in zip(input1,
    [0, 0, 0, 0, 0, 1<<10, 0, 0, 0, 0, 1<<31, 0, 0, 0, 0, 0])])
>>> input1 == input2
False
>>> md5(input1).hexdigest()
'cee9a457e790cf20d4bdaa6d69f01e41'
>>> md5(input2).hexdigest()
'cee9a457e790cf20d4bdaa6d69f01e41'
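
The transcript above relies on `array('I', ...)` storing each value as a 4-byte integer in the machine's native byte order, which is little-endian on typical x86 hardware. As a hedged variant, assuming that little-endian layout is the intended one, the same two blocks can be built with `struct.pack` so the bytes do not depend on the platform. The sketch also checks the two structural claims made above: each input is exactly one MD5 block, and the inputs differ in exactly 2 bits. The names `WORDS`, `MASK`, and `differing_bits` are introduced only for this example.

```python
import struct
from hashlib import md5

# The sixteen 32-bit words quoted above from the paper.
WORDS = [
    0x6165300e, 0x87a79a55, 0xf7c60bd0, 0x34febd0b,
    0x6503cf04, 0x854f709e, 0xfb0fc034, 0x874c9c65,
    0x2f94cc40, 0x15a12deb, 0x5c15f4a3, 0x490786bb,
    0x6d658673, 0xa4341f7d, 0x8fd75920, 0xefd18d5a,
]
# XOR mask from the transcript: one bit flipped in word 5, one in word 10.
MASK = [0, 0, 0, 0, 0, 1 << 10, 0, 0, 0, 0, 1 << 31, 0, 0, 0, 0, 0]

# "<16I" = sixteen unsigned 32-bit integers, explicitly little-endian (assumed).
block1 = struct.pack("<16I", *WORDS)
block2 = struct.pack("<16I", *(w ^ m for w, m in zip(WORDS, MASK)))


def differing_bits(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(len(block1), md5().block_size)    # input length vs. MD5 block size, in bytes
print(differing_bits(block1, block2))   # number of differing bits
print(md5(block1).hexdigest() == md5(block2).hexdigest())
```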

Update: The paper was published in March 2013: Tao Xie, Fanbao Liu, and Dengguo Feng, "Fast Collision Attack on MD5".

However, if you have more room to play with, collisions of a few kilobytes are MUCH faster to calculate -- they can be calculated within hours on ANY regular computer.

Old answer

The previous shortest collision used at least two MD5 blocks' worth of input -- that's 128 bytes, 1024 bits. A prefix in the first block can be chosen arbitrarily by the attacker; the rest is computed and appears as gibberish.

Here's an example of two different colliding inputs. You can try it yourself in Python:

>>> from binascii import unhexlify
>>> from hashlib import md5
>>> input1 = b'Oded Goldreich\nOded Goldreich\nOded Goldreich\nOded Go' + unhexlify(
... 'd8050d0019bb9318924caa96dce35cb835b349e144e98c50c22cf461244a4064bf1afaecc582'
... '0d428ad38d6bec89a5ad51e29063dd79b16cf67c12978647f5af123de3acf844085cd025b956')
>>> len(input1)
128
>>> md5(input1).hexdigest()
'd320b6433d8ebc1ac65711705721c2e1'
>>> input2 = b'Neal Koblitz\nNeal Koblitz\nNeal Koblitz\nNeal Koblitz\n' + unhexlify(
... '75b80e0035f3d2c909af1baddce35cb835b349e144e88c50c22cf461244a40e4bf1afaecc582'
... '0d428ad38d6bec89a5ad51e29063dd79b16cf6fc11978647f5af123de3acf84408dcd025b956')
>>> md5(input2).hexdigest()
'd320b6433d8ebc1ac65711705721c2e1'

Generating these two particular inputs took Marc Stevens 2 days on a 215-node PlayStation 3 cluster :)
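
To make the split between the chosen prefix and the computed remainder concrete, here is a small sketch that only measures lengths. It uses the two readable prefixes from the example above and the MD5 block size reported by `hashlib`; the names `BLOCK`, `TOTAL`, `prefix1`, and `prefix2` are introduced only for this illustration.

```python
from hashlib import md5

BLOCK = md5().block_size   # MD5 processes input in 64-byte blocks
TOTAL = 2 * BLOCK          # the two-block collision is 128 bytes long

# The human-readable prefixes chosen by the attacker in the example above.
prefix1 = b'Oded Goldreich\n' * 3 + b'Oded Go'
prefix2 = b'Neal Koblitz\n' * 4

for prefix in (prefix1, prefix2):
    computed = TOTAL - len(prefix)  # bytes left for the computed "gibberish"
    print(len(prefix), computed)
```

Both prefixes are 52 bytes long, which leaves 76 computed bytes per input, matching the 152 hex digits passed to `unhexlify` in each case.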
