哈希片段是否具有抗碰撞性?
如果仅使用 MD5 哈希值的前 4 个字节,这是否意味着理论上只有 255^4 分之一的碰撞机会?也就是说,哈希值的设计是否使得您只需使用返回哈希值的一小部分(假设哈希值是某个大小的文件)?
If you only use the first 4 bytes of an MD5 hash, would that mean theoretically only 1 in 255^4 chance of collision? That is, are hashes designed such that you only have to use a small portion of the returned hash (say the hash is of a file of some size)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
请记住,即使不考虑聪明的攻击者故意试图引起冲突,一旦您要散列的对象数量达到平方根,您就需要开始担心意外冲突。 em> 的哈希空间...对于 32 位哈希密钥来说只有几万个对象。这来自所谓的生日悖论。
Remember that, even without considering a smart attacker deliberately trying to cause collisions, you need to start worrying about accidental collisions once the number of objects you're hashing get comparable to the square root of the hash space... just a few tens of thousands of objects for a 32-bit hash key. This comes from the so-called birthday paradox.
它是 256,而不是 255。
假设 MD5 是一个安全哈希函数(事实证明它不安全,但是为了讨论方便,我们假设它是安全的),那么它应该表现得像一个随机预言机,一个输出均匀随机值的神话对象,唯一的约束是它“记住”其先前的输出并在给定相同的输入的情况下再次返回相同的值。
截断随机预言的输出会产生另一个随机预言。因此,如果保留 32 位,则两个不同输入消息发生冲突的概率为 2^32 分之一(即 256^4 分之一)。
现在有一个被称为生日悖论的事情,它说,大约有 2^16 个不同的输入,很有可能 2^16 个相应输出中的两个发生冲突。
MD5 已被证明对于某些用途来说是不安全的——特别是与冲突相关的任何用途。当前的默认建议是 SHA-2 (四个函数的家族,具有输出大小分别为 224、256、384 和 512 位)。目前正在通过公开竞争定义一个新的(美国)标准,代号为 SHA-3。这是一个漫长的过程;新职能将于2012年中期选定。其余的一些候选者(最初的 51 个中目前有 14 个)比 SHA-2 快得多,有些在性能上接近 MD5,同时安全性要高得多。但这有点新,所以现在您应该默认使用 SHA-2。
It is 256, not 255.
Assuming that MD5 is a secure hash function (it turns out it is not secure, but, for the sake of the discussion, let's suppose that it is secure), then it should behave like a random oracle, a mythical object which outputs uniformly random values, under the sole constraint that it "remembers" its previous outputs and returns the same value again, given the same input.
Truncating the output of a random oracle yields another random oracle. Thus, if you keep 32 bits, then the probability of a collision with two distinct input messages is 1 in 2^32 (i.e. 1 in 256^4).
Now there is a thing known as the birthday paradox which says that, with about 2^16 distinct inputs, there are good chances that two of the 2^16 corresponding outputs collide.
MD5 has been shown to be insecure for some purposes -- in particular anything which is related to collisions. The current default recommendation is SHA-2 (a family of four functions, with output sizes 224, 256, 384 and 512 bits, respectively). A new (american) standard is currently being defined, through an open competition, under the code name SHA-3. This is a long process; the new function shall be chosen by mid-2012. Some of the remaining candidates (currently 14, out of an initial 51) are substantially faster than SHA-2, some approaching MD5 in performance, while being considerably more secure. But this is a bit new, so right now you shall use SHA-2 by default.
假设我们有一条预先确定的消息1。 hash1 = md5(message1)
现在随机选择一个message2,并设置hash2 = md5(message2)。
理论上,hash2 的前四个字符与预先确定的 hash1 的前四个字符匹配的概率为 1/255^4。
对于知道 message1 的攻击者来说,想出具有相同哈希值的不同 message2 也应该是非常困难的。这称为第二原像抵抗。然而,即使使用完整的 MD5,也存在比理论上更好的原像攻击。
MD5 对于冲突完全损坏。这意味着攻击者(在几个小时内)完全有可能得出具有相同哈希值的两条消息(更不用说相同的前四个字节)。攻击者可以选择这两条消息,但这仍然可能造成重大损害。例如,请参见中毒消息示例。
Assume we have a pre-determined message1. hash1 = md5(message1)
Now choose a message2 randomly, and set hash2 = md5(message2).
In theory there is a 1/255^4 chance that the first four characters of hash2 match the first four of pre-determined hash1.
It is also supposed to be very hard for an attacker that knows message1 to come up with a different message2 that has the same hash. This is called second pre-image resistance. However, even with the full MD5, there are better than theoretical pre-image attacks.
MD5 is completely broken for collisions. This means it is quite feasible for an attacker (in a few hours) to come up with two messages with the same hash (let alone the same first four bytes). The attacker gets to choose both messages, but this can still cause major damage. See for instance the poisoned message example.
如果您要生成唯一标识符,则可能需要使用 UUID 代替。这些旨在最大限度地减少碰撞的变化,以便在实践中永远不会发生碰撞。
如果您担心文件名太长(当大多数操作系统支持长达 255 个字符的名称时,这是一个需要担心的特殊问题),您始终可以将文件名拆分为路径和文件名组件。这样做的优点是可以将文件分割到不同的目录中:
If you're generating unique identifiers, you might want to use a UUID instead. These are designed to minimize the change of collisions so that in practice they should never occur.
If you're worried about filenames being too long, which is a peculiar thing to be concerned about when most operating systems support names as long as 255 characters, you can always split the filename into a path and filename component. This has the advantage of splitting up the files into different directories:
取决于哈希的目的。
哈希表中使用的哈希函数在较低位(用于查找数组索引)中往往比在较高位中具有更多的“随机性”。校验和和加密哈希函数分布更均匀。
Depends on the purpose of the hash.
Hash functions for use in hash tables tend to have more "randomness" in the lower bits (which are used to find the array index) than in the higher bits. Checksum and cryptographic hash functions are more evenly distributed.