Is a truncated md5 uniformly distributed?
Can we say that a truncated md5 hash is still uniformly distributed?
To avoid misinterpretation: I'm aware that the chance of collisions becomes much greater the moment you start hacking parts off the md5 result; my use case is actually interested in deliberate collisions. I'm also aware that there are other hash methods that may be better suited to use cases needing a shorter hash (including, in fact, my own), and I'm definitely looking into those.
But I'd also really like to know whether md5's uniform distribution also applies to chunks of it. (Consider it a burning curiosity.)
Since MediaWiki uses it (specifically, the leftmost two hex digits of the result) to generate file paths for images (e.g. /4/42/The-image-name-here.png), and they're probably also interested in an at least near-uniform distribution, I imagine the answer is 'yes', but I don't actually know.
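To make the path scheme in the question concrete, here is a minimal Python sketch of how such a path could be derived from the leftmost two hex digits of an md5 digest. The function name hashed_path is hypothetical, and hashing the UTF-8 bytes of the file name unchanged is an assumption; MediaWiki's actual normalization of the name before hashing may differ.

```python
import hashlib


def hashed_path(filename: str) -> str:
    """Build a /x/xy/filename path from the first two hex digits of md5(filename).

    Hypothetical helper for illustration; hashing the raw UTF-8 bytes of the
    name is an assumption, not a confirmed detail of MediaWiki.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # First directory level: first hex digit. Second level: first two hex digits.
    return f"/{digest[0]}/{digest[:2]}/{filename}"


print(hashed_path("The-image-name-here.png"))
```

If the truncated digest is close to uniform, this spreads files evenly over 16 first-level and 256 second-level directories.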
Comments (2)
Yes, not exhibiting any bias is a design requirement for a cryptographic hash. MD5 is broken from a cryptographic point of view; however, the distribution of the results was never in question.
If you still need to be convinced, it's not a huge undertaking to hash a bunch of files, truncate the output and use ent (http://www.fourmilab.ch/random/) to analyze the result.
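As a rough stand-in for that experiment, the Python sketch below hashes a sequence of inputs, keeps only the leading hex digits, and computes a chi-square statistic over the resulting buckets (chi-square is one of the figures ent reports). This is not ent itself. Using decimal counter strings instead of files as input is an assumption made to keep the example self-contained, and the function name is hypothetical.

```python
import hashlib
from collections import Counter


def truncated_md5_chi_square(n_inputs: int, hex_digits: int = 2) -> tuple[float, int]:
    """Chi-square statistic for the first `hex_digits` hex digits of md5.

    Inputs are the strings "0", "1", ..., an assumption standing in for
    "a bunch of files". Returns (statistic, degrees_of_freedom).
    """
    buckets = 16 ** hex_digits
    counts = Counter(
        hashlib.md5(str(i).encode("ascii")).hexdigest()[:hex_digits]
        for i in range(n_inputs)
    )
    expected = n_inputs / buckets
    # Include buckets that were never hit; they count as observed = 0.
    observed = [counts.get(format(b, f"0{hex_digits}x"), 0) for b in range(buckets)]
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, buckets - 1


chi2, dof = truncated_md5_chi_square(51_200)
print(f"chi-square = {chi2:.1f} with {dof} degrees of freedom")
```

For an unbiased truncation the statistic should land near the number of degrees of freedom, with a standard deviation of about sqrt(2 * dof); a value many standard deviations above that would suggest bias.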
I wrote a little PHP program to answer this question. It's not very scientific, but it shows the distribution of the first and the last 8 bits of the hash values, using the natural numbers as the text to hash. After about 40,000,000 hashes the difference between the highest and the lowest counts goes down to 1%, so I'd say the distribution is OK. I hope the code is more precise in explaining what was computed :-)
Btw, with a similar program I found that the last 8 bits seem to be distributed slightly better than the first.
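The PHP program itself is not reproduced here. The following Python sketch is a reconstruction of the experiment as described, not the author's code: it hashes the natural numbers as decimal strings, tallies the first and last byte of each digest, and reports the gap between the fullest and emptiest bucket relative to the expected count. The function name byte_spread and the choice of decimal ASCII strings are assumptions.

```python
import hashlib
from collections import Counter


def byte_spread(n: int) -> tuple[float, float]:
    """Relative (max - min) bucket spread for the first and last md5 byte.

    Hashes "0", "1", ..., str(n - 1) and returns the spread as a fraction of
    the expected count per bucket, for the first and last 8 bits respectively.
    """
    first: Counter = Counter()
    last: Counter = Counter()
    for i in range(n):
        digest = hashlib.md5(str(i).encode("ascii")).digest()
        first[digest[0]] += 1   # leading 8 bits
        last[digest[-1]] += 1   # trailing 8 bits
    expected = n / 256

    def spread(counter: Counter) -> float:
        counts = [counter.get(b, 0) for b in range(256)]
        return (max(counts) - min(counts)) / expected

    return spread(first), spread(last)


first_spread, last_spread = byte_spread(102_400)
print(f"first byte: {first_spread:.2%}, last byte: {last_spread:.2%}")
```

With uniform output the spread shrinks roughly in proportion to 1/sqrt(count per bucket), so a run this small shows a much larger percentage than the figure quoted for about 40,000,000 hashes. A small difference between the first and last byte in any single run can be ordinary sampling noise.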