Is a truncated MD5 hash uniformly distributed?

Posted on 2024-12-16 12:30:00

Can we say that a truncated MD5 hash is still uniformly distributed?

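To avoid misunderstandings: I know that collisions become more likely as soon as you start dropping parts of the MD5 result; my use case is actually interested in deliberate collisions. I also know that there are other hash methods that may be better suited to use cases with a shorter hash (including, in fact, my own), and I'm definitely looking into those.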

But I'd also really like to know whether MD5's uniform distribution applies to chunks of it as well. (Consider it a burning curiosity.)

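Since MediaWiki uses it (specifically, the leftmost two hex digits of the result) to generate file paths for images (e.g. /4/42/The-image-name-here.png), they are probably also interested in an at least near-uniform distribution. So I imagine the answer is "yes", but I don't actually know.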
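
To make the path scheme mentioned above concrete, here is a minimal sketch. It assumes only what the question describes (directory names taken from the leftmost hex digits of the MD5 of the file name); `hashedPath` is a hypothetical helper, not MediaWiki's actual code.

```php
<?php
// Hypothetical helper (an assumption for illustration, not MediaWiki code):
// derive a two-level directory path from the leading hex digits of md5(name).
function hashedPath(string $fileName): string
{
    $hash   = md5($fileName);        // 32 lowercase hex characters
    $level1 = substr($hash, 0, 1);   // first hex digit: 16 possible directories
    $level2 = substr($hash, 0, 2);   // first two hex digits: 256 possible directories
    return "/$level1/$level2/$fileName";
}

// md5('hello') begins with "5d", so this prints /5/5d/hello
echo hashedPath('hello'), "\n";
```

If the first two hex digits are uniformly distributed, each of the 256 second-level directories should receive roughly the same share of files.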

兔姬 2024-12-23 12:30:00

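Yes. Showing no bias in its output is a design requirement for a cryptographic hash. MD5 is broken from a cryptographic point of view, but the distribution of its results has never been called into question.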

If you still need convincing, hash a bunch of files, truncate the output, and analyze the results with ent (http://www.fourmilab.ch/random/).
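
One possible way to prepare input for ent, as a sketch only: hash each file, keep the first few bytes of the raw digest, and append them to a single binary file. The function name `writeTruncatedDigests`, its parameters, and the output file name are assumptions made for this example.

```php
<?php
// Sketch: hash each file, keep only the first $keepBytes bytes of the raw
// 16-byte MD5 digest, and append them to one binary file for later analysis.
// Function name and parameters are illustrative assumptions.
function writeTruncatedDigests(array $paths, string $outFile, int $keepBytes = 1): int
{
    $out = fopen($outFile, 'wb');
    if ($out === false) {
        throw new RuntimeException("Cannot open $outFile for writing");
    }
    $written = 0;
    foreach ($paths as $path) {
        $digest = md5_file($path, true);   // true = raw binary digest, not hex
        if ($digest === false) {
            continue;                      // skip files that cannot be read
        }
        $written += fwrite($out, substr($digest, 0, $keepBytes));
    }
    fclose($out);
    return $written;                       // total number of bytes written
}
```

After calling it with a list of file paths and an output name such as truncated.bin, running `ent truncated.bin` reports entropy and a chi-square figure for the truncated bytes.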

凌乱心跳 2024-12-23 12:30:00

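I wrote a little PHP program to answer this question. It's not very scientific, but it shows the distribution of the first and the last 8 bits of the hash values, using the natural numbers as the text to hash. After about 40,000,000 hashes the difference between the highest and the lowest counts goes down to 1%, so I'd say the distribution is OK. I hope the code is more precise in explaining what was computed :-)
By the way, with a similar program I found that the last 8 bits seem to be distributed slightly better than the first.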

<?php
// Setup count-array:
for ($y=0; $y<16; $y++) {
  for ($x=0; $x<16; $x++) {
    $count[dechex($x).dechex($y)] = 0;
  }
}

$text = 1; // The text we will hash.
$hashCount = 0;
$steps = 10000;

while (1) {
  // Calculate & count a bunch of hashes:
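  for ($i=0; $i<$steps; $i++) {
    $hash = md5((string)$text);     // 32 hex characters
    $count[substr($hash, 0, 2)]++;  // first 8 bits (first two hex digits)
    $count[substr($hash, -2)]++;    // last 8 bits; both go into the same table, so each hash adds two counts
    $text++;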
  }
  $hashCount += $steps;

  // Output result so far:
  system("clear");
  $min = PHP_INT_MAX; $max = 0;
  for ($y=0; $y<16; $y++) {
    for ($x=0; $x<16; $x++) {  
      $n = $count[dechex($x).dechex($y)];
      if ($n < $min) $min = $n;
      if ($n > $max) $max = $n;
      print $n."\t";
    }
    print "\n";
  }
  print "Hashes: $hashCount, Min: $min, Max: $max, Delta: ".((($max-$min)*100)/$max)."%\n";
} 
?>
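
For a slightly more formal check than the min/max delta, one option is Pearson's chi-square statistic, computed separately for the first and the last byte. This is a sketch under my own assumptions, not part of the program above: `byteCounts` and `chiSquare` are hypothetical names, and the sample size is arbitrary.

```php
<?php
// Sketch: count how often each value 0..255 occurs in one byte of the MD5
// digest of the natural numbers 1..$samples, then compute Pearson's
// chi-square statistic against a uniform expectation.
function byteCounts(int $samples, int $byteIndex): array
{
    $counts = array_fill(0, 256, 0);
    for ($i = 1; $i <= $samples; $i++) {
        $digest = md5((string) $i, true);      // raw 16-byte digest
        $counts[ord($digest[$byteIndex])]++;   // tally the chosen byte
    }
    return $counts;
}

function chiSquare(array $counts): float
{
    $expected = array_sum($counts) / count($counts);  // uniform expectation per bin
    $sum = 0.0;
    foreach ($counts as $observed) {
        $sum += ($observed - $expected) ** 2 / $expected;
    }
    return $sum;
}

$samples = 100000;  // arbitrary sample size for this sketch
printf("first byte: chi2 = %.1f\n", chiSquare(byteCounts($samples, 0)));
printf("last byte:  chi2 = %.1f\n", chiSquare(byteCounts($samples, 15)));
```

With 256 bins there are 255 degrees of freedom, so a uniform source would typically give a statistic in the neighborhood of 255 (standard deviation about 22.6). Values far above that would hint at bias.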
