Is using a substring of an MD5 hash like this unique enough?

Posted on 2024-08-22 00:58:24

What I am trying to do is create a 12-character ID for articles on my website, similar to how YouTube handles its video IDs (http://www.youtube.com/watch?v=53iddd5IcSU). Right now I am generating an MD5 hash and then grabbing 12 characters of it like this:

$ArticleId = substr(md5("Article" . $currentID), 10, 12);

where $currentID is the numeric ID from the database (e.g. 144).

I am slightly paranoid that I will run into a duplicate $ArticleId, but realistically, what are the chances of that happening? Also, since the column in my database is unique, how can I handle this rare scenario without an ugly error being thrown?

P.S. I made a small script to check for duplicates among the first 5000 $ArticleId values and there were none.

EDIT: I don't like the way the base64_encode hashes look so I did this:

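function retryAID($currentID)
{
    // Retry with a different input ($currentID * 2) to get a different hash.
    $AID = substr(md5("Article" . ($currentID * 2)), 10, 12);

    $setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID";
    // Caveat: a failed retry recurses with the same $currentID,
    // so it regenerates the same $AID.
    mysql_query($setAID) or retryAID($currentID);
}


// First attempt: hash the plain ID.
$AID = substr(md5("Article" . $currentID), 10, 12);

$setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID";
mysql_query($setAID) or retryAID($currentID);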

Since the AID column is unique, mysql_query will fail on a duplicate and the retryAID function will find a unique id...

Comments (4)

柏拉图鍀咏恒 2024-08-29 00:58:24

What's wrong with using a sequential id? The database will handle this for you.

That aside, 12 characters is still 96 bits. 2^96 = 79228162514264337593543950336 possible hashes. Even though MD5 is known to have collision vulnerabilities, there's a world of difference between the possibility of a collision and the probability of actually seeing one.

Update:

Based on the return value of the PHP md5 function you're using, my numbers above aren't quite right.

Returns the hash as a 32-character hexadecimal number.

Since you're taking 12 characters from a 32-character hexadecimal number (and not 12 bytes of the 128-bit hash), the actual number of possible hashes you could end up with is 16^12 = 281474976710656. Still quite a few.
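
To put a number on "quite a few", here is a rough sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2N)). It assumes the 12 hex characters behave like uniformly random values, which is an assumption about MD5 output rather than something the question establishes; collision_probability() is a name made up for this illustration.

```php
<?php
// Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / (2N)),
// where n is the number of IDs issued and N the number of possible IDs.
// Assumes the IDs are uniformly distributed over the N possible values.
function collision_probability(float $n, float $space): float
{
    // -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -expm1(-($n * ($n - 1)) / (2 * $space));
}

$space = pow(16, 12); // 12 hex characters = 2^48 possible values

foreach ([5000, 1000000, 20000000] as $n) {
    printf("%d articles: p = %.3g\n", $n, collision_probability($n, $space));
}
```

Under that assumption the chance of any duplicate among 5,000 articles is on the order of 10^-8, and it only approaches one half at around 20 million articles.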

新一帅帅 2024-08-29 00:58:24
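<?php
  // Build a 12-character base-36 ID from three random 4-character chunks.
  function get_id()
  {
    $max = 1679615; // pow(36, 4) - 1
    $id = '';

    for ($i = 0; $i < 3; ++$i)
    {
      $r = mt_rand(0, $max);
      // Left-pad each chunk with zeros so it is always 4 characters wide.
      $id .= str_pad(base_convert($r, 10, 36), 4, "0", STR_PAD_LEFT);
    }
    return $id;
  }
?>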

Returns a 12-character base-36 string, which gives 4,738,381,338,321,616,896 possibilities. (The probability of a collision depends on the distribution of the random number generator.)

To ensure no collisions, you'll need to loop:

<?php
do {
  $id = get_id();
} while ( !update_id($id) );
?>
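
The loop relies on an update_id() that the answer does not define. The following is a minimal sketch of how the pieces might fit together, not the answerer's implementation: update_id() here is a hypothetical stand-in that uses an array in place of the UNIQUE column, the retry cap is an added assumption, and get_id() is rewritten to draw from random_int() (PHP 7+) instead of mt_rand().

```php
<?php
// Same shape as get_id() above, but drawing from random_int() (PHP 7+),
// which uses the operating system's CSPRNG instead of mt_rand().
function get_id(): string
{
    $max = 36 ** 4 - 1; // 1679615, the largest 4-digit base-36 number
    $id = '';
    for ($i = 0; $i < 3; ++$i) {
        $r = random_int(0, $max);
        $id .= str_pad(base_convert((string) $r, 10, 36), 4, '0', STR_PAD_LEFT);
    }
    return $id;
}

// Hypothetical stand-in for update_id(): an array plays the role of the
// UNIQUE column. A real version would run the UPDATE and return false
// when the database reports a duplicate-key error.
function update_id(array &$taken, string $id): bool
{
    if (isset($taken[$id])) {
        return false; // duplicate: caller should try another ID
    }
    $taken[$id] = true;
    return true;
}

$taken = [];
$attempts = 0;
do {
    $id = get_id();
    // Cap the retries so a persistent failure cannot loop forever.
    if (++$attempts > 10) {
        throw new RuntimeException('Could not allocate a unique ID');
    }
} while (!update_id($taken, $id));

echo $id, "\n";
```

Capping the attempts matters because update_id() can also fail for reasons other than a duplicate (a lost connection, for example), in which case an uncapped loop would never exit.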
等风来 2024-08-29 00:58:24

No, not very unique.

Why not base64 encode it if you need it shorter?
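
One way to read this suggestion, sketched under assumptions: encode the raw MD5 bytes rather than the hex string, so that 12 characters carry 72 bits instead of 48. The helper name short_id() and the URL-safe character swap are illustrative choices, not part of the answer.

```php
<?php
// Hypothetical helper: derive a 12-character, URL-safe ID from a numeric ID.
function short_id(int $currentID): string
{
    // md5(..., true) returns the 16 raw bytes instead of 32 hex characters.
    $raw = md5('Article' . $currentID, true);
    // 9 bytes = 72 bits, which base64 encodes to exactly 12 characters
    // with no '=' padding.
    $b64 = base64_encode(substr($raw, 0, 9));
    // Swap '+' and '/' for '-' and '_' so the ID is safe in URLs.
    return strtr($b64, '+/', '-_');
}

echo short_id(144), "\n";
```

These IDs are case-sensitive, so the unique column would need a case-sensitive (for example binary) collation; with a case-insensitive collation, two IDs that differ only in letter case would be treated as duplicates.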
