Is using a substring of an MD5 hash like this unique enough?
What I am trying to do is create a 12-character ID for articles on my website, similar to how YouTube handles its video IDs (http://www.youtube.com/watch?v=53iddd5IcSU). Right now I am generating an MD5 hash and then grabbing 12 characters of it like this:
$ArticleId = substr(md5("Article".$currentID), 10, 12);
where $currentID is the numeric ID from the database (e.g. 144).
I am slightly paranoid that I will run into a duplicate $ArticleId, but realistically, what are the chances of that happening? Also, since the column in my database is unique, how can I handle this rare scenario without an ugly error being thrown?
P.S. I made a small script to check for duplicates within the first 5000 $ArticleId values and there were none.
EDIT: I don't like the way the base64_encode hashes look, so I did this:
// Second attempt: hash a different input (the doubled id) and try the update again.
function retryAID($currentID)
{
    $AID = substr(md5("Article".$currentID*2), 10, 12);
    $setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID";
    mysql_query($setAID) or retryAID($currentID);
}
// First attempt: hash the id itself; fall back to retryAID() if the update fails.
$AID = substr(md5("Article".$currentID), 10, 12);
$setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID";
mysql_query($setAID) or retryAID($currentID);
Since the artID column is unique, mysql_query fails (returns false) on a duplicate, and the retryAID function then tries a different id...
Comments (4)
What's wrong with using a sequential id? The database will handle this for you.
That aside, 12 characters is still 96 bits. 2^96 = 79228162514264337593543950336 possible hashes. Even though MD5 is known to have collision vulnerabilities, there's a world of difference between the possibility of a collision and the probability of actually seeing one.
Update:
Based on the return value of the PHP md5 function you're using, my numbers above aren't quite right.
Since you're taking 12 characters from a 32-character hexadecimal number (and not 12 bytes of the 128-bit hash), the actual number of possible hashes you could end up with is 16^12 = 2^48 = 281474976710656. Still quite a few.
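To put a rough number on "the probability of actually seeing one", the usual birthday approximation can be applied to the 16^12 figure. The sketch below assumes the 12 hex characters behave like uniformly random values, which is an assumption about truncated MD5 output rather than something the answer states; the function name collisionProbability is made up for illustration.

```php
<?php
// Birthday approximation: chance of at least one duplicate among $n ids
// drawn uniformly at random from $space possible values.
function collisionProbability(float $n, float $space): float
{
    // p = 1 - exp(-n(n-1) / (2 * space)).
    // -expm1(-x) computes 1 - exp(-x) without losing precision when x is tiny.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

$space = 16 ** 12; // 12 hex characters = 48 bits

foreach ([5000, 100000, 1000000, 10000000] as $n) {
    printf("%10d ids: %.6f%%\n", $n, 100 * collisionProbability($n, $space));
}
```

Under that assumption, a duplicate among 5000 ids is extremely unlikely, which matches the asker's test, while at around a million ids the chance is already in the range of a tenth of a percent. A collision is therefore rare but not impossible, so the unique column still needs a fallback path.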
Returns a 12-character number in base-36, which gives 36^12 = 4,738,381,338,321,616,896 possibilities. (The probability of collision depends on the distribution of the random number generator.)
To ensure no collisions, you'll need to loop:
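The snippets this answer refers to are not preserved on this page. As a stand-in, here is a minimal sketch of the idea, not the answerer's original code: it draws a random 12-character base-36 string and loops until a free one is found. The names randomBase36Id, generateUniqueId and the $isTaken callback are hypothetical, and the in-memory array only stands in for a lookup against the articles table.

```php
<?php
// Build a random base-36 string (digits 0-9, letters a-z) of the given length.
function randomBase36Id(int $length = 12): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $id = '';
    for ($i = 0; $i < $length; $i++) {
        $id .= $alphabet[random_int(0, 35)];
    }
    return $id;
}

// Keep drawing candidates until $isTaken reports a free one.
// $isTaken is a hypothetical callback; in a real application it would query the database.
function generateUniqueId(callable $isTaken, int $maxAttempts = 10): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $candidate = randomBase36Id();
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException('No free id found after ' . $maxAttempts . ' attempts');
}

// Stand-in for the articles table: an in-memory set of ids already in use.
$used = [];
$isTaken = function (string $id) use (&$used): bool {
    return isset($used[$id]);
};

$articleId = generateUniqueId($isTaken);
$used[$articleId] = true;
echo $articleId, "\n";
```

Checking first and writing afterwards leaves a small window in which two requests could pick the same id, so the unique constraint on the column remains the final safeguard. The attempt cap keeps the loop from running forever if something is wrong with the lookup.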
No, not very unique.
Why not base64 encode it if you need it shorter?
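One way to read this suggestion: base64 packs 6 bits into each character, where hex packs only 4, so the same 12 characters can carry 72 bits of the hash instead of 48. A minimal sketch, assuming the "Article" prefix from the question and a made-up function name base64ArticleId:

```php
<?php
// Encode the first 9 raw MD5 bytes (72 bits) as 12 URL-safe base64 characters.
function base64ArticleId(int $currentId): string
{
    // Second argument true = 16 raw bytes instead of 32 hex characters.
    $raw = md5('Article' . $currentId, true);
    // 9 bytes encode to exactly 12 base64 characters, with no '=' padding.
    $encoded = base64_encode(substr($raw, 0, 9));
    // Replace '+' and '/', which are awkward inside URLs.
    return strtr($encoded, '+/', '-_');
}

echo base64ArticleId(144), "\n";
```

The result mixes upper and lower case letters, digits, '-' and '_', which is closer to the look of the YouTube id in the question than a hex substring is.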
How about a UUID?
http://php.net/manual/en/function.uniqid.php
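For reference, a quick look at what the linked function returns. Note that uniqid() is not an RFC 4122 UUID: it derives its value from the current time in microseconds, and the PHP manual warns that it does not guarantee uniqueness. With no arguments it returns a 13-character hexadecimal string, so it would need trimming or a wider column to fit the 12-character requirement.

```php
<?php
// uniqid() with no arguments: a 13-character hex string based on the current time.
$id = uniqid();
echo $id, ' (', strlen($id), " characters)\n";
```

Because the value is time-based, consecutive ids are also easy to guess, which may or may not matter for article URLs.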