GUIDs that are shorter than a hashed user ID?

Posted on 2024-10-11 16:12:09

I'm wondering how Instapaper (bookmarklet that saves text) might generate URLs for their bookmarklet.

Mine has a script src of something similar to www.instapaper.com/j/AnJHrfoDTRia

The key properties of these URLs are that they must never collide and must not be guessable (so other people can't save to your account).

I know a simple approach might be to MD5 the user's email address (presumed to have been checked for uniqueness at signup), but then I'd end up with a very long string. This isn't a huge issue, but I'm wondering what techniques exist for shorter GUIDs that won't collide too often (this is obviously the tradeoff, but the 12 characters above are pretty short in my opinion).

Comments (4)

孤凫 2024-10-18 16:12:09

You can get a shorter string by treating the MD5 hash as a base-16 number (it uses the characters 0-9a-f) and converting it to a higher base, for example base 36.

<?php
// Requires the GMP extension. Converts $num from base $base_a to base $base_b.
function gmp_convert($num, $base_a, $base_b) {
    return gmp_strval(gmp_init($num, $base_a), $base_b);
}

$hash = md5("hello");
$hash2 = gmp_convert($hash,16,36);
echo "$hash <br>"; //5d41402abc4b2a76b9719d911017c592 
echo $hash2; //5ir3t0ozoelrnauhrwyu1xfgy

The link you mention seems to use both uppercase and lowercase letters, so an alphabet of around 62 symbols (digits plus both cases) would shorten the string further.
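As a rough sketch of the same idea with a 62-symbol alphabet, the conversion can also be done without GMP by long division on the hex digits. The function name `hex_to_base62` and the ordering of the alphabet are assumptions made for illustration, not anything Instapaper is known to use.

```php
<?php
// Convert a hex string to base 62 using schoolbook long division,
// so no GMP or BCMath extension is needed.
// Alphabet order (digits, lowercase, uppercase) is an arbitrary choice.
function hex_to_base62(string $hex): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // Digits of the number in base 16, most significant first.
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 16 + $d;
            $q = intdiv($acc, 62);
            $remainder = $acc % 62;
            // Skip leading zeros of the quotient.
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q;
            }
        }
        // Each pass yields the next least significant base-62 digit.
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

echo hex_to_base62(md5("hello")), "\n";
```

A 128-bit MD5 value needs at most 22 base-62 characters, compared with 32 in hex and 25 in base 36. Note that leading zeros of the hash are dropped, so the output length can vary slightly.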

断桥再见 2024-10-18 16:12:09
<?php

$length = 12;

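// 62-symbol alphabet: 0-9, a-z, A-Z
$chars = array_merge(range(0, 9), range('a', 'z'), range('A', 'Z'));

$hash = '';

for ($i = 0; $i < $length; $i++) {
    // random_int() (PHP 7+) draws from a CSPRNG; array_rand() does not,
    // so tokens built with it could be predictable
    $hash .= $chars[random_int(0, count($chars) - 1)];
}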

var_dump($hash);

This gives 3226266762397899821056 (62^12) unique combinations, versus 281474976710656 (16^12) for 12 hex characters of an MD5 hash, so the random string has about 11 million times more combinations.

With just 4 characters (!!!) there are still 14776336 (62^4) unique combinations, which may be enough for you.
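The number of combinations is not the same as the chance of a collision, though. A quick way to compare token lengths is the birthday approximation; the sketch below assumes tokens are drawn uniformly at random, and the user counts are made-up figures for illustration.

```php
<?php
// Approximate probability that at least two of $n random tokens collide
// when each is drawn uniformly from $space possible values
// (birthday bound: 1 - exp(-n(n-1) / (2 * space))).
function collision_probability(float $n, float $space): float
{
    $x = $n * ($n - 1) / (2 * $space);
    return -expm1(-$x); // same as 1 - exp(-x), but accurate for tiny x
}

$alphabet = 62; // 0-9, a-z, A-Z

// Hypothetical user counts, chosen only to show the scale of the difference.
printf("4 chars, 10,000 users:     %.4f\n", collision_probability(1e4, $alphabet ** 4));
printf("12 chars, 1,000,000 users: %.3e\n", collision_probability(1e6, $alphabet ** 12));
```

Under these assumptions, 4 characters make at least one collision very likely once there are about ten thousand users, while 12 characters keep the probability on the order of 10^-10 at a million users. Either way, a uniqueness check when storing the token is still worthwhile.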

嘴硬脾气大 2024-10-18 16:12:09

Base64-encode a set of cryptographically strong random bytes.

<?php
// get 72 pseudorandom bits in a base64 string of 12 characters

$pr_bits = '';

// Unix/Linux platform?
$fp = @fopen('/dev/urandom','rb');
if ($fp !== FALSE) {
    $pr_bits .= @fread($fp,9);
    @fclose($fp);
}

// MS-Windows platform?
if (@class_exists('COM')) {
    // http://msdn.microsoft.com/en-us/library/aa388176(VS.85).aspx
    try {
        $CAPI_Util = new COM('CAPICOM.Utilities.1');
        $pr_bits .= $CAPI_Util->GetRandom(9,0);

        // if we ask for binary data PHP munges it, so we
        // request base64 return value.  We squeeze out the
        // redundancy and useless ==CRLF by hashing...
        if ($pr_bits) { $pr_bits = substr(md5($pr_bits,TRUE), 0, 9); }
    } catch (Exception $ex) {
        // echo 'Exception: ' . $ex->getMessage();
    }
}

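// fail loudly rather than emit a short or empty token if neither source worked
if (strlen($pr_bits) !== 9) {
    throw new RuntimeException('no secure random source available');
}

$uid = base64_encode($pr_bits);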
?>

This will give you 72 bits of the purest Colombian in 12 characters. The set contains 2^72, or roughly 4.7 × 10^21, values. By the birthday bound (n^2 / 2N), the chance of any collision after 1 million users is about 1 in 10 billion.

This is a very slight modification of this Stack Overflow answer for generating crypto awesomeness: Secure random number generation in PHP.
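On PHP 7 and later, the platform-specific branches can probably be replaced by the built-in `random_bytes()`. The sketch below assumes that is available; the function name `url_token` and the swap to the URL-safe base64 alphabet (`-` and `_` instead of `+` and `/`) are additions for illustration, since plain base64 output can contain characters that need escaping in a URL path.

```php
<?php
// Return a URL-safe token carrying $bytes * 8 random bits.
// 9 bytes = 72 bits = exactly 12 base64 characters with no padding.
function url_token(int $bytes = 9): string
{
    $raw = random_bytes($bytes); // CSPRNG; throws if no secure source exists
    // Swap the two base64 symbols that are awkward in URLs and drop padding.
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo url_token(), "\n";
```

Because every call yields a fresh random value, the token has to be stored alongside the user record rather than recomputed.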

尐籹人 2024-10-18 16:12:09

MD5 the username. Take the first X characters of the resulting MD5 hash. Check to see if there is already a url token with that value in the DB. If so, take the first X+1 characters and try that (and so on). If not, then you have your token for that user. Store the token in the DB and look it up there from now on - don't try to re-create the token from the username each time or whatnot.

You could probably start with X=7 and do fine (no more than 1-2 tries for the vast majority of token generations).

Also, you may want to add something else into the hash calculation (say, their or a random number) just to make it harder to predict a given user's token.
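The steps above can be sketched roughly as follows. The array `$taken` stands in for the database table of existing tokens, and the function name `allocate_token` and the `$salt` parameter are hypothetical names chosen for this example.

```php
<?php
// Take the shortest prefix of the salted MD5 (starting at $start characters)
// that is not already in use, record it, and return it.
// $taken is an in-memory stand-in for the token table in the database.
function allocate_token(string $username, string $salt, array &$taken, int $start = 7): string
{
    $hash = md5($salt . $username);
    for ($len = $start; $len <= strlen($hash); $len++) {
        $candidate = substr($hash, 0, $len);
        if (!isset($taken[$candidate])) {
            // Store the token once; later lookups read it back instead of
            // recomputing it from the username.
            $taken[$candidate] = $username;
            return $candidate;
        }
    }
    throw new RuntimeException('every prefix of the hash is already taken');
}

$taken = [];
echo allocate_token('alice@example.com', 'some-random-salt', $taken), "\n";
```

With a real database, a unique index on the token column would be the safer way to detect a clash, because two concurrent signups could otherwise both pass the existence check.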
