Generating a short unique ID in PHP using auto_increment?

Posted 2024-08-09 00:28:09

I would like to generate a short, unique ID without having to check for collisions.

Currently the IDs I generate are random, and checking for collisions in a loop is annoying and will get expensive if the number of records grows significantly.

Normally collisions aren't much of a worry, but the unique ID I want to generate is a short alphanumeric string of 5-8 characters, like the ones tinyurl generates.

EDIT: I would like to start out with 5 characters, move to 6 once I hit 60 million entries, and so on.

To this end, I was thinking I could keep an auto_increment value hidden from users and show them a unique string derived from it instead, via MD5 or some other method.

The generated strings should not look sequential, so simply converting the auto_increment ID to base 36 [0-9A-Z] is a bit too simplistic, but a function along those lines is what I'm after.
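
To put numbers on the requirement, here is a small PHP sketch (both function names are hypothetical). Five base-36 characters give 36^5 = 60,466,176 distinct IDs, which matches the 60 million threshold above, and plain base-36 conversion of an auto_increment value is unique but visibly sequential.

```php
<?php
// Number of distinct IDs available with a fixed number of [0-9A-Z] characters.
function base36_capacity(int $length): int
{
    return 36 ** $length; // 36^5 = 60,466,176; 36^6 = 2,176,782,336
}

// The "too simplistic" approach: encode the auto_increment value directly.
// Unique with no collision check, but consecutive rows get consecutive strings.
function naive_short_id(int $autoIncrementId): string
{
    return strtoupper(base_convert((string) $autoIncrementId, 10, 36));
}
```

For example, IDs 1000 and 1001 become "RS" and "RT", which is exactly the linear look the question wants to avoid.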

EDIT: Security is not an issue as this will not be used to secure information. It is simply a shortcut to a longer string.
Thank you.

Thank you for your suggestions and sorry for the delay. Dentist..

遥远的她 2024-08-16 00:28:11

An MD5 of an incrementing number should be fine, but I worry that if you truncate the MD5 (normally 128 bits) down to 5-8 characters, you will almost certainly damage its ability to act as a unique signature...
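
A rough PHP sketch (not from the answer; function names are hypothetical) that quantifies the concern with the birthday approximation 1 - exp(-n(n-1) / 2N) for n IDs in a space of N values, plus an empirical count of repeated prefixes when md5() of 1..n is cut to a few hex characters.

```php
<?php
// Birthday-bound approximation: probability that at least two of $n
// uniformly random IDs collide in a space of $space possible values.
function collision_probability(int $n, float $space): float
{
    return 1.0 - exp(-($n * ($n - 1)) / (2.0 * $space));
}

// Empirical check: MD5 of 1..$n truncated to $chars hex characters
// (16^$chars possible values); counts how many prefixes repeat.
function truncated_md5_collisions(int $n, int $chars): int
{
    $seen = [];
    $collisions = 0;
    for ($i = 1; $i <= $n; $i++) {
        $prefix = substr(md5((string) $i), 0, $chars);
        if (isset($seen[$prefix])) {
            $collisions++;
        }
        $seen[$prefix] = true;
    }
    return $collisions;
}
```

With only 10,000 records in a 5-character base-36 space (36^5 values) the approximation already gives about a 56% chance of at least one collision, so a truncated hash still needs a collision check.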

ㄟ。诗瑗 2024-08-16 00:28:10

You could probably generate an MD5 hash of the current datetime or a random number, truncate it to the length you need (5-8 characters), and store it as the id field.

If you are storing this information in a database, you don't need a for loop to do the collision check; you can just run a SELECT statement, something like

SELECT count(1) c FROM Table WHERE id = :id

where :id would be the newly generated id. If c is greater than 0 then you know it already exists.
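
A minimal PHP sketch of this approach, assuming PDO; the table name `short_links`, the column `id` and both function names are hypothetical placeholders. The existence check is passed in as a callable so the retry logic can be used with the SELECT above or with anything else.

```php
<?php
// Draw random MD5-based candidates, truncated to $length hex characters,
// until $exists reports that the candidate is unused.
function generate_candidate_id(callable $exists, int $length = 6): string
{
    do {
        $id = substr(md5(random_bytes(16)), 0, $length);
    } while ($exists($id));
    return $id;
}

// Back $exists with the SELECT from the answer, using a prepared statement.
function pdo_exists_check(PDO $pdo): callable
{
    $stmt = $pdo->prepare('SELECT COUNT(1) AS c FROM short_links WHERE id = :id');
    return function (string $id) use ($stmt): bool {
        $stmt->execute([':id' => $id]);
        return (int) $stmt->fetchColumn() > 0;
    };
}
```

Usage would be `generate_candidate_id(pdo_exists_check($pdo))`. Note that a check-then-insert sequence can still race with a concurrent insert, so a UNIQUE constraint on the column remains the real guarantee.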

EDIT

This may not be the best way to go about it, but I'll give it a shot. I guess what you need is some way of converting numbers into unique short strings that are not in sequence.

As you said, converting to a higher base (base 36, or base64) already turns a number into a short string. To avoid the sequence problem, you could map your auto-generated IDs to some "random" values (a one-to-one mapping) and then base64-encode those unique values instead.

You could generate this mapping as follows: have a temporary table store the values 1 to 10,000,000, sort them in random order, and store them in your mapping table.

INSERT INTO MappingTable (mappedId) SELECT `values` FROM TemporaryTable ORDER BY RAND()

(The backticks are needed because VALUES is a reserved word in MySQL.) MappingTable has two fields: id (an auto-increment column that your auto-generated ID is looked up against) and mappedId (the value you generate the base64 encoding for).

As you get closer to 10,000,000, you could rerun the code above with the temporary table holding 10,000,001 to 20,000,000, or something like that.
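
An in-memory PHP analogue of the mapping table, as a sketch only (function names are hypothetical): the answer builds this in SQL, whereas here shuffle() produces the random permutation for one block of IDs. Encoding with base 36 instead of base64 is an assumption made to keep the output alphanumeric.

```php
<?php
// For one block of IDs ($first..$last), a random permutation decides which
// mapped value each auto_increment ID gets (key = id, value = mappedId).
function build_mapping_block(int $first, int $last): array
{
    $mapped = range($first, $last);
    shuffle($mapped); // shuffle() is not cryptographically secure
    return array_combine(range($first, $last), $mapped);
}

// Encode a mappedId for display (base 36 here, an assumption).
function encode_mapped_id(int $mappedId): string
{
    return base_convert((string) $mappedId, 10, 36);
}
```

Because every value in the block appears exactly once, uniqueness holds without any collision check; the cost is storing the mapping, which for 10 million rows is better left in the database than in a PHP array.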

假面具 2024-08-16 00:28:10

You'll need something that's correct by construction, i.e. a permutation function: this is a function that does a one-to-one, reversible mapping of one integer (your sequential counter) to another.
Some examples (any combination of these should also work):

  • inverting some of the bits (e.g. using an XOR, ^ in PHP)
  • swapping the places of bits (($i & 0xc) >> 2 | ($i & 0x3) << 2), or just reversing the order of all bits
  • adding a constant value modulo your maximum range (the range must be a power of two if you're combining this with the ones above)
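
A PHP sketch chaining the three techniques listed above on n-bit integers; the constants and function names are arbitrary, hypothetical choices. Each step is one-to-one on [0, 2^n), so the composition is a permutation. With 25 bits every output is below 2^25 = 33,554,432, which is less than 36^5, so it always fits in five base-36 characters.

```php
<?php
// Permutation of [0, 2^$bits); valid for inputs 0 <= $x < 2^$bits.
function permute_bits(int $x, int $bits, int $add = 0x2D3F1A5, int $xor = 0x15A5A5A): int
{
    $mask = (1 << $bits) - 1;
    $x = ($x + $add) & $mask;              // add a constant modulo 2^$bits
    $reversed = 0;
    for ($i = 0; $i < $bits; $i++) {       // reverse the order of all bits
        $reversed = ($reversed << 1) | (($x >> $i) & 1);
    }
    return $reversed ^ ($xor & $mask);     // invert some of the bits
}

// 25-bit IDs rendered as exactly five base-36 characters.
function short_id_25bit(int $autoIncrementId): string
{
    $scrambled = permute_bits($autoIncrementId, 25);
    return str_pad(base_convert((string) $scrambled, 10, 36), 5, '0', STR_PAD_LEFT);
}
```

Bit reversal is what breaks the visible ordering: consecutive IDs differ in their lowest bit, which ends up as the highest bit of the result.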

Example: this function converts 0, 1, 2, 3, 4, .. into 13, 4, 12, 7, 15, .. for numbers up to 15:

$i=($input+97) & 0xf;
$result=((($i&0x1) << 3) + (($i&0xe) >> 1)) ^ 0x5;
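
The same two lines wrapped in a PHP function (the name `permute4` is hypothetical), so the claimed mapping and the fact that it is a permutation of 0..15 can be checked directly.

```php
<?php
function permute4(int $input): int
{
    $i = ($input + 97) & 0xf;                             // add a constant, mod 16
    return ((($i & 0x1) << 3) + (($i & 0xe) >> 1)) ^ 0x5; // rotate right one bit, then XOR
}
```

Mapping `range(0, 15)` through `permute4` and sorting the result gives back 0..15, which is the "correct by construction" property the answer asks for.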

EDIT

An easier way would be to use a linear congruential generator (LCG, usually used for generating random numbers), which is defined by a formula of the form:

X_{n+1} = (a * X_n + c) mod m

For suitable values of a, c and m, the sequence X_0, X_1, ..., X_{m-1} will contain every number between 0 and m-1 exactly once. Now you can start from a linearly increasing index and use the next value in the LCG sequence as your "secret" key.
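
A small-scale PHP check of the full-period property (function names and parameters are hypothetical examples). For m = 64 the Hull-Dobell conditions reduce to c odd and a ≡ 1 (mod 4); a = 21, c = 13 satisfy them. Separately, mapping an index i to (a * i + c) mod m in a single step is already a bijection on 0..m-1 whenever gcd(a, m) = 1, which is what the one-step formula in EDIT2 relies on.

```php
<?php
// Iterate X_{n+1} = (a * X_n + c) mod m for m steps starting from $seed.
function lcg_sequence(int $a, int $c, int $m, int $seed): array
{
    $seq = [];
    $x = $seed;
    for ($n = 0; $n < $m; $n++) {
        $seq[] = $x;
        $x = ($a * $x + $c) % $m;
    }
    return $seq;
}

// One LCG step applied to an index: a bijection on 0..m-1 when gcd(a, m) = 1.
function lcg_step(int $i, int $a, int $c, int $m): int
{
    return ($a * $i + $c) % $m;
}
```

Badly chosen parameters break the full period: with a = 5, c = 0, m = 64 and an odd seed, the sequence never leaves the odd numbers.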

EDIT2

Implementation:
You can design your own LCG parameters, but if you get them wrong they won't cover the full range (and will therefore produce duplicates), so I'll use a published and well-tested set of parameters from this paper:

a = 16807, c = 0, m = 2147483647

This gives you a range of just under 2^31 (m = 2^31 - 1). With pack() you can turn the resulting integer into a 4-byte string, and base64_encode() makes it readable (6 significant characters at 6 bits per character), so this could be your function:

substr(base64_encode(pack("l", (16807 * $index) % 2147483647)), 0, 6)
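
A fuller PHP sketch of the one-liner, with a decoder; the function names are hypothetical. Two deliberate changes from the answer: pack('V') (unsigned 32-bit little-endian) instead of 'l' so the bytes do not depend on the machine's byte order, and '+' and '/' swapped for URL-safe '-' and '_'. It assumes 64-bit PHP integers. One observation: for indices up to 127,773 the product 16807 * index never exceeds m, so early IDs are plain multiples of 16807 and are scrambled only weakly.

```php
<?php
// Park-Miller parameters: m = 2^31 - 1 is prime, so i -> (16807 * i) mod m
// permutes 1..m-1.
const LCG_A = 16807;
const LCG_M = 2147483647;

function lcg_short_id(int $index): string
{
    if ($index < 1 || $index >= LCG_M) {
        throw new InvalidArgumentException('index must be between 1 and 2^31 - 2');
    }
    $scrambled = (LCG_A * $index) % LCG_M;
    $b64 = base64_encode(pack('V', $scrambled)); // 4 bytes -> 8 chars, last two are '='
    return strtr(substr($b64, 0, 6), '+/', '-_');
}

// Modular inverse via the extended Euclidean algorithm (requires gcd(a, m) = 1).
function mod_inverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    return (($oldS % $m) + $m) % $m;
}

// Reverse the mapping: decode the bytes, then multiply by A^-1 mod M.
function lcg_decode(string $shortId): int
{
    $bytes = base64_decode(strtr($shortId, '-_', '+/') . '==');
    $scrambled = unpack('V', $bytes)[1];
    return (mod_inverse(LCG_A, LCG_M) * $scrambled) % LCG_M;
}
```

For example, `lcg_short_id(1)` returns "p0EAAA"; the trailing "AAA" comes from the high bytes being zero for small values.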

吐个泡泡 2024-08-16 00:28:10

You can use a bitwise XOR to scramble some of the bits:

select a, a ^ 377 from thetable;

+-----+---------+
| a   | a ^ 377 |
+-----+---------+
| 154 |     483 |
| 152 |     481 |
|  69 |     316 |
|  35 |     346 |
|  72 |     305 |
| 139 |     498 |
|  96 |     281 |
|  31 |     358 |
|  11 |     370 |
| 127 |     262 |
+-----+---------+
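
The same idea in PHP, as a sketch (function names are hypothetical; 377 is the key from the example). XOR with a fixed key is its own inverse, so decoding is trivial. It only flips the bits set in the key, though: bits above bit 8 pass through unchanged, and neighbouring IDs stay close (152 and 154 become 481 and 483 in the table above).

```php
<?php
// (id ^ key) ^ key === id, so the same operation scrambles and unscrambles.
const XOR_KEY = 377;

function xor_short_id(int $id): string
{
    return base_convert((string) ($id ^ XOR_KEY), 10, 36);
}

function xor_decode(string $shortId): int
{
    return ((int) base_convert($shortId, 36, 10)) ^ XOR_KEY;
}
```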

猫七 2024-08-16 00:28:10

If you cannot use an auto-increment field and want an absolutely unique value, use a UUID. If you decide to use anything else (besides auto-increment), you would be silly NOT to check for collisions.
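
A common PHP sketch for a random (version 4) UUID built from random_bytes(); the function name is hypothetical. MySQL also has a built-in UUID() function. Either way the result is 36 characters long, far from the 5-8 characters the question asks for.

```php
<?php
function uuid_v4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40); // set version 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}
```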

风吹短裙飘 2024-08-16 00:28:10

I think this will never be really secure, as you only need to find the encryption method behind the short unique string to hijack an ID. Is checking for collisions in a loop really that problematic in your setting?

臻嫒无言 2024-08-16 00:28:10

"An MD5 of an incrementing number should be fine, but I worry that if you're truncating your MD5 (which is normally 128 bits) down to 5-8 characters, you will almost certainly be damaging its capability to act as a unique signature..."

Completely true. Especially once you reach, say, an 80% chance of collision, a truncated MD5 is no better than any random number at guaranteeing uniqueness by itself, i.e. worthless.

But since you're using a database anyway, why not just use a UNIQUE INDEX? That way the uniqueness check is done by MySQL itself, far more efficiently than with a loop. Just try to do the INSERT with your MD5-generated key, and if it fails, try again...
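
A PHP/PDO sketch of "let the UNIQUE index do the check"; the table `short_links`, its columns, and the `$generate` hook are hypothetical. It assumes PDO is in exception mode (the default since PHP 8) and treats SQLSTATE 23000 (integrity constraint violation) as "duplicate key, try again".

```php
<?php
function insert_short_id(PDO $pdo, string $longUrl, ?callable $generate = null, int $maxAttempts = 5): string
{
    // Default key: MD5 of random bytes, truncated to 6 hex characters.
    $generate = $generate ?? fn(): string => substr(md5(random_bytes(16)), 0, 6);

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $shortId = $generate();
        try {
            $stmt = $pdo->prepare('INSERT INTO short_links (short_id, long_url) VALUES (:short_id, :long_url)');
            $stmt->execute([':short_id' => $shortId, ':long_url' => $longUrl]);
            return $shortId;
        } catch (PDOException $e) {
            // Anything other than a constraint violation is a real error.
            if ((string) $e->getCode() !== '23000') {
                throw $e;
            }
        }
    }
    throw new RuntimeException("No free short ID after $maxAttempts attempts");
}
```

Compared with SELECT-then-INSERT, this also stays correct when two requests generate the same key at the same time, because the database rejects the second INSERT.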
