8 字符随机代码
我已经回答了一些类似问题的答案,但找不到我想要的东西。
有没有一种更有效的方法来生成 8 个字符的唯一 ID(基数 36 (0-9A-Z)),而不是生成唯一 ID 并查询数据库以查看它是否已存在并重复,直到获得尚未存在的唯一 ID。用过的?
我发现的其他解决方案使用时间,但这可能太容易猜测,并且在分布式系统中可能效果不佳。将这些 ID 视为促销代码。
I've been through answers to a few similar questions asked on SO, but could not find what I was looking for.
Is there a more efficient way to generate 8 character unique IDs, base 36 (0-9A-Z), than generating a unique ID and querying the DB to see if it already exists and repeating until you get a unique ID that has not been used?
Other solutions I found use time, but this is perhaps too easy to guess and may not work well in distributed systems. Consider these IDs to be promo codes.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
一种选择是反过来做:每当需要时在数据库中生成大量它们,然后在需要时从数据库中获取单个,或者为您的特定进程保留一大堆它们(即在数据库中将它们标记为“可能使用”)然后从内存中将它们分配出去。
One option is to do it the other way round: generate a huge number of them in the database whenever you need to, then either fetch a single one from the DB when you need one, or reserve a whole bunch of them for your particular process (i.e. mark them as "potentially used" in the database) and then dole them out from memory.
我质疑你的“低效”方法实际上是低效的。考虑一下:
通过仔细的设计,您应该能够在一个数据库请求中生成一个有保证的唯一 ID,几乎在任何时候......除非您有大量的现有 ID。 (如果这样做,只需在 ID 中添加另外几个字符,问题就会再次消失。)
如果您愿意,可以通过批量生成 ID 将数据库操作的平均数量减少到每个 ID 一次以下,但它们是潜在的复杂性,特别是当您需要记录实际使用的 ID 数量时。
但是,如果您最多有 150,000 个 ID(我假设是在很长一段时间内生成的),那么批量创建 ID 就不值得了……除非您正在进行批量上传操作。
I question that your "inefficient" approach is actually inefficient. Consider this:
With careful design, you should be able to generate a guaranteed unique ID in one database request, almost all of the time ... unless you have an awfully large number of existing IDs. (And if you do, just add another couple of characters to the ID and the problem goes away again.)
If you want to, you can reduce the average number of database operations to less than one per ID by generating the IDs in batches, but their are potential complications, especially if you need to record the number of IDs that are actually in use.
But, if you have at most 150,000 IDs (I assume, generated over a long period of time) then creating the IDs in batches is not worth the effort ... unless you are doing a bulk upload operation.
不幸的是,8基36位数字有点小。只有 200 万个可能的 ID,因此如果您随机生成 140 万个,则发生碰撞的可能性大约为一半。
您可以使用具有较大周期的 PRNG,并通过某种双射将其当前状态映射到您的 ID 空间。 41 位 LFSR 并非无法破解,但如果您要保护的东西不是那么有价值,则可能还不错。通过为不同的节点提供不同的位置来启动循环,您可以进行一定程度的分发,而无需始终访问数据库。
当然,任何此类确定性方法的问题在于,一旦被破坏,它就完全被破坏了,您将无法再信任任何 ID。因此,从数据库中分发数字可能是一种可行的方法,并通过以一千个或其他方式分发它们来进行分发。
如果您有更大的 ID 空间,那么您可以使用更安全的技术,例如 ID 可以包含用于识别源的内容、该源的递增序列号以及使用源唯一密钥的 HMAC。
Unfortunately, 8 base 36 digits is a bit small. It's only 2 million million possible IDs, so if you generate 1.4 million randomly you have about a half chance of a collision.
You could possibly use a PRNG with a large period, and map its current state to your ID space via some bijection. A 41 bit LFSR wouldn't be uncrackable, but might be reasonably OK if the thing you're protecting isn't all that valuable. You could distribute somewhat without having to access the DB all the time, by providing different nodes with a different position to start the cycle.
The trouble with any such deterministic method, of course, is that once it's broken it's completely broken, and you can no longer trust any IDs. So doling numbers out of a database is probably the way to go, and distribute by doling them out in batches of a thousand or whatever.
If you had a larger ID space, then you could use more secure techniques, for example the ID could consist of something to identify the source, an incrementing serial number for that source, and an HMAC using a key unique to the source.
如果只有一个 ID 源(即:您不需要协调不同机器上的多个独立源),您可以执行以下操作:
计算一个数字可能具有的最大位数,以便它不会不超过 0-9A-Z 8 个符号字符串中包含的信息。这将是
floor(log2(36^8))
= 41 位。让计数器(41 位)从零开始
transform(counter++)
transform
函数必须是双射的,并且可以是以下操作的任意长序列(当计算模2^41
时,它们本身都是双射的):当您完成后,您只需要另一个函数
encode(number)
将数字转换为 base36。If there is only one source of IDs (that is: you don't need to coordinate multiple independent sources on different machines) you can do the following:
Calculate the maximum number of bits that a number may have so that it doesn't exceed the information contained in an 8-symbol string of 0-9A-Z. This would be
floor(log2(36^8))
= 41 bits.Have a counter (with 41 bits) start at zero
transform(counter++)
for each ID requestThe
transform
function has to be bijective and can be an arbitrarily long sequence of the following operations (which are all bijective themselves when they are calculated modulo2^41
):When you are finished with that, you only need another function
encode(number)
to transform the number to base36.我曾经使用 C++ 解决了一个类似的问题,涉及较少数量的可能 ID,但考虑一些方法来扩展它可能会很有用。基本上,我为所有可能的 ID 创建了一个大位图,然后通过测试其正确的位来查找某个 ID 是否正在使用。
为了最大限度地减少 RAM 需求,我将位图存储在原始二进制文件中,并使用随机访问文件 I/O 来查找包含我需要检查和/或设置的相应位的字节。
更大的 ID 空间将需要 328 GB 的位图,这可能是不可能的。另一方面,已使用 ID 的 Python
集
可能是可以接受的,具体取决于您认为最终可能实际使用的 ID 数量。其他替代方案可能是某种稀疏文件或稀疏矩阵技术,例如稀疏文件 ="http://docs.scipy.org/doc/scipy/reference/sparse.html" rel="nofollow">scipy.sparse。希望这有帮助。
I once solved a similar problem using C++ involving a smaller number of possible IDs, but there might be useful to consider some ways to scale it up. Basically I created a big bitmap for all the possible IDs and would just lookup whether one was in use by testing the proper bit for it.
To minimize RAM requirements, I stored the bitmap in a raw binary file and used random-access file i/o to seek to the byte with the corresponding bit in it I needed to check and/or set.
Your much larger ID space would require a 328 GB bitmap, which is likely out of the question. On the other hand, a Python
set
of the used IDs might be acceptable, depending on how many IDs you think might actually end up being used. Other alternatives might be some sort of sparse file or sparse matrix technique, such as those in scipy.sparse.Hope this helps.
下面是一些用于生成随机、base36 ID 的 python 代码。
Here's some python code for generating random, base36 IDs.
我做了类似生成激活码的操作:一次性使用的 8 个字母的小写字符串。它们旨在生成后短时间内使用(通常在几分钟内,但可能不会长达一周),但必须是唯一的。当它们被使用后,它们就会从数据库中删除。
我只是生成一个值并查看它是否在数据库中使用。目前这种方法有效,因为数据库中没有大量未使用的代码,但即使已经为您提供了代码,仍然不容易猜测。
至于生成代码:
I do something similar to generate activation codes: 8-letter lowercase strings that are single-use. They are intended to be used within a short time of being generated (usually within minutes, but possibly not for up to a week), but must be unique. When they have been used, they are removed from the database.
I just generate a value and see if it is in use in the database. This works for now, because there are not heaps of unused codes sitting in the database, but are still not easy to guess, even if you have been provided with one.
As for the generation code:
它需要加密安全吗?
如果不是,则 pc(n) = a + bn(其中 b 相对于 36^8 是质数)即可。使用字节数组。
Does it need to be cryptographically secure?
If not, pc(n) = a + bn, where b is prime relative to 36^8 will do. Use an array of byte.