生成随机数的算法
我希望生成一个随机数并将其发送到数据库中特定 user_id 的表。 问题是,同一个号码不能使用两次。 有一百万种方法可以做到这一点,但我希望对算法非常热衷的人有一种巧妙的方法来以优雅的解决方案解决问题,因为满足以下条件:
1)对数据库进行最少的查询。 2) 对内存中的数据结构进行最少的爬行。
本质上,这个想法是执行以下操作
1) 创建一个从 0 到 9999999 的随机数
2)检查数据库,看看该号码是否存在
或
2)查询数据库中的所有号码
3)查看返回的结果是否与来自数据库的结果匹配
4)如果匹配,则重复步骤1,如果不匹配,问题解决。
谢谢。
I'm looking to generate a random number and issue it to a table in a database for a particular user_id. The catch is, the same number can't be used twice. There's a million ways to do this, but I'm hoping someone very keen on algorithms has a clever way of solving the problem in an elegant solution in that the following criteria is met:
1) The least amount of queries to the database are made.
2) The least amount of crawling through a data structure in memory is made.
Essentially the idea is to do the following
1) Create a random number from 0 to 9999999
2) Check the database to see if the number exists
OR
2) Query the database for all numbers
3) See if the returned result matches whatever came from the db
4) If it matches, repeat step 1, if not, problem is solved.
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(17)
不,你的算法不可扩展。 我之前所做的就是连续发出数字(每次+1),然后通过异或运算将它们传递给混乱的位,从而给我一个看似随机的数字。 当然,它们并不是真正随机的,但在用户眼中它们看起来是随机的。
[编辑]附加信息
该算法的逻辑是这样的,您使用已知的序列来
生成唯一的数字,然后确定性地操纵它们,
所以它们看起来不再是连续的。 一般的解决方案是使用
某种形式的加密,在我的例子中是一个异或触发器,因为
它是尽可能快的,并且它满足了数字的保证
永远不会碰撞。
但是,如果您想要更多,您可以使用其他形式的加密
随机的数字,超速(假设你不需要生成很多
一次 id)。 现在选择加密算法的重点
是“数字永远不会碰撞的保证”。 以及证明加密算法是否能够满足的方法
这个保证是检查原始数和结果是否
加密具有相同的位数,并且算法是
可逆(双射)。
[感谢亚当·利斯 & CesarB 扩展解决方案]
No your algorithm is not scalable. What I've done before is to issue numbers serially (+1 each time) and then pass them through an XOR operation to jumble the bits thus giving me a seemingly random numbers. Of course they aren't really random, but they look so to users eyes.
[Edit] Additional information
This algorithm's logic goes like this you use a known sequence to
generate unique numbers and then you deterministically manipulate them,
so they don't look serial anymore. The general solution is to use
some form of encryption, which in my case was an XOR flipflop, because
its as fast as it can get, and it fulfills the guarantee that numbers
will never collide.
However you can use other forms of encryption, if you want prefer even more
random looking numbers, over speed (say you don't need to generate many
ids at a time). Now the important point in choosing an encryption algorithm
is "the guarantee that numbers will never collide". And a way to prove if an encryption algorithm can fulfill
this guarantee is to check if both the original number and the result of
the encryption have the same number of bits, and that the the algorithm is
reversible (bijection).
[Thanks to Adam Liss & CesarB for exapanding on the solution]
为什么不直接使用 GUID 呢? 大多数语言应该有一个内置的方法来做到这一点。 它保证是唯一的(具有非常合理的界限)。
Why don't you just use a GUID? Most languages should have a built-in way to do this. It's guaranteed to be unique (with very reasonable bounds).
想要一个顶级的解决方案吗?
我认为随机性并不是为了达到加密质量,但足以阻止通过 user_id 猜测用户的寿命。
在开发过程中,以字符串形式生成所有 1000 万个数字的列表。
(可选)执行一些简单的转换,例如在中间添加一个常量字符串。 (这是为了防止结果过于可预测。)
将它们传递到生成完美哈希函数<的工具中/a>,例如 gperf。
生成的代码可用于在运行时将用户的 ID 快速编码为唯一的哈希值,保证不会与任何其他哈希值冲突。
Want an over-the-top solution?
I assume randomness is not intended to be encryption-quality, but just enough to discourage guessing the longevity of a user, by user_id.
During development, generate a list of all 10 million numbers in string form.
Optionally, perform some simple transformation, like adding a constant string to the middle. (This is just in case the result is too predictable.)
Pass them into a tool that generates Perfect Hash functions, such as gperf.
The resulting code can be used to quickly encode the user's id at runtime into a unique hash value that is guaranteed not to clash with any other hash values.
试试mysql中的语句
选择 CAST(RAND() * 1000000 AS INT)
Try the statement in mysql
SELECT CAST(RAND() * 1000000 AS INT)
假设:
您可以做一些简单的事情,例如将随机数作为 64 位整数,其中高 32 位包含时间戳(在行插入时)和低 32 位 user_id。
即使对于同一用户的多行来说,这也是唯一的,前提是您根据为同一用户添加新行的频率对时间戳使用适当的分辨率。
结合随机列上的唯一约束并捕获逻辑中的任何此类错误,然后重试。
Assuming:
You could do something simple as having the random number as a 64 bit integer, with the upper 32 bits containing the timestamp (at row insert) and the lower 32 bits the user_id.
That would be unique even for multiple rows with the same user, provided you use an appropriate resolution on your timestamp depending on how often you add new rows for the same user.
Combine with an unique constraint on the random column and catch any such error in your logic and then just retry.
我想你会发现你真的不想这样做。 随着数据库中的数字增加,您可能会在“确保该数字不被占用”循环中花费太多时间。
就我个人而言,我很幸运地使用哈希作为替代方案,但为了想出更好的解决方案,我真的需要知道为什么要这样做。
I think you'll find that you really do not want to do this. As the numbers in the database increase, you might spend too much time in the "make sure this number isn't taken" loop.
Personally, I've had luck with hashes as an alternative, but to come up with a better solution, I'd really need to know why you want to do it this way.
我的经验就是在 PHP 中使用 RNG。 我发现使用一定大小的数字(我使用的是 int,所以我的最大值为 4G)。 我进行了一些测试,发现平均而言,在 500,000 次迭代中,我得到了 120 个单个重复项。 运行循环多次后我从未得到三次重复。 我的“解决方案”是插入并检查是否失败,然后生成一个新的 ID 并再次进行。
我的建议是做同样的事情,看看你的碰撞率是多少,然后看看它是否适合你的情况。
这不是最佳选择,所以如果有人有建议,我也在寻找:)
编辑:我仅限于 5 位 ID ([a-zA-z0-9]{5,5}),ID 越长 (更多的组合,更少的碰撞)。 例如,电子邮件的 md5 几乎永远不会发生冲突。
My experience was simply using the RNG in PHP. I found that using a certain size of number (I'm using an int, so I have a max of 4G). I ran some tests and found that on average, in 500,000 iterations, I got 120 single duplicates. I never got a triplicate after running the loop a bunch of times. My "solution" was to then just insert and check if it fails, then generate a new ID and go again.
My advice is to do the same and see what your collision rate is &c and see if it's acceptable for your case.
This isn't optimal, so if anyone has suggestions I'm looking too:)
EDIT: I was limited to a 5 digit ID ([a-zA-z0-9]{5,5}), the longer the id (more combination, the few collisions). An md5 of the email would almost never conflict, for instance.
问题是,如果你生成随机数,很可能会无限地产生重复项。
然而:
虽然这也能做到你想要的,但这是一个坏主意,因为这不会长期扩展,最终你的数组会变得很大,并且需要很长时间才能生成尚未存在的随机数。你的数据库。
The problem is that if you are generating random numbers is is very possible to produce duplicates infinatly.
however:
While this will do what you want it too, it is a bad Idea as this won't scale for long, eventualy your array will get to large and it will take an extremely long time to generate a random that is not already in your db.
很容易设计一个长时间不重复的伪随机数生成器; 例如这个,它被用于你想要它的目的是一样的。
顺便说一句,为什么不按顺序发出用户 ID?
It's easy to design a pseudorandom number generator with a long period of nonrepetition; e.g. this one, which is being used for the same thing that you want it for.
BTW, why not just issue the userid's sequentially?
我喜欢 Oddthinking 的想法,但您不必选择世界上最强的哈希函数,而是可以简单地:
MD5 速度很快,并且检查字符串是否属于数组将避免您执行 SELECT。
I like Oddthinking's idea, but instead of choosing the strongest hash function in the world, you could simply:
MD5's are fast, and checking if a string belongs to an array will avoid you a SELECT.
如果您确实想获得 0 到 9 999 999 之间的“随机”数字,那么解决方案是进行一次“随机化”,然后将结果存储到磁盘上。
得到你想要的结果并不难,但我认为它更像是“用数字列出一个长列表”,而不是“得到一个随机数”。
您还需要一个指向 $numbers 中当前位置的指针(将其存储在数据库中); 从 0 开始,每次需要新数字时递增。 (或者,如果您不喜欢使用指针,则可以使用 array_shift() 或 array_pop()。)
If you really want to get "random" numbers form 0 to 9 999 999, then the solution is to do the "randomization" once, and then store the result to your disk.
It's not hard to get the result you want, but I think of it more like "make a long list with numbers", than "get a random number".
You also need a pointer to current position in $numbers (store it in a database); start with 0 and increment it each time you need a new number. (Or you could use array_shift() or array_pop(), if you dont like to use pointers.)
正确的 PRNG(伪随机数生成器)算法将有一个循环时间,在此期间它永远不会处于相同的状态。 如果您在从 PRNG 检索到的数字中公开 PRNG 的整个状态,您将获得一个保证在生成器周期内唯一的数字。
此操作的简单 PRNG 称为“线性同余”PRNG,它迭代一个公式:
执行 使用正确的一对因子,您可以从带有 32 位累加器的简单 PRNG 中获得 2^30(大约 10 亿)的周期。 请注意,您将需要一个 64 位长的临时变量来保存计算的中间“AX”部分。 大多数(如果不是全部)C 编译器都支持这种数据类型。 您还应该能够在大多数 SQL 方言上使用数字数据类型来完成此操作。
通过正确的 A 和 M 值,我们可以获得具有良好统计和几何特性的随机数生成器。 Fishman 和 Moore 撰写了一篇关于此的著名论文。
对于 M = 2^31 - 1,我们可以使用下面的 A 值来获得具有很好的长周期 (2^30 IIRC) 的 PRNG。
A 的优点:
请注意,这种类型的生成器(根据定义)不具有加密安全性。 如果你知道它生成的最后一个数字,你就可以预测它接下来会做什么。 不幸的是,我相信你无法同时获得加密安全性和保证的不可重复性。 为了使 PRNG 具有加密安全性(例如 Blum Blum Shub),它无法在生成的数字以允许预测序列中的下一个数字。 因此,内部状态比生成的数量更宽,并且(为了具有良好的安全性)周期将比可以生成的可能值的数量更长。 这意味着暴露的数字在一段时间内不会是唯一的。
出于类似的原因,长周期发生器也是如此,例如 Mersenne Twister。
A proper PRNG (Pseudo-Random Number Generator) algorithm will have a cycle time during which it will never be in the same state. If you expose the entire state of the PRNG in the number retrieved from it, you will get a number guaranteed unique for the period of the generator.
A simple PRNG that does this is called the 'Linear Congruential' PRNG which iterates a formula:
Using the right pair of factors you can get a period of 2^30 (approximately 1 billion) from a simple PRNG with a 32 bit accumulator. Note that you will need a 64 bit long long temporary variable to hold the intermediate 'AX' part of the computation. Most if not all C compilers will support this data type. You should also be able to do it with a numeric data type on most SQL dialects.
With the right values of A and M we can get a random number generator with good statistical and geometrical properties. There is a famous paper about this written by Fishman and Moore.
For M = 2^31 - 1 we get can use the values of A below to get a PRNG with a nice long period (2^30 IIRC).
Good Values of A:
Note that this type of generator is (by definition) not cryptographically secure. If you know the last number generated from it you can predict what it will do next. Unfortunately I believe that you cannot get cryptographic security and guaranteed non-repeatability at the same time. For a PRNG to be cryptographically secure (e.g. Blum Blum Shub) it cannot expose sufficient state in a generated number to allow the next number in the sequence to be predicted. Therefore the internal state is wider than the generated number and (in order to have good security) the period will be longer than the number of possible values that can be generated. This means that the exposed number will not be unique within the period.
For similar reasons the same is true of long-period generators such as the Mersenne Twister.
我实际上之前写过
I've actually previously written an article about this. It takes the same approach as Robert Gould's answer, but additionally shows how to shorten a block cipher to a suitable length using xor folding, and then how to generate the permutations over a range that isn't a power of 2, while still preserving the uniqueness property.
有几种方法可以解决这个问题,一种方法是构造一个包含数字 0000000 到 9999999 的数组,然后在该数组中随机选择这些数字
并将选取的数字值与最高值 Max 交换
达到新的最大值(
然后将 max 减少 1 并选择该数组的另一个随机成员,每次将 Max 减少 1 时
例如(基本):(右侧是应在实际程序中删除的注释)
Rndfunc 是对您正在使用的任何随机数生成器函数的调用
,然后对您想要选择的每个数字继续执行此操作,但您需要选择使用非常大的数组,
其他方法如下:生成一个数字并将其存储到可以动态增长的数组中
然后选择一个新数字并将其与数组中第一个元素到最后一个元素中间的值进行比较,在这种情况下它将是第一个选择的数字
如果匹配,则选择另一个随机数,根据大小对数组进行排序,如果没有匹配,则根据天气,它大于或小于与您比较的数字,在列表中向上或向下移动一半的距离,每次它不匹配并且大于或小于您要比较的值时。
每次将其减半,直到间隙大小达到一,然后检查一次并因没有匹配而停止,然后将数字添加到列表中,并按升序重新排列列表,依此类推,直到您完成选择随机数...希望这有帮助..
there are a couple ways to go about this one way would be to construct an array with the numbers 0000000 through 9999999 and then pick a random pick of these numbers in this array
and swap the picked numbers values with the highest value Max
then reduce max by 1 and pick another random member of this array up to the new maximum
each time reducing Max by one
for example (in basic) : (to the right are comments which should be removed in the actual program)
Rndfunc is a call to whatever random number generator function you are using
then keep doing this for each number you wish to pick , but you will need to have the option of using very big arrays
the other method would be as follows :generate a number and store it into an array that can grow dynamically
then after that pick a new number and compare it to the value that is halfway from the first to the last element in the array in this case it would be the first number picked
if it matches pick another random number, sort the array according to size and if there is not a match then depending on weather it is greater or smaller than the number you compared it with you go up or down in the list half of half the distance, each time that it does not match and is greater or lesser than what you are comparing it to.
each time halving it until you reach a gap size of one then you check once and stop as there is no match, and then the number is added to the list and the list is reshuffled in ascending order, so on and so on until you are done picking random numbers... hope this helps..
PHP 已经有一个函数,uniqid。 它生成一个标准的 uuid,如果您必须从其他地方访问数据,这非常有用。 不要重新发明轮子。
PHP already has a function for this, uniqid. It generates a standard uuid which is great if you have to access the data from elsewhere. Don't reinvent the wheel.
如果您想确保随机数不重复,则需要一个不重复的随机数生成器(如此处)。
基本思想是下面的公式
seed *seed & p 将为任何输入 x 生成不重复的随机数,使得 2x < p
和p - x * x % p
生成所有其他随机数以及非重复的随机数,但前提是p = 3 mod 4
。 因此基本上您所需要的只是一个尽可能接近9999999
的 primnumber 。 这样,工作量可以减少到单个读取字段,但缺点是要么生成太大的 ID,要么生成太少的 ID。该算法的排列效果不太好,因此我建议将其与异或或加法或其他一些方法结合起来,以更改确切的值,而不会破坏种子与其生成值之间的一对一关系。
If you want to ensure that the random-numbers aren't repeating, you need a non-repeating random number-generator (as described here).
The basic idea is that the following formula
seed * seed & p
will produced non-repeating random-numbers for any inputx such that 2x < p
andp - x * x % p
produces all other random-number aswell non-repeating, but only ifp = 3 mod 4
. So basically all you need is a single primnumber as close to9999999
as possible. This way the effort can be reduced to a single read field, but with the downside that either too big IDs are generated or too few IDs will be generated.This algorithm doesn't permute very well, so i'd recommend combining it with either XOR or addition or some other approach to change the exact value without destroying the 1-to-1-relation between seeds and their generated value.
随机生成:(在 PHP 中)
代码:
它完美地创建随机数。
结果:
第一次:
第二次:
第三次:
For Random Generation: ( In PHP )
Code:
It creates random numbers perfectly.
Result:
First Time:
Second Time:
Third Time: