How do you "check" whether a function really gives random results?
How can one be sure that a function is really random or as close to the notion as possible? Also, what is the distinction between random and pseudo-random? Finally, what algorithms/sources can be used to generate random numbers?
P.S.: Also asking this because a MySQL statement using ORDER BY RAND() LIMIT 1 isn't giving convincing results.
Comments (7)
The thing about being random is that you can't tell if the return from a random function is random or not.
True randomness draws on something that is genuinely random, such as white noise. Pseudo-random numbers are generally calculated from mathematical formulae or precomputed tables. The linear congruential generator is a popular method of generating them.
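As a rough illustration of the linear congruential generator mentioned above, here is a minimal Python sketch. The constants are the commonly cited "Numerical Recipes" parameters (a = 1664525, c = 1013904223, m = 2^32), and the function name `lcg` is made up for this example. It is only meant to show the mechanism, not to be used as a real generator.

```python
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of pseudo-random integers in [0, m)."""
    state = seed % m
    while True:
        # The next state depends only on the previous one: fully deterministic.
        state = (a * state + c) % m
        yield state

first = list(islice(lcg(seed=1), 5))
again = list(islice(lcg(seed=1), 5))
print(first)
print(first == again)  # True: same seed, same sequence
```

The whole "randomness" of the output lives in the seed, which is why such generators are called pseudo-random.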
To get a real random number, you generally want to interface with an outside source where something has been generated organically. This is called a True Random Number Generator.
Aloha!
There are several methods and tools for testing randomness. These are applied to a set of numbers collected from the generator to be tested. That is, you test the generator based on a sample of the data it generates.
In computing, especially in IT security, we normally want a generator that conforms to a uniform random process. There are many different processes, but I'm guessing that a uniform process is what you are aiming for.
NIST has published several documents with recommendations on pseudo-random number generators as well as on how to test them. Look at NIST documents SP 800-22 and SP 800-20.
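To give a flavour of the kind of test SP 800-22 describes, here is a sketch of its simplest one, the frequency (monobit) test, in Python. The function name and the use of `random.Random` as the generator under test are assumptions made for this example. SP 800-22 treats a p-value below 0.01 as evidence of non-randomness.

```python
import math
import random

def monobit_p_value(bits):
    """Frequency (monobit) test: are ones and zeros roughly balanced?"""
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    # Two-sided p-value under the hypothesis of fair, independent bits.
    return math.erfc(s_obs / math.sqrt(2))

rng = random.Random(12345)
sample = [rng.getrandbits(1) for _ in range(100_000)]
stuck = [1] * 100_000  # a generator stuck at one

print(monobit_p_value(sample))
print(monobit_p_value(stuck))  # effectively zero, so the test fails
```

Passing this one test proves very little on its own (the sequence 0101... passes it perfectly), which is why the NIST suite and Dieharder run many different tests.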
As somebody else pointed out, if you want a True Random Number Generator (TRNG) you need to gather physical entropy. Examples of such sources are radioactive decay, cosmic radiation, lava lamps, etc. Preferably you want sources that are hard to manipulate. The IETF has an RFC with some good recommendations, see RFC 4086 - Randomness Requirements for Security:
https://www.rfc-editor.org/rfc/rfc4086
What you normally do is collect entropy from one or more (preferably more than one) sources. The collected data is then filtered (whitened) and finally used to periodically seed a good PRNG. With different seeds, naturally.
This is how most good modern random generators work: an entropy collector feeds a PRNG built from cryptographic primitives such as symmetric ciphers (AES, for example) or hash functions. See for example the random generator Yarrow/Fortuna by Schneier, which in modified form is used in FreeBSD.
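The collect, whiten, seed pipeline described above can be sketched roughly like this in Python. Everything here is an assumption made for illustration: the two entropy sources, the use of SHA-256 for whitening, and the toy `HashCounterPRNG` class. It is not Yarrow or Fortuna and has not been vetted for real use.

```python
import hashlib
import os
import time

def collect_entropy():
    """Gather raw bytes from more than one source (both are stand-ins)."""
    os_bytes = os.urandom(32)  # operating system pool
    jitter = b"".join(
        time.perf_counter_ns().to_bytes(8, "little") for _ in range(64)
    )  # timing jitter, low quality on its own
    return [os_bytes, jitter]

def whiten(samples):
    """Condense the raw samples into a fixed-size, evenly mixed seed."""
    h = hashlib.sha256()
    for s in samples:
        h.update(len(s).to_bytes(4, "little"))
        h.update(s)
    return h.digest()

class HashCounterPRNG:
    """Toy PRNG: output block i is SHA-256(seed || i)."""

    def __init__(self, seed):
        self.reseed(seed)

    def reseed(self, seed):
        self._seed = seed
        self._counter = 0

    def random_bytes(self, n):
        out = b""
        while len(out) < n:
            block = self._seed + self._counter.to_bytes(8, "little")
            out += hashlib.sha256(block).digest()
            self._counter += 1
        return out[:n]

prng = HashCounterPRNG(whiten(collect_entropy()))
print(prng.random_bytes(16).hex())
prng.reseed(whiten(collect_entropy()))  # periodic reseeding with a fresh seed
print(prng.random_bytes(16).hex())
```

The point of the design is that a weak or manipulated single source cannot make the output worse than the best source in the mix, and the PRNG stretches a small amount of real entropy into as much output as needed.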
Coming back to your question about testing. As somebody pointed out, Marsaglia has produced a good set of tests, which was codified in the DIEHARD tests. There is now an even more expanded set of tests in Dieharder:
http://www.phy.duke.edu/~rgb/General/dieharder.php
Dieharder is a good tool that will give you good confidence as to whether the huge pile of numbers supplied to it (collected from your generator) is random (with good quality) or not. Running Dieharder is easy, but will take some time.
In situ testing of randomness is hard. You don't normally want to implement Dieharder in your system. What you can do is implement some simple detectors that should detect pathological cases. I usually suggest:
Equal value length. A simple counter that is reset whenever two consecutive values generated by the RNG differ. You then need to define a threshold at which you think the counter shows that the RNG is broken. If you see 10 million equal values and the value space is greater than one value (the one you see), your RNG is probably not working all that well. Especially if the value you are seeing is one of the edge values, for example 0x00000.... or 0xfffff...
Median value. If, after generating a million values from what should be a uniform distribution, you have a median that leans heavily towards one of the edges of the value space instead of sitting close to the middle, something is probably also amiss.
Variance. If, after generating millions of values, you haven't seen values close to the MIN and MAX of the value space, but instead have a narrow range of generated values, then something is also amiss.
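The three detectors above could look roughly like this in Python. The function name `health_check` and all thresholds (a run longer than 32, a median more than 10% of the span away from the midpoint, nothing within 1% of either edge) are arbitrary assumptions for the sake of the example, and you would tune them to your own value space and sample size.

```python
import random
import statistics

def health_check(values, lo, hi, max_run=32, edge_fraction=0.01):
    """Return a list of warnings for obviously broken generator output."""
    warnings = []
    span = hi - lo

    # 1. Equal value length: longest run of identical consecutive values.
    longest = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_run:
        warnings.append(f"run of {longest} equal values")

    # 2. Median: for a uniform source it should sit near the middle.
    median = statistics.median(values)
    if abs(median - (lo + hi) / 2) > 0.1 * span:
        warnings.append(f"median {median} far from midpoint")

    # 3. Range: values near both MIN and MAX should show up eventually.
    too_high = min(values) > lo + edge_fraction * span
    too_low = max(values) < hi - edge_fraction * span
    if too_high or too_low:
        warnings.append("observed range is too narrow")

    return warnings

rng = random.Random(1)
ok = [rng.randrange(0, 2**32) for _ in range(100_000)]
stuck = [0xFFFFFFFF] * 100_000

print(health_check(ok, 0, 2**32 - 1))     # no warnings expected
print(health_check(stuck, 0, 2**32 - 1))  # all three detectors fire
```

These checks only catch gross failures such as a stuck or heavily biased source. A generator can pass all three and still be poor.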
Finally, since you are hopefully using a good PRNG (based on AES, for example), the in situ tests suggested above might instead be applied to the entropy source.
I hope that helped in some ways.
There are statistical tests you can apply to see how likely it is that a given sequence of numbers came from independent, identically distributed (iid) random variables.
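One of the simplest such tests is a chi-square goodness-of-fit test against a uniform distribution. The Python sketch below is an assumption-laden example: the die-rolling setup, the function name, and the deliberately loaded generator are all made up for illustration. The critical value 15.086 is the standard chi-square table entry for 5 degrees of freedom at the 1% level.

```python
import random
from collections import Counter

def chi_square_uniform(samples, categories):
    """Chi-square statistic against a uniform distribution over categories."""
    counts = Counter(samples)
    expected = len(samples) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)

CRITICAL_5DF_1PCT = 15.086  # 5 degrees of freedom, alpha = 0.01

rng = random.Random(2024)
faces = range(1, 7)
fair = [rng.randint(1, 6) for _ in range(60_000)]
loaded = [rng.choice([1, 2, 3, 4, 5, 6, 6]) for _ in range(60_000)]  # six is twice as likely

for name, rolls in (("fair", fair), ("loaded", loaded)):
    stat = chi_square_uniform(rolls, faces)
    verdict = "suspicious" if stat > CRITICAL_5DF_1PCT else "ok"
    print(name, round(stat, 2), verdict)
```

Note that this only checks the "identically distributed" half. A sequence like 1, 2, 3, 4, 5, 6 repeated forever gets a perfect score here, so independence needs separate tests (runs, serial correlation and so on).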
Take a look at A Current View of Random Number Generators by George Marsaglia. In particular, take a look at sections 6-12. This provides an introduction to such tests followed by several that you can apply.
True, we cannot guarantee that a random number is actually random.
About pseudo-random numbers: yes, they only seem to be random. Pseudo-random functions were originally used in cryptography. When encrypted text is sent and an attacker in between intercepts the message, the attacker thinks the ciphertext is random, but it was in fact calculated by some function. Moreover, you will get the same output again using the same function and key (if any). So they are not random at all. They just look random because you cannot recover the original text or number from which the output was generated. Examples are hash functions (MD5, SHA-1) and encryption techniques (DES, AES, etc.).
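A small Python sketch of that point: the output looks like noise, yet it is completely determined by the input and the key. SHA-256 and HMAC-SHA256 are used here instead of the MD5/SHA-1/DES examples named above simply because they are the current defaults in the standard library, and the message and keys are made up.

```python
import hashlib
import hmac

message = b"attack at dawn"
key_a = b"secret key A"
key_b = b"secret key B"

# A hash has no key: the same input always produces the same digest.
d1 = hashlib.sha256(message).hexdigest()
d2 = hashlib.sha256(message).hexdigest()
print(d1)
print(d1 == d2)  # True

# A keyed function repeats for the same key and changes with the key.
t1 = hmac.new(key_a, message, hashlib.sha256).hexdigest()
t2 = hmac.new(key_a, message, hashlib.sha256).hexdigest()
t3 = hmac.new(key_b, message, hashlib.sha256).hexdigest()
print(t1 == t2, t1 == t3)  # True False
```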
For a number to be random, it must not be possible to predict it. So any algorithm that generates "random" numbers really generates pseudo-random numbers, as it is always possible to reproduce the same sequence of "random" numbers by reusing the previously used seed or the value used during "randomizing". A truly random number can be generated by, for example, a dice roll, but not by a computer algorithm.
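To make the predictability concrete, here is a short Python sketch using the standard `random` module. The seed value 42 and the warm-up of 1000 draws are arbitrary choices for the example. Anyone who learns the generator's internal state can predict everything it will output next.

```python
import random

rng = random.Random(42)
_ = [rng.random() for _ in range(1000)]  # the generator has been running for a while

# Copy the internal state into a second generator.
clone = random.Random()
clone.setstate(rng.getstate())

predicted = [clone.randint(1, 6) for _ in range(10)]
actual = [rng.randint(1, 6) for _ in range(10)]
print(predicted == actual)  # True: every "random" roll was predicted
```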
Theoretical computer science teaches that a computer is a deterministic machine. Every algorithm always runs the same way, so you have to vary your seed. But where should a computer get a random seed from? From an external device? From the CPU temperature (which would not vary much)?
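In practice the usual answer is to ask the operating system, which collects entropy from whatever the platform offers (typically things like device timings and, where available, a hardware RNG). A minimal Python sketch, with the variable names chosen just for this example:

```python
import os
import random
import secrets

# Ask the operating system for seed material.
seed = int.from_bytes(os.urandom(16), "big")
rng = random.Random(seed)  # deterministic from here on, but unpredictably seeded
seeded_roll = rng.randint(1, 6)
print(seeded_roll)

# Or skip the seeding step and draw from the OS source on every call.
secrets_roll = secrets.randbelow(6) + 1
system_roll = random.SystemRandom().randint(1, 6)
print(secrets_roll, system_roll)
```

The first variant is fine for simulations. For anything security related the second is the safer habit, since the Mersenne Twister behind `random.Random` was not designed to resist prediction.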
To test a function that returns random numbers, you should call it many times and count how many times each number is returned.
For example, an ideal output would show roughly equal counts for each possible value.
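A minimal Python sketch of this counting test follows. The helper name `frequency_table`, the simulated die, and the 60,000 calls are assumptions for the example, so swap in whatever function you actually want to check.

```python
import random
from collections import Counter

def frequency_table(func, calls):
    """Call func() `calls` times and count how often each result appears."""
    return Counter(func() for _ in range(calls))

rng = random.Random(7)
counts = frequency_table(lambda: rng.randint(1, 6), 60_000)
for value in sorted(counts):
    print(value, counts[value])
```

With 60,000 calls each face should land somewhere near 10,000. The counts should be close but not identical. Exactly equal counts would themselves be a sign that something other than chance is at work.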