使用有偏随机数生成器的无偏随机数生成器
您有一个有偏差的随机数生成器,它以 p 的概率生成 1,以 (1-p) 的概率生成 0。你不知道 p 的值。使用它可以创建一个无偏随机数生成器,它以 0.5 的概率生成 1,以 0.5 的概率生成 0。
注意:本题是 Cormen、Leiserson、Rivest、Stein 的《算法导论》中的一道练习题。(clrs)
You have a biased random number generator that produces a 1 with a probability p and 0 with a probability (1-p). You do not know the value of p. Using this make an unbiased random number generator which produces 1 with a probability 0.5 and 0 with a probability 0.5.
Note: this problem is an exercise problem from Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein.(clrs)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
事件 (p)(1-p) 和 (1-p)(p) 是等概率的。将它们分别视为 0 和 1 并丢弃其他两对结果,您将得到一个无偏随机生成器。
在代码中,这很容易完成:
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
In code this is done as easy as:
从有偏见的硬币产生无偏见的硬币的过程首先归因于冯·诺依曼(一个在数学和许多相关领域做出了大量工作的人)。过程非常简单:
该算法之所以有效,是因为获得 HT 的概率为
p(1-p)
,这与获得 TH(1-p)p
相同。因此,两个事件发生的可能性是相等的。我也在读这本书,它询问预计的运行时间。两次抛掷不相等的概率为
z = 2*p*(1-p)
,因此预期运行时间为1/z
。前面的示例看起来令人鼓舞(毕竟,如果您有一个偏差为
p=0.99
的有偏差硬币,您将需要扔硬币大约 50 次,这并不是很多)。所以你可能会认为这是一个最优算法。遗憾的是事实并非如此。以下是它与香农理论界限的比较(图像取自此答案)。这表明该算法很好,但远非最优。
如果你认为HHTT会被这个算法丢弃,你可以想出一个改进,但实际上它与TTHH有相同的概率。所以你也可以停在这里并返回H。同样的还有HHHHTTTT等等。使用这些案例可以提高预期的运行时间,但并不能使其在理论上达到最佳状态。
最后 - python 代码:
这是不言自明的,这是一个示例结果:
如您所见,尽管如此,我们的偏差为
0.097
,但我们得到了大约相同数量的1
和0
The procedure to produce an unbiased coin from a biased one was first attributed to Von Neumann (a guy who has done enormous work in math and many related fields). The procedure is super simple:
The reason this algorithm works is because the probability of getting HT is
p(1-p)
, which is the same as getting TH(1-p)p
. Thus two events are equally likely.I am also reading this book and it asks the expected running time. The probability that two tosses are not equal is
z = 2*p*(1-p)
, therefore the expected running time is1/z
.The previous example looks encouraging (after all, if you have a biased coin with a bias of
p=0.99
, you will need to throw your coin approximately 50 times, which is not that many). So you might think that this is an optimal algorithm. Sadly it is not.Here is how it compares with the Shannon's theoretical bound (image is taken from this answer). It shows that the algorithm is good, but far from optimal.
You can come up with an improvement if you will consider that HHTT will be discarded by this algorithm, but in fact it has the same probability as TTHH. So you can also stop here and return H. The same is with HHHHTTTT and so on. Using these cases improves the expected running time, but are not making it theoretically optimal.
And in the end - python code:
It is pretty self-explanatory, and here is an example result:
As you see, nonetheless we had a bias of
0.097
, we got approximately the same number of1
and0
冯·诺依曼提出的一次获取两位的技巧,即 01 对应 0,10 对应 1,然后重复 00 或 11。使用此方法需要提取以获得单个位的位的预期值为
1/p(1-p)
,如果p
特别重要,则该值可能会变得相当大小或大,因此值得询问该方法是否可以改进,特别是因为很明显它丢弃了大量信息(所有 00 和 11 情况)。谷歌搜索“冯诺依曼技巧偏向”产生了这篇论文,它开发了一个更好的问题的解决方案。这个想法是,您仍然一次获取两位,但如果前两次尝试仅产生 00 和 11,则您将一对 0 视为单个 0,将一对 1 视为单个 1,并应用冯·诺依曼的技巧到这些对。如果这也不起作用,请继续在此级别的配对中进行类似的组合,依此类推。
此外,本文将其发展为从有偏差的源生成多个无偏差的位,本质上使用两种不同的方式从位对生成位,并给出一个草图,表明这是最佳的,因为它恰好产生了位数原始序列中含有熵。
The trick attributed to von Neumann of getting two bits at a time, having 01 correspond to 0 and 10 to 1, and repeating for 00 or 11 has already come up. The expected value of bits you need to extract to get a single bit using this method is
1/p(1-p)
, which can get quite large ifp
is especially small or large, so it is worthwhile to ask whether the method can be improved, especially since it is evident that it throws away a lot of information (all 00 and 11 cases).Googling for "von neumann trick biased" produced this paper that develops a better solution for the problem. The idea is that you still take bits two at a time, but if the first two attempts produce only 00s and 11s, you treat a pair of 0s as a single 0 and a pair of 1s as a single 1, and apply von Neumann's trick to these pairs. And if that doesn't work either, keep combining similarly at this level of pairs, and so on.
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.
除了其他答案中给出的冯诺依曼过程之外,还有一整套技术,称为随机性提取(也称为去偏、去偏斜、或白化),用于从未知“成功概率”的随机数中产生无偏随机位。其中包括 Peres (1992) 的迭代冯诺依曼过程,以及 Zhou 和 Bruck (2012) 的“提取树”。这两种方法(以及其他几种方法)都是渐近最优,也就是说,随着输入数量变大,它们的效率(以每个输入的输出位数计)接近最佳极限(Pae 2018)。
例如,Peres 提取器采用一个位列表(具有相同“成功概率”的 0 和 1)作为输入,描述如下:
更不用说从“有偏差”骰子或其他“有偏差”随机数(不仅仅是位)产生无偏差随机位的过程 例如,参见 Camion (1974)。
我在随机性提取说明中讨论了有关随机性提取器的更多信息。
参考文献:
Besides the von Neumann procedure given in other answers, there is a whole family of techniques, called randomness extraction (also known as debiasing, deskewing, or whitening), that serve to produce unbiased random bits from random numbers of unknown "success probability". They include Peres's (1992) iterated von Neumann procedure, as well as an "extractor tree" by Zhou and Bruck (2012). Both methods (and several others) are asymptotically optimal, that is, their efficiency (in terms of output bits per input) approaches the optimal limit as the number of inputs gets large (Pae 2018).
For example, the Peres extractor takes a list of bits (zeros and ones with the same "success probability") as input and is described as follows:
This is not to mention procedures that produce unbiased random bits from "biased" dice or other "biased" random numbers (not just bits); see, e.g., Camion (1974).
I discuss more on randomness extractors in a note on randomness extraction.
REFERENCES:
您需要从 RNG 中提取成对的值,直到获得一系列不同的值,即零后跟一或一后跟零。然后,您获取该序列的第一个值(或最后一个值,无关紧要)。 (即只要绘制的对是两个零或两个一就重复)
这背后的数学原理很简单:先 0 后 1 的序列与先 1 后 0 的序列具有完全相同的概率。通过始终将该序列的第一个(或最后一个)元素作为新 RNG 的输出,我们有均匀的机会获得 0 或 1。
You need to draw pairs of values from the RNG until you get a sequence of different values, i.e. zero followed by one or one followed by zero. You then take the first value (or last, doesn't matter) of that sequence. (i.e. Repeat as long as the pair drawn is either two zeros or two ones)
The math behind this is simple: a 0 then 1 sequence has the very same probability as a 1 then zero sequence. By always taking the first (or the last) element of this sequence as the output of your new RNG, we get an even chance to get a zero or a one.
这是一种方法,可能不是最有效的。仔细研究一堆随机数,直到获得 [0..., 1, 0..., 1] 形式的序列(其中 0... 是一个或多个 0)。计算 0 的数量。如果第一个序列较长,则生成 0,如果第二个序列较长,则生成 1。(如果它们相同,请重试。)
这就像 HotBits 从放射性粒子衰变生成随机数一样:
HotBits:如何实现作品
Here's one way, probably not the most efficient. Chew through a bunch of random numbers until you get a sequence of the form [0..., 1, 0..., 1] (where 0... is one or more 0s). Count the number of 0s. If the first sequence is longer, generate a 0, if the second sequence is longer, generate a 1. (If they're the same, try again.)
This is like what HotBits does to generate random numbers from radioactive particle decay:
HotBits: How It Works
我只是用一些运行证明来解释已经提出的解决方案。无论我们改变概率多少次,这个解决方案都是无偏的。在头 n 尾抛掷中,连续的
head n tail
或tail n head
的排他性始终是无偏的。I'm just explaining the already proposed solutions with some running proof. This solution will be unbiased, no matter how many times we change the probability. In a head n tail toss, the exclusivity of consecutive
head n tail
ortail n head
is always unbiased.