使用有偏随机数生成器的无偏随机数生成器

发布于 2024-08-17 04:50:28 字数 193 浏览 10 评论 0原文

您有一个有偏差的随机数生成器,它以 p 的概率生成 1,以 (1-p) 的概率生成 0。你不知道 p 的值。使用它可以创建一个无偏随机数生成器,它以 0.5 的概率生成 1,以 0.5 的概率生成 0。

注意:本题是 Cormen、Leiserson、Rivest、Stein 的《算法导论》中的一道练习题。(clrs)

You have a biased random number generator that produces a 1 with a probability p and 0 with a probability (1-p). You do not know the value of p. Using this make an unbiased random number generator which produces 1 with a probability 0.5 and 0 with a probability 0.5.

Note: this problem is an exercise problem from Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein.(clrs)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

无戏配角 2024-08-24 04:50:28

事件 (p)(1-p) 和 (1-p)(p) 是等概率的。将它们分别视为 0 和 1 并丢弃其他两对结果,您将得到一个无偏随机生成器。

在代码中,这很容易完成:

int UnbiasedRandom()
{
    int x, y;

    do
    {
        x = BiasedRandom();
        y = BiasedRandom();
    } while (x == y);

    return x;
}

The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.

In code this is done as easy as:

int UnbiasedRandom()
{
    int x, y;

    do
    {
        x = BiasedRandom();
        y = BiasedRandom();
    } while (x == y);

    return x;
}
太阳公公是暖光 2024-08-24 04:50:28

从有偏见的硬币产生无偏见的硬币的过程首先归因于冯·诺依曼(一个在数学和许多相关领域做出了大量工作的人)。过程非常简单:

  • 抛硬币两次。
  • 如果结果匹配,则重新开始,忘记两个结果。
  • 如果结果不同,请使用第一个结果,忽略第二个结果。

该算法之所以有效,是因为获得 HT 的概率为 p(1-p),这与获得 TH (1-p)p 相同。因此,两个事件发生的可能性是相等的。

我也在读这本书,它询问预计的运行时间。两次抛掷不相等的概率为 z = 2*p*(1-p),因此预期运行时间为 1/z


前面的示例看起来令人鼓舞(毕竟,如果您有一个偏差为 p=0.99 的有偏差硬币,您将需要扔硬币大约 50 次,这并不是很多)。所以你可能会认为这是一个最优算法。遗憾的是事实并非如此。

以下是它与香农理论界限的比较(图像取自此答案)。这表明该算法很好,但远非最优。

输入图像描述这里

如果你认为HHTT会被这个算法丢弃,你可以想出一个改进,但实际上它与TTHH有相同的概率。所以你也可以停在这里并返回H。同样的还有HHHHTTTT等等。使用这些案例可以提高预期的运行时间,但并不能使其在理论上达到最佳状态。


最后 - python 代码:

import random

def biased(p):
    # create a biased coin
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)

    return n1

p = random.random()
print p

tosses = [unbiased_from_biased(p) for i in xrange(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print n_1, n_2

这是不言自明的,这是一个示例结果:

0.0973181652114
505 495

如您所见,尽管如此,我们的偏差为 0.097,但我们得到了大约相同数量的 10

The procedure to produce an unbiased coin from a biased one was first attributed to Von Neumann (a guy who has done enormous work in math and many related fields). The procedure is super simple:

  • Toss the coin twice.
  • If the results match, start over, forgetting both results.
  • If the results differ, use the first result, forgetting the second.

The reason this algorithm works is because the probability of getting HT is p(1-p), which is the same as getting TH (1-p)p. Thus two events are equally likely.

I am also reading this book and it asks the expected running time. The probability that two tosses are not equal is z = 2*p*(1-p), therefore the expected running time is 1/z.


The previous example looks encouraging (after all, if you have a biased coin with a bias of p=0.99, you will need to throw your coin approximately 50 times, which is not that many). So you might think that this is an optimal algorithm. Sadly it is not.

Here is how it compares with the Shannon's theoretical bound (image is taken from this answer). It shows that the algorithm is good, but far from optimal.

enter image description here

You can come up with an improvement if you will consider that HHTT will be discarded by this algorithm, but in fact it has the same probability as TTHH. So you can also stop here and return H. The same is with HHHHTTTT and so on. Using these cases improves the expected running time, but are not making it theoretically optimal.


And in the end - python code:

import random

def biased(p):
    # create a biased coin
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)

    return n1

p = random.random()
print p

tosses = [unbiased_from_biased(p) for i in xrange(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print n_1, n_2

It is pretty self-explanatory, and here is an example result:

0.0973181652114
505 495

As you see, nonetheless we had a bias of 0.097, we got approximately the same number of 1 and 0

秋凉 2024-08-24 04:50:28

冯·诺依曼提出的一次获取两位的技巧,即 01 对应 0,10 对应 1,然后重复 00 或 11。使用此方法需要提取以获得单个位的位的预期值为 1/p(1-p),如果 p 特别重要,则该值可能会变得相当大小或大,因此值得询问该方法是否可以改进,特别是因为很明显它丢弃了大量信息(所有 00 和 11 情况)。

谷歌搜索“冯诺依曼技巧偏向”产生了这篇论文,它开发了一个更好的问题的解决方案。这个想法是,您仍然一次获取两位,但如果前两次尝试仅产生 00 和 11,则您将一对 0 视为单个 0,将一对 1 视为单个 1,并应用冯·诺依曼的技巧到这些对。如果这也不起作用,请继续在此级别的配对中进行类似的组合,依此类推。

此外,本文将其发展为从有偏差的源生成多个无偏差的位,本质上使用两种不同的方式从位对生成位,并给出一个草图,表明这是最佳的,因为它恰好产生了位数原始序列中含有熵。

The trick attributed to von Neumann of getting two bits at a time, having 01 correspond to 0 and 10 to 1, and repeating for 00 or 11 has already come up. The expected value of bits you need to extract to get a single bit using this method is 1/p(1-p), which can get quite large if p is especially small or large, so it is worthwhile to ask whether the method can be improved, especially since it is evident that it throws away a lot of information (all 00 and 11 cases).

Googling for "von neumann trick biased" produced this paper that develops a better solution for the problem. The idea is that you still take bits two at a time, but if the first two attempts produce only 00s and 11s, you treat a pair of 0s as a single 0 and a pair of 1s as a single 1, and apply von Neumann's trick to these pairs. And if that doesn't work either, keep combining similarly at this level of pairs, and so on.

Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.

完美的未来在梦里 2024-08-24 04:50:28

除了其他答案中给出的冯诺依曼过程之外,还有一整套技术,称为随机性提取(也称为去偏去偏斜、或白化),用于从未知“成功概率”的随机数中产生无偏随机位。其中包括 Peres (1992) 的迭代冯诺依曼过程,以及 Zhou 和 Bruck (2012) 的“提取树”。这两种方法(以及其他几种方法)都是渐近最优,也就是说,随着输入数量变大,它们的效率(以每个输入的输出位数计)接近最佳极限(Pae 2018)。

例如,Peres 提取器采用一个位列表(具有相同“成功概率”的 0 和 1)作为输入,描述如下:

  1. 创建两个名为 U 和 V 的空列表。然后,当两个或多个位保留在输入:
    • 如果接下来的两位是 0/0,则将 0 附加到 U,将 0 附加到 V。
    • 否则,如果这些位是 0/1,则将 1 附加到 U,然后写入 0。
    • 否则,如果这些位是 1/0,则将 1 附加到 U,然后写入 1。
    • 否则,如果这些位为 1/1,则将 0 附加到 U,将 1 附加到 V。
  2. 递归运行此算法,从放置在 U 中的位读取。
  3. 递归运行此算法,从位读取 ;

更不用说从“有偏差”骰子或其他“有偏差”随机数(不仅仅是位)产生无偏差随机位的过程 例如,参见 Camion (1974)。

我在随机性提取说明中讨论了有关随机性提取器的更多信息。

参考文献:

  • Peres, Y.,“迭代冯·诺依曼提取随机位的过程”,《统计年鉴》1992,20,1,第 14 页。 590-597。
  • Zhou, H. 和 Bruck, J.,“用于最佳生成随机位的流式算法”,arXiv :1209.0730 [cs.IT],2012。S
  • . Pae,“二值化树和随机数生成 ”,arXiv:1602.06058v2 [cs.DS]。
  • Camion, Paul,“用有偏向的骰子滚动无偏向的骰子”,北卡罗来纳州立大学。统计系,1974。

Besides the von Neumann procedure given in other answers, there is a whole family of techniques, called randomness extraction (also known as debiasing, deskewing, or whitening), that serve to produce unbiased random bits from random numbers of unknown "success probability". They include Peres's (1992) iterated von Neumann procedure, as well as an "extractor tree" by Zhou and Bruck (2012). Both methods (and several others) are asymptotically optimal, that is, their efficiency (in terms of output bits per input) approaches the optimal limit as the number of inputs gets large (Pae 2018).

For example, the Peres extractor takes a list of bits (zeros and ones with the same "success probability") as input and is described as follows:

  1. Create two empty lists named U and V. Then, while two or more bits remain in the input:
    • If the next two bits are 0/0, append 0 to U and 0 to V.
    • Otherwise, if those bits are 0/1, append 1 to U, then write a 0.
    • Otherwise, if those bits are 1/0, append 1 to U, then write a 1.
    • Otherwise, if those bits are 1/1, append 0 to U and 1 to V.
  2. Run this algorithm recursively, reading from the bits placed in U.
  3. Run this algorithm recursively, reading from the bits placed in V.

This is not to mention procedures that produce unbiased random bits from "biased" dice or other "biased" random numbers (not just bits); see, e.g., Camion (1974).

I discuss more on randomness extractors in a note on randomness extraction.

REFERENCES:

星光不落少年眉 2024-08-24 04:50:28

您需要从 RNG 中提取成对的值,直到获得一系列不同的值,即零后跟一或一后跟零。然后,您获取该序列的第一个值(或最后一个值,无关紧要)。 (即只要绘制的对是两个零或两个一就重复)

这背后的数学原理很简单:先 0 后 1 的序列与先 1 后 0 的序列具有完全相同的概率。通过始终将该序列的第一个(或最后一个)元素作为新 RNG 的输出,我们有均匀的机会获得 0 或 1。

You need to draw pairs of values from the RNG until you get a sequence of different values, i.e. zero followed by one or one followed by zero. You then take the first value (or last, doesn't matter) of that sequence. (i.e. Repeat as long as the pair drawn is either two zeros or two ones)

The math behind this is simple: a 0 then 1 sequence has the very same probability as a 1 then zero sequence. By always taking the first (or the last) element of this sequence as the output of your new RNG, we get an even chance to get a zero or a one.

秉烛思 2024-08-24 04:50:28

这是一种方法,可能不是最有效的。仔细研究一堆随机数,直到获得 [0..., 1, 0..., 1] 形式的序列(其中 0... 是一个或多个 0)。计算 0 的数量。如果第一个序列较长,则生成 0,如果第二个序列较长,则生成 1。(如果它们相同,请重试。)

这就像 HotBits 从放射性粒子衰变生成随机数一样:

由于任何给定衰减的时间是随机的,因此两次连续衰减之间的间隔也是随机的。然后,我们要做的就是测量一对这样的间隔,并根据两个间隔的相对长度发出零或一位。如果我们测量两次衰减的时间间隔相同,我们将放弃测量并重试

HotBits:如何实现作品

Here's one way, probably not the most efficient. Chew through a bunch of random numbers until you get a sequence of the form [0..., 1, 0..., 1] (where 0... is one or more 0s). Count the number of 0s. If the first sequence is longer, generate a 0, if the second sequence is longer, generate a 1. (If they're the same, try again.)

This is like what HotBits does to generate random numbers from radioactive particle decay:

Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again

HotBits: How It Works

空心↖ 2024-08-24 04:50:28

我只是用一些运行证明来解释已经提出的解决方案。无论我们改变概率多少次,这个解决方案都是无偏的。在头 n 尾抛掷中,连续的 head n tailtail n head 的排他性始终是无偏的。

import random

def biased_toss(probability):
    if random.random() > probability:
        return 1
    else:
        return 0
    
def unbiased_toss(probability):
    x = biased_toss(probability)
    y = biased_toss(probability)
    while x == y:
        x = biased_toss(probability)
        y = biased_toss(probability)
    else:
        return x

# results with contain counts of heads '0' and tails '1'
results = {'0':0, '1':0}
for i in range(1000):
    # on every call we are changing the probability
    p = random.random()
    results[str(unbiased_toss(p))] += 1

# it still return unbiased result
print(results)
    

I'm just explaining the already proposed solutions with some running proof. This solution will be unbiased, no matter how many times we change the probability. In a head n tail toss, the exclusivity of consecutive head n tail or tail n head is always unbiased.

import random

def biased_toss(probability):
    if random.random() > probability:
        return 1
    else:
        return 0
    
def unbiased_toss(probability):
    x = biased_toss(probability)
    y = biased_toss(probability)
    while x == y:
        x = biased_toss(probability)
        y = biased_toss(probability)
    else:
        return x

# results with contain counts of heads '0' and tails '1'
results = {'0':0, '1':0}
for i in range(1000):
    # on every call we are changing the probability
    p = random.random()
    results[str(unbiased_toss(p))] += 1

# it still return unbiased result
print(results)
    
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文