How do I use random bits to simulate a fair 26-sided die?
How do I use a random number generator that gives bits (0 or 1) to simulate a fair 26-sided die? I want to use a bitstream to pick letters of the English alphabet such that the odds of any one letter coming up are the same as the odds of any other letter (I know real words aren't like that and have specific frequency distributions for each letter, but that doesn't matter here). What's the best way to use binary 0/1 decisions to pick letters fairly from the set A-Z? I can think of a few ways to map bits onto letters, but it's not obvious to me that they won't be biased. Is there a known good way?
Comments (5)
If you restrict yourself to a fixed, finite number of bits and your die has 26 sides, the method will always be biased: k bits give 2^k equally likely outcomes, and 2^k is never a multiple of 26, so the outcomes cannot be split evenly among 26 letters. You have to allow for the possibility of looking at a potentially unlimited number of bits to be sure that the result is unbiased.
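The claim that a fixed number of bits can never be split evenly among 26 outcomes can be checked with a few lines of Python. This is a minimal sketch; the range of 1 to 64 bits is an arbitrary choice for illustration.

```python
# Remainder of 2**k divided by 26, for k = 1..64.
# A remainder of 0 would mean that the 2**k outcomes of k bits
# could be shared out evenly among 26 letters.
remainders = {k: pow(2, k, 26) for k in range(1, 65)}

# No power of two is divisible by 26, because 26 has the odd prime factor 13.
has_exact_fit = any(r == 0 for r in remainders.values())
print(has_exact_fit)
```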
A simple algorithm is to choose a random number between 0 and the next largest number of the form 2^n - 1 (31 in this case, i.e. 5 bits). If the number you randomly pick is too large (greater than 25), discard it and pick again until you get a number in range. Clearly this is not an optimal algorithm, as you "waste" some information, but it should be good enough for most purposes. It is most wasteful if the number of sides of the die is just above 2^m for some m, for example 33 sides: you then need 6 bits (0 to 63) and have to discard the value almost 50% of the time (31 times out of 64).
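The discard-and-repick idea can be sketched as follows. The names `random_bit`, `roll` and `random_letter` are hypothetical, and Python's `secrets.randbits(1)` stands in for whatever bit generator you actually have.

```python
import secrets


def random_bit() -> int:
    """Hypothetical bit source returning 0 or 1; swap in your own generator."""
    return secrets.randbits(1)


def roll(sides: int = 26, bit_source=random_bit) -> int:
    """Return a uniform integer in [0, sides) by rejection sampling."""
    n_bits = max(1, (sides - 1).bit_length())  # 5 bits for 26 sides
    while True:
        value = 0
        for _ in range(n_bits):
            value = (value << 1) | bit_source()  # build the number bit by bit
        if value < sides:
            return value  # in range: accept
        # out of range (26..31 for a 26-sided die): discard and draw again


def random_letter(bit_source=random_bit) -> str:
    """Map a fair 26-sided roll onto A-Z."""
    return chr(ord("A") + roll(26, bit_source))


print(random_letter())
```

Every accepted value is equally likely because each 5-bit pattern is equally likely and rejected patterns are simply thrown away rather than remapped.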
The basic answer here seems right: if your random number 0..31 is greater than 25, reroll. However, you can stack the odds against an arbitrarily long run of rerolls by drawing more bits at once and rejecting only the values at or above the largest multiple of 26 that fits, which leaves a smaller chance of going long.
I threw together a Python script to figure out the best available number of bits up to 32, for giggles. Either way, you have a 1 in 2^12 chance of rerolling if you use 13 or 14 bits: 2^13 = 8192 = 26 × 315 + 2 and 2^14 = 16384 = 26 × 630 + 4, so only 2 of 8192 (or 4 of 16384) values are rejected.
Your algorithm in this case would be: generate 13 bits; if the value is 8190 or more, reroll; otherwise take the value mod 26 as the letter index.
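A minimal sketch of a 13-bit version, not the answer's own listing: the function name `letter_from_13_bits` is hypothetical, and `draw_bits(k)` is assumed to return a uniform integer in [0, 2^k), which is what `secrets.randbits` does.

```python
import secrets

BITS = 13
LIMIT = (2**BITS // 26) * 26  # 8190, the largest multiple of 26 below 2**13


def letter_from_13_bits(draw_bits=secrets.randbits) -> str:
    """Pick a letter A-Z, rerolling only when the 13-bit value is 8190 or 8191."""
    while True:
        value = draw_bits(BITS)
        if value < LIMIT:
            # 8190 usable values, exactly 315 per letter
            return chr(ord("A") + value % 26)
        # rejected with probability 2/8192 = 1/4096: draw again


print(letter_from_13_bits())
```

Taking the value mod 26 is fair here only because the accepted range 0..8189 is an exact multiple of 26.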
EDIT: Out of curiosity, I compared those odds with a few important values to see whether 13 was really the optimal number (assuming you can generate any number of bits, 1 to 32, in the same amount of time; if you can't, 13 bits looks like the best). Based on my (admittedly sleepy) math, if you can get 32 bits as cheaply as 16, go for 32 instead. Otherwise, favor 13.
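A comparison along these lines can be reproduced with a short script. This is a sketch rather than the script mentioned in the answer: `reroll_probability` and `expected_bits` are hypothetical helper names, and `expected_bits` assumes one letter is produced per accepted draw.

```python
from fractions import Fraction


def reroll_probability(bits: int, sides: int = 26) -> Fraction:
    """Chance that a draw of `bits` bits lands above the last full multiple of `sides`."""
    total = 2**bits
    if total < sides:
        raise ValueError("not enough bits to cover every side")
    return Fraction(total % sides, total)


def expected_bits(bits: int, sides: int = 26) -> float:
    """Average number of bits consumed per letter, counting rerolls."""
    return float(bits / (1 - reroll_probability(bits, sides)))


for bits in range(5, 33):
    p = reroll_probability(bits)
    print(f"{bits:2d} bits: reroll probability {float(p):.3e}, "
          f"expected bits per letter {expected_bits(bits):.3f}")
```

Both 13 and 14 bits come out at exactly 1/4096, matching the figure quoted above. Which bit count is "best" then depends on whether you care more about avoiding rerolls or about the number of bits consumed.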
The simplest approach in your case is to throw 5 bits, which gives 32 (0-31) equiprobable outcomes. If you get a value outside your range (greater than 25), you try again (and again...).
The average number of "coins" (bits) to throw in this case for each letter would be 5 × 32/26 ≈ 6.15: each 5-bit throw lands in range with probability 26/32, so on average 32/26 throws are needed.
(For reference, see geometric distribution)
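The average cost can also be estimated empirically. The sketch below uses Python's `random.Random` as a stand-in bit source with an arbitrary fixed seed, and compares the simulated average with the value implied by the geometric distribution, 5 × 32/26.

```python
import random


def bits_used_for_one_letter(rng: random.Random) -> int:
    """Count the bits consumed until a 5-bit throw lands in 0..25."""
    bits = 0
    while True:
        bits += 5
        if rng.getrandbits(5) <= 25:
            return bits


rng = random.Random(12345)  # arbitrary fixed seed so the run is repeatable
trials = 200_000
average = sum(bits_used_for_one_letter(rng) for _ in range(trials)) / trials
expected = 5 * 32 / 26  # 5 bits per throw times the mean number of throws
print(f"simulated: {average:.3f}, expected: {expected:.3f}")
```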
A naive implementation would be to combine the random bits into an integer, using a fixed number of bits (say, 4 bytes to get a 32-bit integer). Divide the result by one more than the maximum possible value for that number of bits (2^32 here), which should give you a fraction evenly distributed in the range [0, 1) (essentially a rand() function). Then take the integer part of 26*rand() as the letter index. Because 2^32 is not a multiple of 26, some letters come up very slightly more often than others, so this is only approximately fair.
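To see how the scaling approach distributes values, it helps to enumerate every input for a small bit count. The sketch below uses 8 bits instead of 32 purely so the full enumeration is cheap; `scaled_index` is a hypothetical helper that computes the integer part of 26 × value / 2^bits.

```python
from collections import Counter


def scaled_index(value: int, bits: int, sides: int = 26) -> int:
    """Integer form of floor(sides * value / 2**bits)."""
    return (sides * value) >> bits


BITS = 8
# Tally how many of the 256 possible inputs map to each letter index.
counts = Counter(scaled_index(v, BITS) for v in range(2**BITS))
print(sorted(counts.items()))
print(min(counts.values()), max(counts.values()))
```

With 8 bits, some indices receive 9 inputs and others 10, so the result is close to uniform but not exactly fair. With 32 bits the imbalance is the same kind, just far smaller in relative terms.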
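26 is 11010 in binary, so five bits are enough.
Generate five bits; if the value is 26 or more, either return the value mod 26, or discard it and go again.
Or generalizing it:
Generate floor(log n in base 2) + 1 bits. If the value is n or more, return the value mod n, or discard and go again. Note that only the discard option is fair: taking the value mod n makes the low values come up more often (for n = 26, the results 0 to 5 each occur twice as often as 6 to 25).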
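The difference between the two options (take the value mod n, or discard and retry) shows up when every 5-bit value is enumerated. This is a minimal sketch; the variable names are chosen for illustration.

```python
from collections import Counter

SIDES = 26
BITS = SIDES.bit_length()  # floor(log2(26)) + 1 = 5

# Option 1: fold out-of-range values back with mod.
mod_counts = Counter(v % SIDES for v in range(2**BITS))

# Option 2: keep only in-range values (out-of-range ones trigger a retry).
discard_counts = Counter(v for v in range(2**BITS) if v < SIDES)

print("mod:    ", sorted(mod_counts.items()))
print("discard:", sorted(discard_counts.items()))
```

Under mod, the values 26 to 31 fold onto 0 to 5, so those six results are hit by two bit patterns each. Under discard, every result is hit by exactly one pattern.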