Why isn't this shuffle algorithm biased?
My coworker and I are arguing about why the shuffle algorithm given in this list of JS tips & tricks doesn't produce biased results like the kind Jeff Atwood describes for naive shuffles.
The array shuffle code in the tips is:
list.sort(function() { return Math.random() - 0.5; });
Jeff's naive shuffle code is:
for (int i = 0; i < cards.Length; i++)
{
int n = rand.Next(cards.Length);
Swap(ref cards[i], ref cards[n]);
}
I wrote this JS to test the shuffle:
var list = [1,2,3];
var result = {123:0, 132:0, 213:0, 231:0, 312:0, 321:0};
function shuffle() { return Math.random() - 0.5; }
for (var i = 0; i < 60000000; i++) {
    result[ list.sort(shuffle).join('') ]++;  // tally which permutation came out
}
For which I get results (from Firefox 5) like:
Order    Count       Diff from true avg
123      9997461     -0.0002539
132      10003451     0.0003451
213      10001507     0.0001507
231      9997563     -0.0002437
312      9995658     -0.0004342
321      10004360     0.0004360
Presumably Array.sort is walking the list array and performing swaps of (adjacent) elements, similar to Jeff's example. So why don't the results look biased?
3 Answers
I found the reason it appears unbiased.
Array.sort() not only returns the array, it changes the array itself. If we re-initialize the array for each loop iteration, the resulting counts show a very significant bias.
For those interested, here's the modified code:
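A minimal sketch of that fix, assuming the only change is rebuilding list before every sort (the rest matches the question's test):

var result = {123:0, 132:0, 213:0, 231:0, 312:0, 321:0};
function shuffle() { return Math.random() - 0.5; }
for (var i = 0; i < 60000000; i++) {
    var list = [1,2,3];  // re-initialize, so every iteration sorts the same starting order
    result[ list.sort(shuffle).join('') ]++;
}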
The problem with the naive shuffle is that a value might have already been swapped, and you might swap it again later. Let's say you have three cards and you pick one truly at random for the first card. If you can later randomly swap that card with a later one, then you are taking away from the randomness of that first selection.
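One quick way to see it (my check, not part of the original answer): the naive shuffle over three cards produces 3^3 = 27 equally likely swap sequences, and 27 isn't divisible by the 3! = 6 possible orders, so some orders must occur more often than others. A small enumeration:

// Enumerate all 27 equally likely pick-sequences of the naive shuffle for n = 3
var counts = {};
for (var a = 0; a < 3; a++)
for (var b = 0; b < 3; b++)
for (var c = 0; c < 3; c++) {
    var cards = [1, 2, 3], picks = [a, b, c];
    for (var i = 0; i < 3; i++) {            // the naive loop: swap position i
        var n = picks[i];                    //   with a random position n
        var t = cards[i]; cards[i] = cards[n]; cards[n] = t;
    }
    var key = cards.join('');
    counts[key] = (counts[key] || 0) + 1;    // the six counts come out unequal
}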
If the sort is quicksort, it continually splits the list about in half. The next iteration splits each of those groups into two groups randomly. This keeps going until you are down to single cards, and then you combine them all together. The difference is that you never take a card from the second randomly selected group and move it back to the first group.
The Knuth-Fisher-Yates shuffle is different from the naive shuffle because you only pick each card once. If you were picking random cards from a deck, would you put a card back and pick again? No, you take random cards one at a time. This is the first I've heard of it, but I've done something similar back in high school, going from index 0 up. KFY is probably faster because I have an extra addition in the random statement.

Don't think of it as swapping; think of it as selecting random cards from a deck. For each element in the array (except the last, because only one is left) you pick a random card out of all the remaining cards and lay it down, forming a new stack of cards that is randomly shuffled. It doesn't matter that your remaining cards are no longer in the original order if you've done some swapping already; you are still picking one random card from all the remaining cards.
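For concreteness, here is a standard Fisher-Yates implementation in JavaScript (a sketch of the textbook algorithm, not code from the question):

function fisherYatesShuffle(arr) {
    // Walk from the back; swap position i with a random index j in [0, i].
    // Each element is "picked" exactly once, so all n! orders are equally likely.
    for (var i = arr.length - 1; i > 0; i--) {
        var j = Math.floor(Math.random() * (i + 1));
        var t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }
    return arr;
}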
The random quicksort is like taking a stack of cards and randomly dividing them into two groups, then taking each group and randomly dividing it into two smaller groups, and on and on until you have individual cards, then putting them back together.
Actually, the tips snippet doesn't implement his naïve random shuffle. His algorithm actually transposes array keys manually, while sort actively sorts a list.
sort uses quicksort or insertion sort (thanks to cwolves for pointing that out -- see comments) to do this, though it will vary based on the implementation; roughly like the sketch below.
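A rough illustration (my sketch of the insertion-sort case, not any engine's actual source) of how the comparator gets invoked:

// Insertion sort driven by a comparator, as Array.prototype.sort
// might do for small arrays; cmp is called O(n^2) times at worst.
function insertionSort(arr, cmp) {
    for (var i = 1; i < arr.length; i++) {
        var v = arr[i], j = i - 1;
        while (j >= 0 && cmp(arr[j], v) > 0) {
            arr[j + 1] = arr[j];   // shift "larger" elements right
            j--;
        }
        arr[j + 1] = v;
    }
    return arr;
}
// With cmp = function() { return Math.random() - 0.5; }, every comparison
// is a coin flip, so how far each element travels depends on where it starts.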
This means that your big O for the average case is O(n log n) and your big O for the worst case is O(n^2) for each loop iteration.
Meanwhile, the Atwood naïve random shuffle is a simple single pass:
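(Restating the C# loop from the question in JavaScript for comparison; cards and the swap come from that snippet:)

for (var i = 0; i < cards.length; i++) {
    var n = Math.floor(Math.random() * cards.length);    // rand.Next(cards.Length)
    var t = cards[i]; cards[i] = cards[n]; cards[n] = t; // Swap(cards[i], cards[n])
}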
(Knuth-Fisher-Yates is almost the same, only backwards)
So his has a big O for the worst case of O(n) and a big O for the average case of O(n).