我正在帮助某人解决他的 JavaScript 代码,我的眼睛被一个看起来像这样的部分吸引了:
function randOrd(){
return (Math.round(Math.random())-0.5);
}
coords.sort(randOrd);
alert(coords);
我的第一个想法是:嘿,这不可能工作!但后来我做了一些实验,发现它确实至少看起来提供了很好的随机结果。
然后我做了一些网络搜索,几乎在顶部找到了一篇文章,其中包含此代码最彻底地复制。 看起来像一个非常受人尊敬的网站和作者......
但我的直觉告诉我,这一定是错误的。 特别是 ECMA 标准没有指定排序算法。 我认为不同的排序算法会导致不同的非均匀洗牌。 有些排序算法甚至可能无限循环......
但你觉得呢?
另一个问题是……我现在该如何测量这种洗牌技术结果的随机性?
更新:我做了一些测量,并将下面的结果发布为答案之一。
I was helping somebody out with his JavaScript code and my eyes were caught by a section that looked like that:
function randOrd(){
return (Math.round(Math.random())-0.5);
}
coords.sort(randOrd);
alert(coords);
My first though was: hey, this can't possibly work! But then I did some experimenting and found that it indeed at least seems to provide nicely randomized results.
Then I did some web search and almost at the top found an article from which this code was most ceartanly copied. Looked like a pretty respectable site and author...
But my gut feeling tells me, that this must be wrong. Especially as the sorting algorithm is not specified by ECMA standard. I think different sorting algoritms will result in different non-uniform shuffles. Some sorting algorithms may probably even loop infinitely...
But what do you think?
And as another question... how would I now go and measure how random the results of this shuffling technique are?
update: I did some measurements and posted the results below as one of the answers.
发布评论
评论(13)
在乔恩已经涵盖理论,这是一个实现:
算法是
O(n)
,而排序应该是O(n log n)
。 根据与原生sort()
函数相比执行 JS 代码的开销,这可能会导致 性能存在显着差异,并且随着数组大小的增加而增加。在对 bobobobo 的答案的评论中,我说过所讨论的算法可能不会产生均匀分布的概率(取决于
sort()
的实现)。我的论点是这样的:排序算法需要一定数量的
c
比较,例如冒泡排序的c = n(n-1)/2
。 我们的随机比较函数使每次比较的结果具有相同的可能性,即存在2^c
同样可能的结果。 现在,每个结果必须对应于数组条目的n!
排列之一,这使得在一般情况下不可能均匀分布。 (这是一种简化,因为所需的实际比较次数取决于输入数组,但断言应该仍然成立。)正如 Jon 指出的那样,仅凭这一点并没有理由选择 Fisher-Yates 而不是使用
sort()
,因为随机数生成器还将有限数量的伪随机值映射到n!
排列。 但 Fisher-Yates 的结果应该仍然更好:Math.random()
生成[0;1[
] 范围内的伪随机数。 由于 JS 使用双精度浮点值,这对应于2^x
可能的值,其中52 ≤ x ≤ 63
(我懒得找到实际数字) 。 如果原子事件的数量具有相同的数量级,则使用 Math.random() 生成的概率分布将不再表现良好。使用 Fisher-Yates 时,相关参数是数组的大小,由于实际限制,数组的大小永远不应接近
2^52
。当使用随机比较函数排序时,该函数基本上只关心返回值是正数还是负数,所以这永远不会成为问题。 但还有一个类似的情况:由于比较函数表现良好,因此如上所述,
2^c
可能的结果是同等可能的。 如果c ~ n log n
then2^c ~ n^(a·n)
wherea = const
,这至少是可能的2^c
与n!
的量级相同(甚至小于),因此导致分布不均匀,即使排序算法映射到排列上也是如此均匀。 这是否有任何实际影响超出了我的范围。真正的问题是排序算法不能保证均匀地映射到排列上。 很容易看出合并排序是对称的,但对冒泡排序,或更重要的是快速排序或堆排序之类的推理却不是。
底线:只要
sort()
使用合并排序,除了极端情况外,您应该相当安全(至少我希望2^c ≤ n!
是一种极端情况),如果不是,则一切皆废。After Jon has already covered the theory, here's an implementation:
The algorithm is
O(n)
, whereas sorting should beO(n log n)
. Depending on the overhead of executing JS code compared to the nativesort()
function, this might lead to a noticable difference in performance which should increase with array sizes.In the comments to bobobobo's answer, I stated that the algorithm in question might not produce evenly distributed probabilities (depending on the implementation of
sort()
).My argument goes along these lines: A sorting algorithm requires a certain number
c
of comparisons, egc = n(n-1)/2
for Bubblesort. Our random comparison function makes the outcome of each comparison equally likely, ie there are2^c
equally probable results. Now, each result has to correspond to one of then!
permutations of the array's entries, which makes an even distribution impossible in the general case. (This is a simplification, as the actual number of comparisons neeeded depends on the input array, but the assertion should still hold.)As Jon pointed out, this alone is no reason to prefer Fisher-Yates over using
sort()
, as the random number generator will also map a finite number of pseudo-random values to then!
permutations. But the results of Fisher-Yates should still be better:Math.random()
produces a pseudo-random number in the range[0;1[
. As JS uses double-precision floating point values, this corresponds to2^x
possible values where52 ≤ x ≤ 63
(I'm too lazy to find the actual number). A probability distribution generated usingMath.random()
will stop behaving well if the number of atomic events is of the same order of magnitude.When using Fisher-Yates, the relevant parameter is the size of the array, which should never approach
2^52
due to practical limitations.When sorting with a random comparision function, the function basically only cares if the return value is positive or negative, so this will never be a problem. But there is a similar one: Because the comparison function is well-behaved, the
2^c
possible results are, as stated, equally probable. Ifc ~ n log n
then2^c ~ n^(a·n)
wherea = const
, which makes it at least possible that2^c
is of same magnitude as (or even less than)n!
and thus leading to an uneven distribution, even if the sorting algorithm where to map onto the permutaions evenly. If this has any practical impact is beyond me.The real problem is that the sorting algorithms are not guaranteed to map onto the permutations evenly. It's easy to see that Mergesort does as it's symmetric, but reasoning about something like Bubblesort or, more importantly, Quicksort or Heapsort, is not.
The bottom line: As long as
sort()
uses Mergesort, you should be reasonably safe except in corner cases (at least I'm hoping that2^c ≤ n!
is a corner case), if not, all bets are off.这从来都不是我最喜欢的洗牌方式,部分原因是正如您所说,它是特定于实现的。 特别是,我似乎记得从 Java 或 .NET(不确定是哪一个)排序的标准库通常可以检测到某些元素之间的比较是否不一致(例如,您首先声明
A < B< /code> 和
B,然后是
C )。
它最终也会成为比您真正需要的更复杂的洗牌(就执行时间而言)。
我更喜欢洗牌算法,它有效地将集合划分为“洗牌”(在集合开始时,最初为空)和“未洗牌”(集合的其余部分)。 在算法的每个步骤中,选择一个随机的未洗牌元素(可能是第一个)并将其与第一个未洗牌元素交换 - 然后将其视为已洗牌(即在心里移动分区以包含它)。
这是 O(n),只需要 n-1 次调用随机数生成器,这很好。 它还会产生真正的洗牌 - 任何元素都有 1/n 的机会出现在每个空间中,无论其原始位置如何(假设合理的 RNG)。 排序后的版本近似均匀分布(假设随机数生成器不会两次选择相同的值,如果它返回随机双精度数,则这种情况极不可能),但我发现更容易推理随机播放版本:)
这种方法称为 Fisher-Yates 随机播放。
我认为最好的做法是编写一次洗牌代码,然后在需要洗牌项目的任何地方重复使用它。 那么您无需担心排序实现的可靠性或复杂性。 这只是几行代码(我不会在 JavaScript 中尝试!)
维基百科文章shuffle(尤其是 shuffle 算法部分)讨论了对随机投影进行排序 - 一般而言,值得阅读有关 shuffle 的不良实现的部分,这样您就知道应该避免什么。
It's never been my favourite way of shuffling, partly because it is implementation-specific as you say. In particular, I seem to remember that the standard library sorting from either Java or .NET (not sure which) can often detect if you end up with an inconsistent comparison between some elements (e.g. you first claim
A < B
andB < C
, but thenC < A
).It also ends up as a more complex (in terms of execution time) shuffle than you really need.
I prefer the shuffle algorithm which effectively partitions the collection into "shuffled" (at the start of the collection, initially empty) and "unshuffled" (the rest of the collection). At each step of the algorithm, pick a random unshuffled element (which could be the first one) and swap it with the first unshuffled element - then treat it as shuffled (i.e. mentally move the partition to include it).
This is O(n) and only requires n-1 calls to the random number generator, which is nice. It also produces a genuine shuffle - any element has a 1/n chance of ending up in each space, regardless of its original position (assuming a reasonable RNG). The sorted version approximates to an even distribution (assuming that the random number generator doesn't pick the same value twice, which is highly unlikely if it's returning random doubles) but I find it easier to reason about the shuffle version :)
This approach is called a Fisher-Yates shuffle.
I would regard it as a best practice to code up this shuffle once and reuse it everywhere you need to shuffle items. Then you don't need to worry about sort implementations in terms of reliability or complexity. It's only a few lines of code (which I won't attempt in JavaScript!)
The Wikipedia article on shuffling (and in particular the shuffle algorithms section) talks about sorting a random projection - it's worth reading the section on poor implementations of shuffling in general, so you know what to avoid.
我对这种随机排序的结果的随机性做了一些测量...
我的技术是采用一个小数组 [1,2,3,4] 并创建它的所有 (4!= 24) 排列。 然后,我将多次对数组应用改组函数,并计算每个排列生成的次数。 一个好的洗牌算法会将结果相当均匀地分布在所有排列中,而一个坏的洗牌算法不会产生统一的结果。
使用下面的代码我在 Firefox、Opera、Chrome、IE6/7/8 中进行了测试。
令我惊讶的是,随机排序和真正的洗牌都创建了同样均匀的分布。 因此,似乎(正如许多人所建议的那样)主要浏览器正在使用合并排序。 这当然并不意味着不可能有浏览器有不同的做法,但我想说这意味着这种随机排序方法足够可靠,可以在实践中使用。<强>编辑:这个测试并没有真正正确地测量随机性或缺乏随机性。 请参阅我发布的另一个答案。
但在性能方面,Cristoph 给出的 shuffle 函数显然是赢家。即使对于小型四元素数组,真正的 shuffle 的执行速度也大约是随机排序的两倍!
I did some measurements of how random the results of this random sort are...
My technique was to take a small array [1,2,3,4] and create all (4! = 24) permutations of it. Then I would apply the shuffling function to the array a large number of times and count how many times each permutation is generated. A good shuffling algoritm would distribute the results quite evenly over all the permutations, while a bad one would not create that uniform result.
Using the code below I tested in Firefox, Opera, Chrome, IE6/7/8.
Surprisingly for me, the random sort and the real shuffle both created equally uniform distributions. So it seems that (as many have suggested) the main browsers are using merge sort. This of course doesn't mean, that there can't be a browser out there, that does differently, but I would say it means, that this random-sort-method is reliable enough to use in practice.EDIT: This test didn't really measured correctly the randomness or lack thereof. See the other answer I posted.
But on the performance side the shuffle function given by Cristoph was a clear winner. Even for small four-element arrays the real shuffle performed about twice as fast as random-sort!
有趣的是,微软在他们的选择随机浏览器页面中使用了相同的技术。
他们使用了略有不同的比较函数:
对我来说看起来几乎相同,但是 结果并不是那么随机...
因此,我使用链接文章中使用的相同方法再次进行了一些测试运行,事实上 - 事实证明随机 -排序方法产生了有缺陷的结果。 新的测试代码在这里:
Interestingly, Microsoft used the same technique in their pick-random-browser-page.
They used a slightly different comparison function:
Looks almost the same to me, but it turned out to be not so random...
So I made some testruns again with the same methodology used in the linked article, and indeed - turned out that the random-sorting-method produced flawed results. New test code here:
我在我的网站上放置了一个简单的测试页面,显示您当前的浏览器与其他流行浏览器的偏差使用不同的方法进行洗牌。 它显示了仅使用 Math.random()-0.5(另一种没有偏见的“随机”洗牌)以及上面提到的 Fisher-Yates 方法的可怕偏见。
您可以看到,在某些浏览器上,某些元素在“随机播放”期间根本不会改变位置的可能性高达 50%!
注意:您可以通过将代码更改为:
测试结果: http://jsperf.com/optimized-fisher-yates
I have placed a simple test page on my website showing the bias of your current browser versus other popular browsers using different methods to shuffle. It shows the terrible bias of just using
Math.random()-0.5
, another 'random' shuffle that isn't biased, and the Fisher-Yates method mentioned above.You can see that on some browsers there is as high as a 50% chance that certain elements will not change place at all during the 'shuffle'!
Note: you can make the implementation of the Fisher-Yates shuffle by @Christoph slightly faster for Safari by changing the code to:
Test results: http://jsperf.com/optimized-fisher-yates
我认为对于您对分发不挑剔并且希望源代码较小的情况来说,这很好。
在 JavaScript 中(不断传输源代码),小的影响会导致带宽成本的差异。
I think it's fine for cases where you're not picky about distribution and you want the source code to be small.
In JavaScript (where the source is transmitted constantly), small makes a difference in bandwidth costs.
不,这是不正确的。 正如其他答案所指出的,这将导致不均匀的随机播放,并且随机播放的质量还将取决于浏览器使用的排序算法。
现在,这对您来说可能听起来不太糟糕,因为即使理论上分布不均匀,但实际上它可能几乎均匀,对吗? 嗯,不,甚至还差得远。 下图分别显示了 Chrome 和 Firefox 中每个元素被打乱到的索引的热图:如果像素 (i, j) 为绿色,则表示索引 i 处的元素经常被洗牌到索引 j,如果它是红色的,那么它在那里洗牌的次数太少了。
这些屏幕截图取自 Mike Bostock 关于此主题的页面。
正如您所看到的,使用随机比较器进行洗牌在 Chrome 中存在严重偏差,在 Firefox 中更是如此。 特别是,两者都沿着对角线有很多绿色,这意味着太多的元素被“洗牌”到非常接近它们在原始序列中的位置的地方。 相比之下,无偏洗牌的类似图表(例如使用 Fisher-Yates 算法)将全部是浅黄色,只有少量随机噪声。
No, it is not correct. As other answers have noted, it will lead to a non-uniform shuffle and the quality of the shuffle will also depend on which sorting algorithm the browser uses.
Now, that might not sound too bad to you, because even if theoretically the distribution is not uniform, in practice it's probably nearly uniform, right? Well, no, not even close. The following charts show heat-maps of which indices each element gets shuffled to, in Chrome and Firefox respectively: if the pixel (i, j) is green, it means the element at index i gets shuffled to index j too often, and if it's red then it gets shuffled there too rarely.
These screenshots are taken from Mike Bostock's page on this subject.
As you can see, shuffling using a random comparator is severely biased in Chrome and even more so in Firefox. In particular, both have a lot of green along the diagonal, meaning that too many elements get "shuffled" somewhere very close to where they were in the original sequence. In comparison, a similar chart for an unbiased shuffle (e.g. using the Fisher-Yates algorithm) would be all pale yellow with just a small amount of random noise.
四年过去了,但我想指出的是,无论你使用什么排序算法,随机比较器方法都不会正确分布。
证明:
n
个元素的数组,恰好有n!
个排列(即可能的随机排列)。唯一可能正确分布的大小是 n=0,1,2。
作为练习,尝试绘制 n=3 时不同排序算法的决策树。
证明中存在一个差距:如果排序算法依赖于比较器的一致性,并且在不一致的比较器下具有无限的运行时间,则它可以具有无限的概率之和,即使这样,也允许加起来为 1/6总和中的每个分母都是 2 的幂。尝试找出一个。
此外,如果比较器有固定的机会给出任一答案(例如,对于常数
P
,(Math.random() < P)*2 - 1
),则上述证明成立。 如果比较器根据之前的答案改变其赔率,则可能会产生公平的结果。 为给定的排序算法找到这样的比较器可能是一篇研究论文。It's been four years, but I'd like to point out that the random comparator method won't be correctly distributed, no matter what sorting algorithm you use.
Proof:
n
elements, there are exactlyn!
permutations (i.e. possible shuffles).The only sizes that could possibly be correctly distributed are n=0,1,2.
As an exercise, try drawing out the decision tree of different sort algorithms for n=3.
There is a gap in the proof: If a sort algorithm depends on the consistency of the comparator, and has unbounded runtime with an inconsistent comparator, it can have an infinite sum of probabilities, which is allowed to add up to 1/6 even if every denominator in the sum is a power of 2. Try to find one.
Also, if a comparator has a fixed chance of giving either answer (e.g.
(Math.random() < P)*2 - 1
, for constantP
), the above proof holds. If the comparator instead changes its odds based on previous answers, it may be possible to generate fair results. Finding such a comparator for a given sorting algorithm could be a research paper.这当然是一种黑客行为。 实际上,无限循环算法是不可能的。
如果你正在对对象进行排序,你可以循环遍历 coords 数组并执行类似的操作:(
然后再次循环遍历它们以删除 sortValue)
不过仍然是一个 hack。 如果你想做得好,你就必须用困难的方式来做:)
It is a hack, certainly. In practice, an infinitely looping algorithm is not likely.
If you're sorting objects, you could loop through the coords array and do something like:
(and then loop through them again to remove the sortValue)
Still a hack though. If you want to do it nicely, you have to do it the hard way :)
如果您使用 D3,则有一个内置的随机播放功能(使用 Fisher-Yates):
Mike 对此进行了详细介绍:
If you're using D3 there is a built-in shuffle function (using Fisher-Yates):
And here is Mike going into details about it:
http://bost.ocks.org/mike/shuffle/
这是使用单个数组的方法:
基本逻辑是:
代码:
Here's an approach that uses a single array:
The basic logic is:
Code:
您可以使用
Array.sort ()
函数来打乱数组 - 是。结果是否足够随机 - 否。
考虑以下代码片段:
示例输出:
理想情况下,计数应均匀分布(对于上面的示例,所有计数应约为 20)。 但他们不是。 显然,分布取决于浏览器实现的排序算法以及它如何迭代数组项进行排序。
Can you use the
Array.sort()
function to shuffle an array – Yes.Are the results random enough – No.
Consider the following code snippet:
Sample output:
Ideally, the counts should be evenly distributed (for the above example, all counts should be around 20). But they are not. Apparently, the distribution depends on what sorting algorithm is implemented by the browser and how it iterates the array items for sorting.
没有什么问题。
您传递给 .sort() 的函数通常看起来像
您在 sortingFunc 中的工作是返回:
上面的排序函数将事物按顺序排列。
如果您随机返回 - 和 + ,您将得到随机排序。
就像在 MySQL 中一样:
There is nothing wrong with it.
The function you pass to .sort() usually looks something like
Your job in sortingFunc is to return:
The above sorting function puts things in order.
If you return -'s and +'s randomly as what you have, you get a random ordering.
Like in MySQL: