中位数选择的最佳中位数 - 3 个元素块与 5 个元素块?
我正在研究一种基于选择算法的快速排序变体实现来进行选择一个好的枢轴元素。传统观点似乎是将数组分为 5 个元素块,取每个元素的中值,然后对所得中值递归应用相同的分块方法以获得“中值的中值”。
让我困惑的是选择 5 元素块而不是 3 元素块。对于 5 元素块,在我看来,您执行 n/4 = n/5 + n/25 + n/125 + n/625 + ... 5 中位数运算,而对于 3 元素块,您执行 n/2 = n/3 + n/9 + n/27 + n/81 + ... 3 中位数运算。由于每个中位数 5 是 6 次比较,每个中位数 3 是 2 次比较,因此会使用中位数 5 和 n 进行
使用中位数 3 进行比较。3*n/2
比较
谁能解释一下这种差异,以及使用 5 元素块的动机是什么?我不熟悉应用这些算法的通常做法,所以也许有某种方法可以削减一些步骤,但仍然“足够接近”中位数以确保良好的枢轴,并且该方法更适用于 5 元素块?
I'm working on a quicksort-variant implementation based on the Select algorithm for choosing a good pivot element. Conventional wisdom seems to be to divide the array into 5-element blocks, take the median of each, and then recursively apply the same blocking approach to the resulting medians to get a "median of medians".
What's confusing me is the choice of 5-element blocks rather than 3-element blocks. With 5-element blocks, it seems to me that you perform n/4 = n/5 + n/25 + n/125 + n/625 + ...
median-of-5 operations, whereas with 3-element blocks, you perform n/2 = n/3 + n/9 + n/27 + n/81 + ...
median-of-3 operations. Being that each median-of-5 is 6 comparisons, and each median-of-3 is 2 comparisons, that results in 3*n/2
comparisons using median-of-5 and n
comparisons using median-of-3.
Can anyone explain this discrepancy, and what the motivation for using 5-element blocks could be? I'm not familiar with usual practices for applying these algorithms, so maybe there's some way you can cut out some steps and still get "close enough" to the median to ensure a good pivot, and that approach works better with 5-element blocks?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
原因是,通过选择 3 个块,我们可能会失去 O(n) 时间算法的保证。
对于 5 块,时间复杂度为
T(n) = T(n/5) + T(7n/10) + O(n)
对于 3 块,时间复杂度为
T(n) = T(n /3) + T(2n/3) + O(n)
看看这个: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf
The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.
For blocks of 5, the time complexity is
T(n) = T(n/5) + T(7n/10) + O(n)
For blocks of 3, it comes out to be
T(n) = T(n/3) + T(2n/3) + O(n)
Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf
我相信这与确保“良好”的分裂有关。划分为 5 个元素块可确保最坏情况下的划分为 70-30。标准论证是这样的:在 n/5 块中,至少一半的中位数 >= 中位数中位数,因此至少有一半的 n/5 块至少有 3 个元素(5 的 1/2)>=中位数,这给出了
3n/10
分割,这意味着另一个分区是最坏情况下>7n/10
。这就给出了 T(n) = T(n/5) + T(7n/10) + O(n) 。
由于 n/5 + 7n/10
1
,最坏情况的运行时间是O(n)。选择 3 元素块使得:至少一半的 n/3 块至少有 2 个元素 >=中位数,因此这给出了 n/3< /code> 分割,或者在最坏的情况下为
2n/3
。这就给出了 T(n) = T(n/3) + T(2n/3) + O(n) 。
在这种情况下,
n/3 + 2n/3 = 1
,因此在最坏的情况下它会减少到O(n log n)。I believe it has to do with assuring a "good" split. Dividing into 5-element blocks assures a worst-case split of 70-30. The standard argument goes like this: of the
n/5
blocks, at least half of the medians are >= the median-of-medians, hence at least half of then/5
blocks have at least 3 elements (1/2 of 5) >= median-of-medians, and this gives a3n/10
split, which means the other partition is7n/10
in the worst case.That gives
T(n) = T(n/5) + T(7n/10) + O(n)
.Since
n/5 + 7n/10 < 1
, the worst-case running time is O(n).Choosing 3-element blocks makes it thus: at least half of the
n/3
blocks have at least 2 elements >= median-of-medians, hence this gives an/3
split, or2n/3
in the worst case.That gives
T(n) = T(n/3) + T(2n/3) + O(n)
.In this case,
n/3 + 2n/3 = 1
, so it reduces to O(n log n) in the worst case.您可以使用大小为 3 的块!是的,我和你一样感到惊讶。 2014 年(你在 2010 年问过)有一篇论文展示了如何做到这一点。
这个想法如下:而不是做
median3
,partition
,median3
,partition
,...,你做median3
、median3
、分区
、median3
、median3
、分区,...。在论文中,这被称为“重复步骤算法”。
因此,而不是:
得到:
所述文章是 Select with Groups of 3 or 4 Takes Linear Time by K . Chen 和 A. Dumitrescu (2014, arxiv),或 以 3 人为一组进行选择或4(2015年,作者主页)。
PS:A. Alexandrescu(D 语言名人!)的 快速确定性选择 展示了如何实现以上甚至更有效。
You can use blocks of size 3! Yes, I'm as surprised as you are. In 2014 (you asked in 2010) there came a paper which shows how to do so.
The idea is as follows: instead of doing
median3
,partition
,median3
,partition
, ..., you domedian3
,median3
,partition
,median3
,median3
,partition
, ... . In the paper this is called "The Repeated Step Algorithm".So instead of:
one gets:
The said article is Select with Groups of 3 or 4 Takes Linear Time by K. Chen and A. Dumitrescu (2014, arxiv), or Select with groups of 3 or 4 (2015, author's homepage).
PS: The Fast Deterministic Selection by A. Alexandrescu (of D language fame!) which shows how to implement the above even more efficiently.