中值选择算法-它是否找到绝对中值或“中值的中值”?接近绝对中位数?
CLRS 第 3 版“最坏情况线性时间中的选择”中的第 9.3 节讨论了用于查找 O 中列表中值的“Select”算法(由于 Blum、Floyd、Pratt、Rivest 和 Tarjan 有时称为 BFPRT 算法) (n) 最坏情况下的时间。当我尝试在白板上运行示例时,我有点困惑。我知道每次调用“Select”时可以消除一定数量的元素(我读过 30% 被消除,而 70% 需要再次检查),我感到困惑的是数组的哪一部分可以被消除消除,即如果将数组可视化为高度为 5 宽度为 n/5 的矩阵,那么消除的元素位于哪个象限?我最初认为它是两个对角相邻的象限,但现在我认为它只是一个象限,具体取决于中位数的中位数(请参阅步骤 5、6 和 7 此处)。
所以我去维基百科看看是否有比 CLRS 分析更少的快速解释(为了在我跳回到 CLRS 进行分析之前理解算法)。我来到这个,特别是“最后,选择了“中位数的中位数”成为支点。”从维基百科的描述来看,“选择”并没有找到真正的中位数,而是找到一个足以为快速排序选择枢轴的中位数的元素。
那么“Select”在真实中位数方面有什么作用,它是如何做到的呢?通过所有这一切,我想到的短语是“部分层次结构”,据我了解,这就是“选择”起作用的原因,但是通过什么逻辑,您可以根据此部分层次结构消除列表中的元素作为中位数?
Section 9.3 in CLRS 3rd edition "Selection in worst-case linear time" talks about the "Select" algorithm (sometimes called the BFPRT algorithm due to Blum, Floyd, Pratt, Rivest, and Tarjan) for finding the median of a list in O(n) time at worst case. I got a little confused as I tried to run a sample on the whiteboard. I understand that a certain number of elements can be eliminated at each call to "Select" (I have read 30% are eliminated versus 70% that need to be checked again), what I was confused about was which part of the array can be eliminated, i.e. if the array is visualized as a matrix with height 5 and width n/5, then which quadrant(s) do the eliminated elements rest in? I originally thought it was two diagonally adjacent quadrants but now I am thinking it is only one quadrant depending on what the median of medians is (see steps 5, 6, and 7 here).
So I went to wikipedia to see if there was a quick explanation with less analysis than CLRS (for the sake of understanding the algorithm before I jumped back to CLRS for the analysis). I came to this, particularly 'Finally, the "median of medians' is chosen to be the pivot." from the sound of the description in wikipedia, "Select" does not find the true median rather a element that is median-enough for the purpose of choosing a pivot for quicksort.
So what does "Select" do in terms of the true median, and how does it do it? The phrase that comes to mind through all of this is "partial hierarchy", which as I understand, is the reason "Select" works, but by what logic can you eliminate elements from the list from being the median based on this partial hierarchy?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
它找到绝对中位数。
正如您所说,“选择”没有找到真正的中位数,而是找到一个足以为快速排序选择枢轴的中位数的元素。特别是,它的中位数足够大,因此保证会丢弃每次迭代至少占数据集的 30%。不幸的是,这也是一项昂贵的手术。
关键思想是中位数的中位数小于或等于中位数小于或等于它的每 5 个元素中的 3 个。因此,对于 5 个组的一半,它小于或等于每 5 个元素中的 3 个,因此该集合的至少 30% 小于或等于它。所以它位于数据集中最大的 70% 中。
同样,它位于数据集中最小的 70% 中。
这可以保证您避免快速选择的潜在陷阱,即选择具有极值的枢轴点。
如果您希望将效率和最坏情况结合起来,您可以将其与快速选择结合起来。例如,4 轮快速选择,然后是一轮快速选择,然后是 4 轮快速选择,等等。昂贵的 BFPRT 轮保证
O(n)
,而快速选择平均会很快。通过推迟第一轮 BFPRT 直到完成几轮快速选择,您可以使额外的运行时间平均只比快速选择多几个百分点。 (最坏的情况下成本会增加很多,但我们预计不会遇到这种情况。)It finds the absolute median.
As you said, "Select" does not find the true median rather a element that is median-enough for the purpose of choosing a pivot for quicksort. In particular it is median enough that it is guaranteed to drop at least 30% of the dataset on every iteration. Unfortunately it is also an expensive operation.
The key idea is that the median of medians is less than or equal to 3 out of every 5 elements whose median is less than or equal to it. So it is less than or equal to 3 out of every 5 elements for half the groups of 5, so at least 30% of the set is less than or equal to it. So it is in the largest 70% of the data set.
Similarly it is in the smallest 70% of the data set.
This guarantees that you avoid the potential pitfall of quickselect, which is picking pivot points that have extreme values.
If you wish to combine efficiency and a good worst case you can combine this with quickselect. For instance 4 rounds of quickselect followed by one round of this followed by 4 rounds of quickselect, etc. The expensive rounds of BFPRT guarantee
O(n)
while the quickselect on average is going to be fast. By putting off your first round of BFPRT until you've done several rounds of quickselect you can make the extra running time only a few percent more than quickselect on average. (The worst case cost goes up by quite a bit, but we don't expect to encounter that.)