Quicksort superiority over Heap Sort
Heap Sort has a worst-case complexity of O(n log n), while Quicksort's worst case is O(n^2). But empirical evidence says Quicksort is superior. Why is that?
Comments (6)
One of the major factors is that quicksort has better locality of reference: the next thing to be accessed is usually close in memory to the thing you just looked at. By contrast, heapsort jumps around significantly more. Since things that are close together are likely to be cached together, quicksort tends to be faster.
However, quicksort's worst-case performance is significantly worse than heapsort's. Because some critical applications require guaranteed performance bounds, heapsort is the right choice in such cases.
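To make the locality argument concrete, here is a minimal sketch that lists which array slots are touched. It assumes a 0-based array heap (children of slot i at 2i+1 and 2i+2) and a sift-down that always descends to the left child; the function names are hypothetical, chosen only for illustration.

```python
def sift_down_path(n, start=0):
    """Slots visited when a sift-down from `start` always takes the left child."""
    path = []
    i = start
    while i < n:
        path.append(i)
        i = 2 * i + 1  # left child in a 0-based array heap
    return path


def partition_scan_path(lo, hi):
    """Slots visited by one left-to-right partition scan over a[lo..hi]."""
    return list(range(lo, hi + 1))


def gaps(path):
    """Distance, in array slots, between consecutive accesses."""
    return [b - a for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    heap_path = sift_down_path(1_000_000)
    print("heap sift-down slots:", heap_path[:8])
    print("heap sift-down gaps: ", gaps(heap_path)[:7])
    print("partition scan gaps: ", gaps(partition_scan_path(0, 7)))
```

The gap between consecutive heap accesses doubles at every level, so after a few levels each access lands far from the previous one, while the partition scan always moves by a single slot.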
Heapsort is guaranteed O(N log N), which is much better than Quicksort's worst case. Heapsort also doesn't need extra memory for a second array to hold the ordered data, as Mergesort does. So why do commercial applications stick with Quicksort? What does Quicksort have that is so special compared with the other algorithms?
I've tested the algorithms myself and I've seen that Quicksort does have something special. It runs fast, much faster than the Heapsort and Mergesort algorithms.
The secret of Quicksort is that it does almost no unnecessary element swaps, and swapping is time-consuming.
With Heapsort, even if all of your data is already ordered, you are going to swap 100% of the elements to order the array.
With Mergesort, it's even worse. You are going to write 100% of the elements to another array and then write them back to the original one, even if the data is already ordered.
With Quicksort you don't swap what is already ordered. If your data is completely ordered, you swap almost nothing! Although there is a lot of fuss about the worst case, a small improvement in the choice of pivot, anything other than taking the first or last element of the array, can avoid it. Taking as pivot the median of the first, last and middle elements is usually sufficient to avoid the worst case in practice.
What is superior in Quicksort is not the worst case, but the best case! In the best case you do the same number of comparisons, OK, but you swap almost nothing. In the average case you swap some of the elements, but not all of them as in Heapsort and Mergesort. That is what gives Quicksort the best time. Fewer swaps, more speed.
A C# implementation on my computer, running in release mode, beat Array.Sort by 3 seconds with a middle pivot and by 2 seconds with an improved pivot (yes, there is an overhead to getting a good pivot).
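The swap-count claim can be checked with a small experiment. The sketch below is not the C# code referred to in this answer; it is an illustrative Python version that assumes a middle-pivot, Hoare-style partition and a textbook in-place heapsort, and it does not count swapping an element with itself. The function names are hypothetical.

```python
def quicksort_swaps(a):
    """Sort `a` in place with a middle-pivot partition; return the swap count."""
    swaps = 0
    stack = [(0, len(a) - 1)]  # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                if i != j:  # a self-swap moves nothing, so it is not counted
                    a[i], a[j] = a[j], a[i]
                    swaps += 1
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return swaps


def heapsort_swaps(a):
    """Sort `a` in place with a max-heap; return the swap count."""
    swaps = 0

    def sift_down(root, end):
        nonlocal swaps
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            swaps += 1
            root = child

    n = len(a)
    for root in range(n // 2 - 1, -1, -1):  # build the heap
        sift_down(root, n)
    for end in range(n - 1, 0, -1):  # move the current maximum to the end
        a[0], a[end] = a[end], a[0]
        swaps += 1
        sift_down(0, end)
    return swaps


if __name__ == "__main__":
    data = list(range(1000))  # already sorted, distinct values
    print("quicksort swaps:", quicksort_swaps(data[:]))
    print("heapsort swaps: ", heapsort_swaps(data[:]))
```

On already-sorted distinct input this quicksort variant performs no swaps at all, while heapsort must do at least n-1 swaps just to move each maximum to the end of the array. Other partition schemes (for example Lomuto's) behave differently, so the result depends on the assumptions stated above.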
Here are a couple of explanations:
http://www.cs.auckland.ac.nz/software/AlgAnim/qsort3.html
http://users.aims.ac.za/~mackay/sorting/sorting.html
Essentially, even though the worst case for quicksort is O(n^2), on average it will perform better. :-)
The big-O notation means that the time required to sort n items is bounded above by the function c*n*log(n), where c is some unspecified constant factor. There is no reason why the constant c should be the same for quicksort and heapsort. So the real question is: why would you expect them to be equally fast?
Quicksort has always been somewhat faster than heapsort in practice, but the difference has become larger recently since, as mentioned before, locality of memory access has become so important to execution speed.

Average-case complexity, and the fact that you can take simple steps to minimize the risk of worst-case complexity in Quicksort (e.g. select the pivot as the median of three elements rather than a single selected position).
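As a rough illustration of the median-of-three idea, the sketch below counts element-versus-pivot comparisons on already-sorted input for two pivot rules. The partition scheme, the function names and the input size are assumptions made for this example. Median-of-three lowers the risk of the quadratic case but does not eliminate it, since adversarial inputs still exist.

```python
def first_element(a, lo, hi):
    """Naive rule: always use the first element of the range as the pivot."""
    return a[lo]


def median_of_three(a, lo, hi):
    """Use the median of the first, middle and last elements as the pivot."""
    mid = (lo + hi) // 2
    return sorted((a[lo], a[mid], a[hi]))[1]


def quicksort_comparisons(a, choose_pivot):
    """Sort `a` in place; return how many element-vs-pivot comparisons ran."""
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = choose_pivot(a, lo, hi)
        i, j = lo, hi
        while i <= j:
            comparisons += 1  # first test of a[i] < pivot
            while a[i] < pivot:
                i += 1
                comparisons += 1  # each further test of a[i] < pivot
            comparisons += 1  # first test of a[j] > pivot
            while a[j] > pivot:
                j -= 1
                comparisons += 1  # each further test of a[j] > pivot
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return comparisons


if __name__ == "__main__":
    n = 1000
    naive = quicksort_comparisons(list(range(n)), first_element)
    better = quicksort_comparisons(list(range(n)), median_of_three)
    print("first-element pivot:   ", naive)
    print("median-of-three pivot: ", better)
```

With the first-element rule, every partition of sorted input peels off a single element, so the comparison count grows roughly as n^2/2. With median-of-three the splits on sorted input are balanced and the count grows roughly as n log n.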
As already said, quicksort has much better locality of reference than heapsort, but its worst case has O(n^2) complexity.
std::sort is typically implemented using introsort (introspective sort): it runs quicksort most of the time, but in case it detects that the runtime will be bad because of bad pivot selection, it switches to heapsort. That way you get a guaranteed O(n log n) complexity together with the speed of quicksort, which is picked almost every time.
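The switching idea can be sketched in a few lines. This is not the code of any real std::sort implementation; it is a simplified illustration that assumes a depth budget of 2*floor(log2(n)), a deliberately naive first-element pivot so that the fallback actually triggers on sorted input, and a heap-based fallback built on Python's heapq that copies the subrange for brevity instead of sorting it in place. Real implementations typically also finish small ranges with insertion sort, which is omitted here.

```python
import heapq
import math


def introsort(a):
    """Sort list `a` in place; return how many times the heap fallback ran."""
    if len(a) < 2:
        return 0
    depth_limit = 2 * int(math.log2(len(a)))  # assumed budget for this sketch
    return _sort(a, 0, len(a) - 1, depth_limit)


def _heap_fallback(a, lo, hi):
    # O(m log m) for m elements. Uses a temporary copy to stay short;
    # a library implementation would run heapsort in place instead.
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    a[lo:hi + 1] = [heapq.heappop(heap) for _ in range(len(heap))]


def _sort(a, lo, hi, depth):
    fallbacks = 0
    while lo < hi:
        if depth == 0:  # partitioning has degenerated: stop using quicksort
            _heap_fallback(a, lo, hi)
            return fallbacks + 1
        depth -= 1
        pivot = a[lo]  # naive pivot on purpose, to provoke bad splits
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        fallbacks += _sort(a, lo, j, depth)  # recurse on the left part
        lo = i  # keep looping on the right part
    return fallbacks


if __name__ == "__main__":
    data = list(range(5000))  # sorted input is the bad case for this pivot
    print("heap fallbacks on sorted input:", introsort(data))
```

Because the depth budget shrinks at every partitioning step, the quicksort phase can do at most O(n log n) work before the heap-based fallback takes over, which is where the overall worst-case guarantee comes from.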