Why not always use heap sort?

Posted on 2024-12-18 11:33:23

The Heap Sort algorithm seems to have a worst-case complexity of O(n log n), and it uses O(1) extra space for the sorting operation.

This seems better than most sorting algorithms. Then, why wouldn't one use Heap Sort always as a sorting algorithm (and why do folks use sorting mechanisms like Merge sort or Quick sort)?

Also, I have seen people use the term 'instability' with Heap sort. What does that imply?

Comments (5)

时常饿 2024-12-25 11:33:23

A stable sort maintains the relative order of items that have the same key. For example, imagine your data set contains records with an employee id and a name. The initial order is:

1, Jim
2, George
3, Jim
4, Sally
5, George

You want to sort by name. A stable sort will arrange the items in this order:

2, George
5, George
1, Jim
3, Jim
4, Sally

Note that the duplicate records for "George" are in the same relative order as they were in the initial list. Same with the two "Jim" records.

An unstable sort might arrange the items like this:

5, George
2, George
1, Jim
3, Jim
4, Sally

Heapsort is not stable because operations on the heap can change the relative order of equal items. Quicksort is not necessarily stable either: the usual in-place partitioning schemes are not, so it depends on how you implement the partitioning.
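
To make the stable/unstable distinction concrete, here is a minimal Python sketch using the employee records from above. Python's built-in `sorted` is documented to be stable. The `heapsort_by_key` function is a hypothetical textbook heapsort written only for this illustration; it always produces records sorted by name, but records with equal names may come out in a different relative order than they went in.

```python
def heapsort_by_key(records, key):
    """Return a new list sorted by key(record), using a textbook binary max-heap."""
    a = list(records)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds within a[:end].
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            # Pick the larger of the two children.
            if child + 1 < end and key(a[child]) < key(a[child + 1]):
                child += 1
            if key(a[root]) < key(a[child]):
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    # Build the heap bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the current maximum to the end of the unsorted prefix.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a


records = [(1, "Jim"), (2, "George"), (3, "Jim"), (4, "Sally"), (5, "George")]
name_of = lambda record: record[1]

stable = sorted(records, key=name_of)          # stable: ties keep input order
heap_sorted = heapsort_by_key(records, name_of)  # ties may be reordered

print(stable)
print(heap_sorted)
```

The long-distance swaps between the root and the end of the array are what break ties apart: nothing in the heap remembers which of two equal keys came first.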

Although Heapsort has a worst-case complexity of O(n log(n)), that doesn't tell the whole story. In real-world implementations, there are constant factors that the theoretical analysis doesn't take into account. In the case of Heapsort vs. Quicksort, it turns out that there are ways (choosing the pivot as a median of 5, for example) to make Quicksort's worst cases very rare indeed. Also, maintaining a heap is not free.

For typical, randomly ordered input, Quicksort and Heapsort will both run in O(n log(n)) time. But Quicksort will usually execute faster because its constant factors are smaller than Heapsort's. To put it simply, partitioning is faster than maintaining the heap.
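
One rough way to look at the constant factors is to count key comparisons. The sketch below is only an illustration under stated assumptions: `quicksort_count` and `heapsort_count` are hypothetical helper names, the quicksort uses a random pivot with Lomuto partitioning, and the heapsort is the plain textbook variant. Comparison counts are just one part of the real cost (memory access patterns matter too), so treat the numbers as a hint rather than a benchmark. On random input the heapsort count typically comes out higher.

```python
import random


def quicksort_count(items):
    """Return (sorted copy, number of key comparisons) for a random-pivot quicksort."""
    a = list(items)
    count = 0

    def sort(lo, hi):
        nonlocal count
        while lo < hi:
            # Move a randomly chosen pivot to the end, then partition (Lomuto).
            p = random.randint(lo, hi)
            a[p], a[hi] = a[hi], a[p]
            pivot = a[hi]
            i = lo
            for j in range(lo, hi):
                count += 1
                if a[j] < pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            # Recurse into the smaller side, loop on the larger one.
            if i - lo < hi - i:
                sort(lo, i - 1)
                lo = i + 1
            else:
                sort(i + 1, hi)
                hi = i - 1

    sort(0, len(a) - 1)
    return a, count


def heapsort_count(items):
    """Return (sorted copy, number of key comparisons) for a textbook heapsort."""
    a = list(items)
    count = 0

    def sift_down(root, end):
        nonlocal count
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end:
                count += 1
                if a[child] < a[child + 1]:
                    child += 1
            count += 1
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a, count


data = random.sample(range(1_000_000), 10_000)
quick_result, quick_comparisons = quicksort_count(data)
heap_result, heap_comparisons = heapsort_count(data)
print("quicksort comparisons:", quick_comparisons)
print("heapsort comparisons: ", heap_comparisons)
```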

爱已欠费 2024-12-25 11:33:23

Heap Sort has a worst-case complexity of O(n log(n)). Yet empirical studies show that Quick Sort (and some other sorting algorithms) are generally considerably faster than Heap Sort, even though Quick Sort's worst-case complexity is O(n²): http://www.cs.auckland.ac.nz/~jmor159/PLDS210/qsort3.html

Also, from the quick sort article on Wikipedia:

The most direct competitor of quicksort is heapsort. Heapsort's worst-case running time is always O(n log n). But, heapsort is assumed to be on average somewhat slower than standard in-place quicksort. This is still debated and in research, with some publications indicating the opposite.[13][14] Introsort is a variant of quicksort that switches to heapsort when a bad case is detected to avoid quicksort's worst-case running time. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.

However, quick sort should never be used in applications which require a guarantee of response time!

Source on Stackoverflow: Quicksort vs heapsort
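
For readers who want to see the introsort idea from the quote in code, here is a simplified Python sketch. It is an assumption-laden illustration, not any library's actual implementation: the function name `introsort` is hypothetical, the depth limit of 2·floor(log2 n) is a commonly used choice, the pivot is simply the middle element, and the heapsort fallback copies the subrange into a `heapq` heap instead of sorting in place.

```python
import heapq


def introsort(a):
    """Sort the list a in place: quicksort that falls back to a heap-based sort."""

    def heap_fallback(lo, hi):
        # Simplification: copies the subrange, so this fallback is not O(1) space.
        heap = a[lo:hi + 1]
        heapq.heapify(heap)
        a[lo:hi + 1] = [heapq.heappop(heap) for _ in range(len(heap))]

    def sort(lo, hi, depth):
        while lo < hi:
            if depth == 0:
                # Too many bad partitions: switch to the O(n log n) fallback.
                heap_fallback(lo, hi)
                return
            depth -= 1
            # Lomuto partition around the middle element.
            mid = (lo + hi) // 2
            a[mid], a[hi] = a[hi], a[mid]
            pivot = a[hi]
            i = lo
            for j in range(lo, hi):
                if a[j] < pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            # Recurse into the smaller side, loop on the larger one.
            if i - lo < hi - i:
                sort(lo, i - 1, depth)
                lo = i + 1
            else:
                sort(i + 1, hi, depth)
                hi = i - 1

    if len(a) > 1:
        sort(0, len(a) - 1, 2 * (len(a).bit_length() - 1))


values = [7] * 1000 + [3, 9, 1]   # many equal keys force bad partitions here
introsort(values)
print(values[:4], values[-2:])
```

With this partitioning scheme, a run of equal keys produces maximally unbalanced splits, so the depth limit is hit quickly and the heap fallback finishes the job. That is the worst-case guarantee the quote describes.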

迷你仙 2024-12-25 11:33:23

There is no silver bullet...

Just to mention another argument I haven't seen here yet:

If your dataset is really huge and doesn't fit into memory, then merge sort works like a charm. It's frequently used in clusters where a dataset can span hundreds of machines.
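
A minimal single-machine sketch of that idea, assuming the data is a stream of integers: sort fixed-size chunks in memory, spill each sorted run to a temporary file, then merge the runs lazily. The names `external_sort`, `read_run`, and `chunk_size` are hypothetical and chosen only for this example; a real cluster job would distribute the runs across machines.

```python
import heapq
import tempfile


def read_run(f):
    """Yield the integers stored one per line in an open text file."""
    for line in f:
        yield int(line)


def external_sort(numbers, chunk_size):
    """Yield the integers from `numbers` in sorted order.

    Only about `chunk_size` values are held in memory while building runs;
    the merge phase keeps one value per run in memory.
    """
    runs = []
    chunk = []

    def flush():
        # Sort the current chunk in memory and spill it to a temporary file.
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in chunk)
        f.seek(0)
        runs.append(f)
        chunk.clear()

    for x in numbers:
        chunk.append(x)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge performs a lazy k-way merge of already sorted inputs.
        yield from heapq.merge(*(read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

The access pattern is the point: every run is written and read sequentially, which suits disks and network streams far better than the scattered accesses of an in-place heap.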

风吹雨成花 2024-12-25 11:33:23

Stable sorting algorithms maintain the relative order of records with equal keys.

Some applications like having that kind of stability, but most don't care. For examples, Google is your friend.

As for your assertion that "folks use sorting mechanisms like Merge sort or Quick sort", I would bet that most folks use whatever is built into their language and don't think about the sorting algorithm all that much. Those who roll their own have probably not heard of heap sort (that last part is personal experience).

The last and biggest reason is that not everyone is going to want a sorted heap. Some people want the sorted list. If average Joe Programmer's boss says "sort this list", and Joe says "Here's this heap data structure you've never heard of, boss!", Joe's next performance review is not going to be so great.

三生池水覆流年 2024-12-25 11:33:23

When I worked for a short time on Tandem Non-Stop computers in the mid-80s, I noted that the system in-core sort routine was HeapSort, precisely because it gave guaranteed N log N performance. I don't know of anybody who had any reason to use it, though, so I don't know how it worked in practice. I like heapsort, but as well as the drawbacks noted above, I have heard it said that it makes poor use of modern memories because it makes memory accesses all over the place. Quicksort and even small radix sorts, by contrast, end up intermixing a relatively small number of streams of sequential reads and writes, so caches are more effective.
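
A tiny sketch of why the accesses end up "all over the place", assuming the usual array layout where the children of index i sit at 2i+1 and 2i+2. The helper `sift_down_path` is hypothetical; it lists the indices one sift-down from the root would touch if it always descended to the left child. The distance between consecutive accesses doubles at every level.

```python
def sift_down_path(n):
    """Indices visited by a root-to-leaf descent via left children in a heap of n items."""
    path = []
    i = 0
    while i < n:
        path.append(i)
        i = 2 * i + 1
    return path


path = sift_down_path(1_000_000)
strides = [b - a for a, b in zip(path, path[1:])]
print(path[:8])      # indices near the root are close together
print(strides[-3:])  # near the leaves, consecutive accesses are far apart
```

Heapsort performs one such descent per extracted element, so most accesses land far from the previous one. A partitioning pass, by comparison, scans the array sequentially.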
