Quicksort vs. Heapsort

Both quicksort and heapsort do in-place sorting. Which is better? What are the applications and cases in which either is preferred?

Comments (13)

残花月 2024-09-01 18:05:56

Heapsort is a safe bet when dealing with very large inputs. Asymptotic analysis shows that heapsort's worst-case order of growth is O(n log n), which is better than quicksort's O(n^2) worst case. However, heapsort is somewhat slower in practice on most machines than a well-implemented quicksort. Heapsort is also not a stable sorting algorithm.

The reason heapsort is slower in practice than quicksort is quicksort's better locality of reference (https://en.wikipedia.org/wiki/Locality_of_reference): its data accesses stay within relatively close storage locations, and systems that exhibit strong locality of reference are great candidates for performance optimization. Heapsort, by contrast, makes much larger jumps through the array. This makes quicksort more favorable for smaller inputs.
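To make the locality point concrete, here is a minimal iterative, in-place heapsort sketch (the method names are mine, not from the thread). Note how SiftDown jumps from index i to 2*i + 1: on a large array, successive accesses land far apart in memory, whereas quicksort's partition scans walk a sub-array sequentially.

static void HeapSort(int[] arr)
{
    int n = arr.Length;
    // Build a max-heap bottom-up.
    for (int i = n / 2 - 1; i >= 0; i--) SiftDown(arr, i, n);
    // Repeatedly move the current maximum to the end and shrink the heap.
    for (int end = n - 1; end > 0; end--)
    {
        int tmp = arr[0]; arr[0] = arr[end]; arr[end] = tmp;
        SiftDown(arr, 0, end);
    }
}

static void SiftDown(int[] arr, int root, int size)
{
    while (true)
    {
        int child = 2 * root + 1;        // left child: a long jump in memory when root is large
        if (child >= size) return;
        if (child + 1 < size && arr[child + 1] > arr[child]) child++;
        if (arr[root] >= arr[child]) return;
        int tmp = arr[root]; arr[root] = arr[child]; arr[child] = tmp;
        root = child;                    // the next jump is roughly twice as far again
    }
}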

疑心病 2024-09-01 18:05:56

In simple terms: HeapSort has a guaranteed worst-case running time of O(n log n), as opposed to QuickSort's average running time of O(n log n). QuickSort is usually used in practice because it is typically faster, but HeapSort is used for external sorting, when you need to sort huge files that don't fit into your computer's memory.
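To illustrate the external-sorting remark above: the place a heap typically shows up there is the k-way merge of sorted runs. Below is a minimal in-memory sketch (the names and structure are my own assumptions; a real external sort would stream the runs from disk). It uses PriorityQueue<TElement, TPriority> from System.Collections.Generic, available since .NET 6.

// Merge k sorted runs into one sorted output using a min-heap keyed on each run's head value.
// In a real external sort the runs would be files on disk and the output would be streamed.
static List<int> MergeSortedRuns(List<int[]> runs)
{
    var output = new List<int>();
    var heap = new PriorityQueue<(int run, int pos), int>();   // element = position, priority = value

    for (int r = 0; r < runs.Count; r++)
        if (runs[r].Length > 0)
            heap.Enqueue((r, 0), runs[r][0]);

    while (heap.TryDequeue(out var head, out int value))
    {
        output.Add(value);                                     // smallest remaining element overall
        int next = head.pos + 1;
        if (next < runs[head.run].Length)                      // refill the heap from the same run
            heap.Enqueue((head.run, next), runs[head.run][next]);
    }
    return output;
}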

铃予 2024-09-01 18:05:56

In practice, barely anything ever beats QuickSort when speed is your main concern, unless you can use RadixSort (RadixSort is usually only possible when sorting numbers). QuickSort typically beats HeapSort and MergeSort by a fair margin.

If you keep increasing the number of elements to be sorted, there is a break-even point where HeapSort and MergeSort will overtake QuickSort: they are always O(n * log2(n)), whereas QuickSort is only typically in that range, and how it really performs depends a lot on the data to be sorted and on implementation details. So even if QuickSort is initially faster, its speed falls off more quickly as the number of elements grows than is the case for HeapSort or MergeSort, which means that at some point both will overtake QuickSort. In practice, however, the data sets to be sorted usually never get big enough to reach that break-even point, or cannot get big enough because your system would run out of resources before then.

There are only a few unfortunate cases where QuickSort may fail horribly depending on your implementation, e.g. when the initial dataset is already sorted or reverse sorted or follows another weird sorting pattern (e.g. every pair of elements is reverse sorted and all pairs are sorted). Yet those can be avoided by making the QuickSort implementation more robust. Typical optimizations include:

  • Instead of choosing the Pivot element to be always the element in the center, choose it randomly
  • Instead of just picking a Pivot element, pick three (from fixed positions or randomly), then take the one in the middle
  • Instead of partitioning into two sub-arrays, partition into three (smaller than Pivot, bigger than Pivot, and equal to Pivot; usually equal elements are either kept in the smaller or bigger sub-array).
  • Once the size of the sub-array to be sorted falls below a certain threshold (numbers around 4-16 are popular), sort this sub-array by an alternative algorithm (InsertionSort is quite popular for this task but pretty much any fallback will do; this is to avoid requiring too many recursions or too much space in an auxiliary array)
  • Instead of switching to a different algorithm below a certain threshold, you can also just stop sorting completely below that threshold and then run an alternative algorithm over the entire data, which will now benefit from an almost-sorted data set (this only works if you choose the right second algorithm)
  • Shuffle the data set prior to applying QuickSort (this seems stupid, but shuffling totally unsorted data won't make QuickSort any slower, yet it avoids issues if the data happens to be pre-sorted)

All of these changes will make QuickSort slower, sometimes a tiny bit, sometimes a bit more, but they also prevent situations where QuickSort would otherwise become unbearably slow. They have very little impact if your data is more or less random, but a huge positive impact if it isn't. (A sketch combining two of these tweaks follows below.)
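As a rough illustration only (the method names and the cutoff value are my own choices, not a reference implementation), here is a sketch combining two of the tweaks above: a median-of-three pivot and an insertion-sort cutoff for small sub-arrays. It also recurses into the smaller partition first, which keeps the stack depth at O(log n).

const int Cutoff = 16;                       // below this size, insertion sort takes over

static void RobustQuickSort(int[] arr, int lo, int hi)
{
    while (hi - lo + 1 > Cutoff)
    {
        int p = Partition(arr, lo, hi);
        // Recurse into the smaller half, loop on the larger: stack depth stays O(log n).
        if (p - lo < hi - p) { RobustQuickSort(arr, lo, p - 1); lo = p + 1; }
        else { RobustQuickSort(arr, p + 1, hi); hi = p - 1; }
    }
    InsertionSort(arr, lo, hi);              // finish small ranges with a simple, cache-friendly sort
}

static int Partition(int[] arr, int lo, int hi)
{
    // Median-of-three: order arr[lo], arr[mid], arr[hi], then use the median as the pivot.
    int mid = lo + (hi - lo) / 2;
    if (arr[mid] < arr[lo]) Swap(arr, lo, mid);
    if (arr[hi] < arr[lo]) Swap(arr, lo, hi);
    if (arr[hi] < arr[mid]) Swap(arr, mid, hi);
    Swap(arr, mid, hi - 1);                  // park the pivot just before the end
    int pivot = arr[hi - 1];

    int i = lo, j = hi - 1;
    while (true)
    {
        while (arr[++i] < pivot) { }         // arr[hi - 1] == pivot acts as a sentinel
        while (arr[--j] > pivot) { }         // arr[lo] <= pivot acts as a sentinel
        if (i >= j) break;
        Swap(arr, i, j);
    }
    Swap(arr, i, hi - 1);                    // put the pivot into its final position
    return i;
}

static void InsertionSort(int[] arr, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++)
    {
        int key = arr[i], j = i - 1;
        while (j >= lo && arr[j] > key) { arr[j + 1] = arr[j]; j--; }
        arr[j + 1] = key;
    }
}

static void Swap(int[] arr, int a, int b) { int t = arr[a]; arr[a] = arr[b]; arr[b] = t; }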

深府石板幽径 2024-09-01 18:05:56

To answer the original question and address some of the other comments here:

I just compared implementations of selection, quick, merge, and heap sort to see how they'd stack up against each other. The answer is that they all have their downsides.

TL;DR:
Quick is the best general purpose sort (reasonably fast, stable, and mostly in-place)
Personally I prefer heap sort though unless I need a stable sort.

Selection - N^2 - It's really only good for less than 20 elements or so, then it's outperformed. Unless your data is already sorted, or very, very nearly so. N^2 gets really slow really fast.

Quick, in my experience, is not actually that quick all the time. Bonuses for using quick sort as a general sort though are that it's reasonably fast and it's stable. It's also an in-place algorithm, but as it's generally implemented recursively, it will take up additional stack space. It also falls somewhere between O(n log n) and O(n^2). Timing on some sorts seems to confirm this, especially when the values fall within a tight range. It's way faster than selection sort on 10,000,000 items, but slower than merge or heap.

Merge sort is guaranteed O(n log n) since its sort is not data dependent. It just does what it does, regardless of what values you've given it. It's also stable, but very large sorts can blow out your stack if you're not careful about implementation. There are some complex in-place merge sort implementations, but generally you need another array in each level to merge your values into. If those arrays live on the stack you can run into issues.
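One common way to address the auxiliary-array concern above is to allocate a single scratch buffer once (on the managed heap, not the stack) and reuse it at every recursion level, so the recursion itself only costs O(log n) small stack frames. A minimal sketch, with names of my own choosing:

static void MergeSort(int[] arr)
{
    int[] scratch = new int[arr.Length];     // one heap-allocated buffer, reused at every level
    MergeSort(arr, scratch, 0, arr.Length - 1);
}

static void MergeSort(int[] arr, int[] scratch, int lo, int hi)
{
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    MergeSort(arr, scratch, lo, mid);
    MergeSort(arr, scratch, mid + 1, hi);

    // Merge the two sorted halves through the shared scratch buffer (<= keeps the sort stable).
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) scratch[k++] = arr[i] <= arr[j] ? arr[i++] : arr[j++];
    while (i <= mid) scratch[k++] = arr[i++];
    while (j <= hi) scratch[k++] = arr[j++];
    for (int m = lo; m <= hi; m++) arr[m] = scratch[m];    // copy the merged range back
}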

Heap sort is max O(n log n), but in many cases is quicker, depending on how far you have to move your values up the log n deep heap. The heap can easily be implemented in-place in the original array, so it needs no additional memory, and it's iterative, so no worries about stack overflow while recursing. The huge downside to heap sort is that it is not a stable sort, which means it's right out if you need that.

乞讨 2024-09-01 18:05:55

Heapsort is guaranteed O(N log N), which is much better than the worst case in Quicksort. Heapsort doesn't need extra memory for another array to put ordered data into, as Mergesort does. So why do commercial applications stick with Quicksort? What does Quicksort have that is so special over other implementations?

I've tested the algorithms myself and I've seen that Quicksort has something special indeed. It runs fast, much faster than Heap and Merge algorithms.

The secret of Quicksort is that it does hardly any unnecessary element swaps, and swaps are time consuming.

With Heapsort, even if all of your data is already ordered, you are going to swap 100% of elements to order the array.

With Mergesort, it's even worse. You are going to write 100% of elements in another array and write it back in the original one, even if data is already ordered.

With Quicksort you don't swap what is already ordered. If your data is completely ordered, you swap almost nothing! Although there is a lot of fussing about the worst case, a little improvement in the choice of pivot, anything other than taking the first or last element of the array, can avoid it. If you take the pivot as the median of the first, last, and middle elements, it is sufficient to avoid the worst case.

What is superior in Quicksort is not the worst case, but the best case! In the best case you do the same number of comparisons, OK, but you swap almost nothing. In the average case you swap some of the elements, but not all of them, as in Heapsort and Mergesort. That is what gives Quicksort the best time: fewer swaps, more speed.

The implementation below, in C# on my computer and running in release mode, beats Array.Sort by 3 seconds with the middle-element pivot and by 2 seconds with the improved pivot (yes, there is an overhead to getting a good pivot).

static void Main(string[] args)
{
    int[] arrToSort = new int[100000000];
    var r = new Random();
    for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);

    Console.WriteLine("Press q to quick sort, s to Array.Sort");
    while (true)
    {
        var k = Console.ReadKey(true);
        if (k.KeyChar == 'q')
        {
            // quick sort
            Console.WriteLine("Beg quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            QuickSort(arrToSort, 0, arrToSort.Length - 1);
            Console.WriteLine("End quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
        else if (k.KeyChar == 's')
        {
            Console.WriteLine("Beg Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            Array.Sort(arrToSort);
            Console.WriteLine("End Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
    }
}

static public void QuickSort(int[] arr, int left, int right)
{
    int begin = left
        , end = right
        , pivot
        // get middle element pivot
        //= arr[(left + right) / 2]
        ;

    // improved pivot: take the median of the first, middle and last elements
    int middle = (left + right) / 2;
    int
        LM = arr[left].CompareTo(arr[middle])
        , MR = arr[middle].CompareTo(arr[right])
        , LR = arr[left].CompareTo(arr[right])
        ;
    if (-1 * LM == LR)
        pivot = arr[left];
    else
        if (MR == -1 * LR)
            pivot = arr[right];
        else
            pivot = arr[middle];
    // Hoare-style partition: advance from both ends and swap out-of-place pairs
    do
    {
        while (arr[left] < pivot) left++;
        while (arr[right] > pivot) right--;

        if(left <= right)
        {
            int temp = arr[right];
            arr[right] = arr[left];
            arr[left] = temp;

            left++;
            right--;
        }
    } while (left <= right);

    if (left < end) QuickSort(arr, left, end);
    if (begin < right) QuickSort(arr, begin, right);
}

﹂绝世的画 2024-09-01 18:05:55

This paper has some analysis.

Also, from Wikipedia:

The most direct competitor of quicksort is heapsort. Heapsort is typically somewhat slower than quicksort, but the worst-case running time is always Θ(n log n). Quicksort is usually faster, though there remains the chance of worst-case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.
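As a rough sketch of the introsort idea mentioned in the quote (the names and the depth limit of 2*log2(n) are my own assumptions, not any library's actual implementation): run quicksort as usual, but give each call a depth budget, and hand any sub-range whose budget runs out over to heapsort, so the quadratic case can never materialize. Math.Log2 requires .NET Core 3.0 or later.

static void IntroSort(int[] arr)
{
    // Depth budget of roughly 2 * log2(n); exceeding it means the pivots are doing badly.
    int depthLimit = 2 * (int)Math.Log2(Math.Max(arr.Length, 1));
    IntroSort(arr, 0, arr.Length - 1, depthLimit);
}

static void IntroSort(int[] arr, int lo, int hi, int depthLimit)
{
    if (lo >= hi) return;
    if (depthLimit == 0)
    {
        HeapSortRange(arr, lo, hi);          // bad case detected: switch to heapsort for this range
        return;
    }
    int p = HoarePartition(arr, lo, hi);
    IntroSort(arr, lo, p, depthLimit - 1);
    IntroSort(arr, p + 1, hi, depthLimit - 1);
}

static int HoarePartition(int[] arr, int lo, int hi)
{
    int pivot = arr[lo + (hi - lo) / 2];
    int i = lo - 1, j = hi + 1;
    while (true)
    {
        do { i++; } while (arr[i] < pivot);
        do { j--; } while (arr[j] > pivot);
        if (i >= j) return j;
        int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
    }
}

// Heapsort restricted to arr[lo..hi]; used only as the fallback.
static void HeapSortRange(int[] arr, int lo, int hi)
{
    int n = hi - lo + 1;
    for (int i = n / 2 - 1; i >= 0; i--) SiftDownRange(arr, lo, i, n);
    for (int end = n - 1; end > 0; end--)
    {
        int tmp = arr[lo]; arr[lo] = arr[lo + end]; arr[lo + end] = tmp;
        SiftDownRange(arr, lo, 0, end);
    }
}

static void SiftDownRange(int[] arr, int lo, int root, int size)
{
    while (true)
    {
        int child = 2 * root + 1;
        if (child >= size) return;
        if (child + 1 < size && arr[lo + child + 1] > arr[lo + child]) child++;
        if (arr[lo + root] >= arr[lo + child]) return;
        int tmp = arr[lo + root]; arr[lo + root] = arr[lo + child]; arr[lo + child] = tmp;
        root = child;
    }
}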

作死小能手 2024-09-01 18:05:55

For most situations, having quick vs. a little quicker is irrelevant... you simply never want it to occasionally get waayyy slow. Although you can tweak QuickSort to avoid the way slow situations, you lose the elegance of the basic QuickSort. So, for most things, I actually prefer HeapSort... you can implement it in its full simple elegance, and never get a slow sort.

For situations where you DO want max speed in most cases, QuickSort may be preferred over HeapSort, but neither may be the right answer. For speed-critical situations, it is worth examining closely the details of the situation. For example, in some of my speed-critical code, it is very common that the data is already sorted or near-sorted (it is indexing multiple related fields that often either move up and down together OR move up and down opposite each other, so once you sort by one, the others are either sorted or reverse-sorted or close... either of which can kill QuickSort). For that case, I implemented neither... instead, I implemented Dijkstra's SmoothSort... a HeapSort variant that is O(N) when already sorted or near-sorted... it is not so elegant, not too easy to understand, but fast... read http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF if you want something a bit more challenging to code.

我一直都在从未离去 2024-09-01 18:05:55

Quicksort-heapsort in-place hybrids are really interesting too, since most of them only need n*log n comparisons in the worst case (they are optimal with respect to the first term of the asymptotics, so they avoid the worst-case scenarios of quicksort), they need only O(log n) extra space, and they preserve at least "a half" of quicksort's good behaviour on already-ordered data. An extremely interesting algorithm is presented by Diekert and Weiß in http://arxiv.org/pdf/1209.4214v1.pdf:

  • Select a pivot p as the median of a random sample of sqrt(n) elements (this can be done in at most 24 sqrt(n) comparisons through the algorithm of Tarjan&co, or 5 sqrt(n) comparisons through the much more convoluted spider-factory algorithm of Schonhage);
  • Partition your array in two parts as in the first step of Quicksort;
  • Heapify the smallest part and use O(log n) extra bits to encode a heap in which every left child has a value greater than its sibling;
  • Recursively extract the root of the heap, sift the gap left by the root down until it reaches a leaf of the heap, then fill the gap with an appropriate element taken from the other part of the array;
  • Recur over the remaining non-ordered part of the array (if p is chosen as the exact median, there is no recursion at all).

无需解释 2024-09-01 18:05:55

Comparing quicksort and heapsort: since both sort in place, the difference lies in the worst-case running time, which is O(n^2) for quicksort while for heapsort it is still O(n*log(n)), so for an average amount of data quicksort will be more useful. Since randomized quicksort is a randomized algorithm, the probability of finishing quickly depends on the position of the pivot element you choose.

So, with s elements in the current sub-array and L and G the elements smaller and greater than the pivot:

Good call: the sizes of L and G are each less than 3s/4

Bad call: one of L and G has size greater than 3s/4

For a small amount of data we can go for insertion sort, and for a very large amount of data go for heapsort.
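To make the "good call" definition above concrete: with a pivot chosen uniformly at random from s distinct keys, a good call happens exactly when the pivot's rank falls in the middle half, which is about half the time regardless of s. A tiny simulation sketch (the names are my own):

static double EstimateGoodCallRate(int s, int trials)
{
    var rng = new Random(42);
    int good = 0;
    for (int t = 0; t < trials; t++)
    {
        int pivotRank = rng.Next(s);         // rank of a uniformly random pivot among s distinct keys
        int sizeL = pivotRank;               // elements smaller than the pivot
        int sizeG = s - 1 - pivotRank;       // elements greater than the pivot
        if (sizeL < 3.0 * s / 4 && sizeG < 3.0 * s / 4) good++;
    }
    return (double)good / trials;            // converges to about 0.5, e.g. EstimateGoodCallRate(1000, 100000)
}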

秋叶绚丽 2024-09-01 18:05:55

Heapsort has the benefit of a worst-case running time of O(n*log(n)), so in cases where quicksort is likely to perform poorly (generally, mostly-sorted data sets) heapsort is much preferred.

忆伤 2024-09-01 18:05:55

To me there is a very fundamental difference between heapsort and quicksort: the latter uses recursion. In a recursive algorithm the call stack grows with the recursion depth. This does not matter if n is small, but right now I am sorting two matrices with n=10^9!! The program takes almost 10 GB of RAM, and any extra memory makes my computer start swapping to virtual disk memory. My disk is a RAM disk, but swapping to it still makes a huge difference in speed. So in a statpack coded in C++ that includes adjustable-dimension matrices, whose size is unknown in advance to the programmer, and nonparametric statistical sorting, I prefer heapsort to avoid delays when working with very big data matrices.

溇涏 2024-09-01 18:05:55

Well, if you go down to the architecture level: whatever is already sitting in the cache gets processed quickly. In quicksort we have no issue dividing the array into sub-arrays of any length, because each partition step works on a contiguous region; but in heapsort (using an array) it can happen that a parent is not present in the part of the array currently held in cache, and it then has to be brought into cache memory, which is time consuming.
That's why quicksort is best!

秋凉 2024-09-01 18:05:55

Heapsort builds a heap and then repeatedly extracts the maximum item. Its worst case is O(n log n).

But if you look at the worst case of quicksort, which is O(n^2), you realize that quicksort would be a not-so-good choice for large data.

So this makes sorting an interesting thing; I believe the reason so many sorting algorithms exist today is that each of them is 'best' in its own best place. For instance, bubble sort can outperform quicksort if the data is already sorted. Or if we know something about the items to be sorted, we can probably do better.

This may not answer your question directly, but I thought I'd add my two cents.
