Quicksort slower than Mergesort?

Posted 2024-07-12 11:45:50


I was working on implementing a quicksort yesterday, and then I ran it, expecting a faster runtime than the Mergesort (which I had also implemented). I ran the two, and while the quicksort was faster for smaller data sets <100 elements (and I did verify that it works), the mergesort became the quicker algorithm fairly quickly. I had been taught that quicksort is almost always "quicker" than mergesort, and I understand that there is some debate on this topic, but I at least expected it to be closer than this. For data sets >10000 elements, the mergesort was over 4 times faster. Is this to be expected, or is there an error in my quicksort code?

mergesort:

public static void mergeSort(int[] e)
{
    if (e.length <= 1) return;
    // split into two halves (allocates two new arrays per call)
    int[] first = new int[e.length/2];
    int[] second = new int[e.length - first.length];
    System.arraycopy(e, 0, first, 0, first.length);
    System.arraycopy(e, first.length, second, 0, second.length);
    // sort each half recursively, then merge back into the original
    mergeSort(first);
    mergeSort(second);
    System.arraycopy(merge(first, second), 0, e, 0, e.length);
}

private static int[] merge(int[] first, int[] second) {
    int iFirst = 0;
    int iSecond = 0;
    int iCombined = 0;

    int[] combined = new int[first.length + second.length];
    while(iFirst < first.length && iSecond < second.length) {
        if (first[iFirst] > second[iSecond]) {
            combined[iCombined++] = second[iSecond++];
        }
        else combined[iCombined++] = first[iFirst++];
    }
    for(; iFirst < first.length; iFirst++) {
        combined[iCombined++] = first[iFirst];
    }
    for(; iSecond < second.length; iSecond++) {
        combined[iCombined++] = second[iSecond];
    }
    return combined;
}

quicksort:

public static void quicksort(int[] a, int first, int last) {
    if (first >= last) return;

    int partitionIndex = partition(a, first, last);
    quicksort(a, first, partitionIndex - 1);
    quicksort(a, partitionIndex + 1, last);
}

public static int partition(int[] x, int first, int last) {
    int left = first;
    int right = last;
    int pivot = x[first];    // pivot is always the first element
    int pivotIdx = first;

    while(left <= right) {
        // scan right for an element greater than the pivot
        // (note: the bound is x.length, not last, so runs of ties can
        // push this scan past the end of the subrange)
        while(left < x.length && x[left] <= pivot) left++;
        // scan left for an element not greater than the pivot
        while(right >= 0 && x[right] > pivot) right--;
        if (left <= right) {
            int temp = x[left];
            x[left] = x[right];
            x[right] = temp;
        }
    }
    // x[right] is now the last element <= pivot: swap the pivot into place
    pivotIdx = right;
    x[first] = x[right];
    x[pivotIdx] = pivot;
    return pivotIdx;
}
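For context, the timings came from a harness roughly like this sketch (the class name, sizes, and the warm-up loop are illustrative, not my exact code; on the JVM a few warm-up runs matter, since the JIT can otherwise skew the first measurements):

import java.util.Random;

public class SortBench {
    public static void main(String[] args) {
        int n = 10000;
        int[] data = new Random(42).ints(n).toArray();

        // Warm up so the JIT-compiled code is measured, not the interpreter.
        for (int i = 0; i < 5; i++) {
            mergeSort(data.clone());
            quicksort(data.clone(), 0, n - 1);
        }

        long t0 = System.nanoTime();
        mergeSort(data.clone());
        long t1 = System.nanoTime();
        quicksort(data.clone(), 0, n - 1);
        long t2 = System.nanoTime();

        System.out.printf("mergeSort: %.2f ms, quicksort: %.2f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }

    // ... mergeSort, merge, quicksort, partition as defined above ...
}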


Comments (15)

画骨成沙 2024-07-19 11:45:50


I actually just wrote a "linked-list comparative sort demo program" in C and arrived at a similar conclusion (that mergesort will beat quicksort for most uses), altho I have been told that quicksort is generally not used for linked lists anyway. I would note that the choice of pivot values is a monster factor -- my initial version used a random node as the pivot, and when I refined it a bit to take a mean of two (random) nodes, the execution time for 1000000 records went from over 4 minutes to less than 10 seconds, putting it right on par with mergesort.

Mergesort and quicksort have the same big O best case (n*log(n)) and despite what people may try to claim, big O is really about iteration count and not comparison count. The biggest difference that can be produced between the two of them will always be to quicksort's detriment, and it involves lists that are already largely sorted or contain a large number of ties (when quicksort does better than mergesort, the difference will not be nearly so great). This is because ties or already sorted segments streamline straight through mergesort; when two split lists come back to be merged, if one list already contains all smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. This is to say, the number of iterations will remain constant, but the number of comparisons is cut in half. If you are talking about actual time and are sorting strings, it's the comparisons that are expensive.

Ties and already sorted segments in quicksort can easily lead to unbalanced lists if the pivot value is not carefully determined, and the imbalanced lists (eg, one on the right, ten on the left) are what causes the slowdown. So, if you can get your quicksort to perform as well on an already sorted list as it does on a randomized list, you've got a good method for finding the pivot.

If you're interested, the demo program produces output like this:

[root~/C] ./a.out -1 3 
Using "", 0 records
Primary Criteria offset=128

Command (h for help, Q to quit): N
How many records? 4000000
New list is 562500.00 kb

Command (h for help, Q to quit): m

Mergesorting..............3999999 function calls
123539969 Iterations     Comparison calls: 82696100
Elapsed time: 0 min 9 sec


Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............4000000 function calls
190179315 Iterations     Comparison calls: 100817020
Elapsed time: 0 min 23 sec

Altho without the krazy kolors. There's some more stuff about it by me about halfway down this page.

PS: neither sort requires extra memory with the linked list.

不甘平庸 2024-07-19 11:45:50


Mergesort is a lot slower for random array-based data, as long as it fits in RAM. This is the first time I've seen it debated.

  • qsort the shortest subarray first.
  • switch to insertion sort below 5-25 elements
  • do a normal pivot selection

Your qsort is very slow because it tries to partition and qsort arrays of length 2 and 3.
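A rough sketch of the first two bullets above (the cutoff of 16 and the method names are illustrative; partition() is the one from the question). Recursing into the shorter side and looping on the longer one also keeps the stack depth at O(log n):

static final int CUTOFF = 16;   // illustrative; typical values are 5-25

static void quicksortWithCutoff(int[] a, int first, int last) {
    while (last - first > CUTOFF) {
        int p = partition(a, first, last);
        if (p - first < last - p) {        // recurse into the shorter side...
            quicksortWithCutoff(a, first, p - 1);
            first = p + 1;                 // ...and loop on the longer side
        } else {
            quicksortWithCutoff(a, p + 1, last);
            last = p - 1;
        }
    }
    insertionSort(a, first, last);         // finish small ranges cheaply
}

static void insertionSort(int[] a, int first, int last) {
    for (int i = first + 1; i <= last; i++) {
        int v = a[i];
        int j = i - 1;
        while (j >= first && a[j] > v) { a[j + 1] = a[j]; j--; }
        a[j + 1] = v;
    }
}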

淡莣 2024-07-19 11:45:50


One of the advantages of quicksort for relatively small array sizes is just an artifact of hardware implementation.

On arrays, quicksort can be done in-place, meaning that you're reading from and writing to the same area of memory. Mergesort, on the other hand, typically requires allocating new buffers, meaning your memory access is more spread out. You can see both of these behaviors in your example implementations.

As a result, for relatively small datasets, quicksort is more likely to get cache hits and therefore just tends to run faster on most hardware.

Mergesort is still a pretty good solution for large data sets or other data structures, like linked lists, as your experiments confirm.

他不在意 2024-07-19 11:45:50


Based on this Wikipedia article, your results are expected.

自由范儿 2024-07-19 11:45:50


Merge sort's worst case is quicksort's average case, so if you don't have a good implementation, merge sort is going to be faster overall. Getting quicksort to work fast is about avoiding sub-average cases. Choose a better pivot (median-of-3 helps) and you'll see a difference.
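For instance, a median-of-3 sketch (the helper names are mine) that could run just before the question's partition(), which pivots on x[first]:

// Move the median of x[first], x[mid], x[last] into x[first], so a
// partition that pivots on the first element gets a better-behaved pivot.
static void medianOfThree(int[] x, int first, int last) {
    int mid = first + (last - first) / 2;
    if (x[mid]  < x[first]) swap(x, mid, first);   // x[first] = min so far
    if (x[last] < x[first]) swap(x, last, first);  // x[first] = min of all 3
    if (x[last] < x[mid])   swap(x, last, mid);    // x[mid] = median
    swap(x, first, mid);                           // pivot slot gets the median
}

static void swap(int[] x, int i, int j) {
    int t = x[i]; x[i] = x[j]; x[j] = t;
}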

孤者何惧 2024-07-19 11:45:50


I could imagine that by directly accessing the memory, using C for example, one can improve the performance of quicksort more than is possible with mergesort.

Another reason is that mergesort needs more memory, because it's hard to implement it as an in-place sort.

And specifically for your implementation, you could improve the choice of pivot; there are a lot of different algorithms for finding a good pivot.

As can be seen on Wikipedia, quicksort can be implemented in different ways.

草莓味的萝莉 2024-07-19 11:45:50


(1) There is a qsort algorithm, used by C's qsort(), which doesn't require extra memory. It was most probably invented by Hoare. This makes qsort() fast in C.

(2) Randomizing the data before running qsort will almost always speed it up.

(3) Selecting the median of the data as the pivot may make it faster.
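Point (2) can be as simple as a Fisher-Yates shuffle before sorting (a sketch; it makes a first-element pivot behave like a random pivot on average, at O(n) extra cost):

import java.util.Random;

// Fisher-Yates shuffle: destroys any pre-existing order, which is what
// hurts a quicksort that always pivots on the first element.
static void shuffle(int[] a) {
    Random rnd = new Random();
    for (int i = a.length - 1; i > 0; i--) {
        int j = rnd.nextInt(i + 1);        // uniform index in [0, i]
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}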

百变从容 2024-07-19 11:45:50


This is consistent with the analysis of the algorithms. Merge sort is guaranteed O(n log n) for any input, on every run. Quicksort is best-case O(n log n) and average-case O(n log n), but worst-case O(n^2), so the average execution will fall somewhere between O(n log n) and O(n^2).

Quicksort is the best general-case algorithm because it has low overhead, so it has good speed for values of n up to about 10000 or so, and still good runtime for arbitrarily astronomical values of n. Merge sort has the unfortunate overhead of writing a stack frame, required by every recursive call. Thus, for low values of n it has an atrociously high c in RT = c*n*log(n) and it is not the preferred general sorting method.

Edit: Software Monkey pointed out a contradiction: quicksort averages O(n log n) for random input, but O(n^2) worst case. So it actually is somewhat bound by the entropy of your data -- or you could choose the pivot randomly. I might still be off a bit though.

月亮是我掰弯的 2024-07-19 11:45:50


If you implement heap sort as the fallback sorting algorithm for quicksort's worst-case scenario, you achieve a Θ(n log n) algorithm.

If you don't need stable sorting, and don't sort a linked list, I think that would be the fastest you could go.


薄凉少年不暖心 2024-07-19 11:45:50


I think that as long as the data fits in memory, a good merge sort implementation performs better than a good quicksort implementation.

One of the most widely used implementations of qsort(), glibc's qsort(), internally uses merge sort for most cases where the data fits in memory. This merge sort allocates a temporary memory space used for merging, which adds some memory overhead, but most of the time it outperforms its own internal quicksort implementation, which has good pivot selection and optimizations. glibc only uses quicksort when the data and the temporary memory for merge sort cannot fit in memory.

I have measured the performance of those two implementations on my machine with a 2.1 GHz CPU and several GB of RAM. The inputs are generated with a pseudo-random generator, and each key is a 32-bit unsigned integer, which means a few more comparison cycles than a plain integer comparison due to the comparison-function interface.

For merge sort:

2 MB, time_diff 165.156000 ms, 78.752518 ns per byte
4 MB, time_diff 344.298000 ms, 82.087040 ns per byte
8 MB, time_diff 730.926000 ms, 87.133169 ns per byte
16 MB, time_diff 1541.215000 ms, 91.863573 ns per byte
32 MB, time_diff 3088.924000 ms, 92.057109 ns per byte
64 MB, time_diff 6262.868000 ms, 93.324006 ns per byte
128 MB, time_diff 12887.018000 ms, 96.015766 ns per byte
256 MB, time_diff 26731.597000 ms, 99.582959 ns per byte

For quick sort:

2 MB, time_diff 243.519000 ms, 116.118908 ns per byte
4 MB, time_diff 504.975000 ms, 120.395422 ns per byte
8 MB, time_diff 1075.276000 ms, 128.182888 ns per byte
16 MB, time_diff 2183.865000 ms, 130.168498 ns per byte
32 MB, time_diff 4343.993000 ms, 129.461080 ns per byte
64 MB, time_diff 8714.166000 ms, 129.851192 ns per byte
128 MB, time_diff 17881.344000 ms, 133.226395 ns per byte
256 MB, time_diff 36751.029000 ms, 136.908252 ns per byte

You can see that there are clear differences in performance between those two implementations, and why mergesort is preferred over quicksort in such a widely used qsort implementation. The main reason behind this difference seems to be that quicksort does 10-20% more comparisons than merge sort, due to uneven splitting at each step.

童话里做英雄 2024-07-19 11:45:50


I ran similar tests, and pure quicksort (with a random choice of pivot) turned out to be much slower than merge sort for large arrays.

Choosing the pivot as the median of the first, middle, and last element improved quicksort's performance, but quicksort was still definitely worse than merge sort on large arrays (> 100000 elements).

I saw a big improvement when I implemented intro-sort, i.e. quicksort that falls back to heap sort if the recursion depth exceeds a certain threshold. My intro-sort implementation was almost as fast as my merge sort implementation. Of course, intro-sort is no longer pure quicksort, as it uses heap sort to bring the complexity back to n log(n) when pure quicksort hits some bad data. I can post the results if you are interested.
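The shape of it is roughly this (a sketch, not my exact code; the 2*floor(log2 n) depth budget is the conventional choice, and partition() is the one from the question):

static void introSort(int[] a) {
    int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
    introSort(a, 0, a.length - 1, depthLimit);
}

static void introSort(int[] a, int first, int last, int depth) {
    if (first >= last) return;
    if (depth == 0) {                      // recursion too deep: bad pivots,
        heapSort(a, first, last);          // fall back to guaranteed n log n
        return;
    }
    int p = partition(a, first, last);
    introSort(a, first, p - 1, depth - 1);
    introSort(a, p + 1, last, depth - 1);
}

// Minimal max-heap heap sort on the subrange a[first..last].
static void heapSort(int[] a, int first, int last) {
    int n = last - first + 1;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, first, i, n);
    for (int end = n - 1; end > 0; end--) {
        int t = a[first]; a[first] = a[first + end]; a[first + end] = t;
        siftDown(a, first, 0, end);
    }
}

static void siftDown(int[] a, int off, int i, int n) {
    while (2 * i + 1 < n) {
        int c = 2 * i + 1;                                  // left child
        if (c + 1 < n && a[off + c + 1] > a[off + c]) c++;  // larger child
        if (a[off + i] >= a[off + c]) return;
        int t = a[off + i]; a[off + i] = a[off + c]; a[off + c] = t;
        i = c;
    }
}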

淡淡绿茶香 2024-07-19 11:45:50


Were your datasets random enough? Were they partially sorted?

That might affect the speed of the sort...

In quicksort's partition(), for example, you'd skip along while the numbers are in sorted order, until you find one that's not.

菩提树下叶撕阳。 2024-07-19 11:45:50


It might depend on what sort of data you're sorting for the testing (already ordered lists, randomized, reverse sorted). Also, quicksort will probably be faster in general if you pick a random pivot instead of using the first element.
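For example, a sketch that reuses the question's partition() by first swapping a random element into x[first], the slot that code treats as the pivot:

import java.util.Random;

static final Random RND = new Random();

static int randomizedPartition(int[] x, int first, int last) {
    int r = first + RND.nextInt(last - first + 1);  // random index in range
    int t = x[first]; x[first] = x[r]; x[r] = t;    // random element -> pivot slot
    return partition(x, first, last);
}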

瞳孔里扚悲伤 2024-07-19 11:45:50


For good performance of quicksort, it is important not to recurse all the way down to lists of length 1.

You should consider sorting lists of 2, 3, and even 4 elements with nested ifs, swapping where necessary (a sketch follows). Let us know how the performance changes.
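For the base cases, something like this (the swap helper is illustrative; the three compares for n == 3 form a tiny sorting network):

static void sortSmall(int[] a, int first, int last) {
    int n = last - first + 1;
    if (n == 2) {
        if (a[first] > a[last]) swap(a, first, last);
    } else if (n == 3) {
        if (a[first] > a[first + 1])     swap(a, first, first + 1);
        if (a[first + 1] > a[first + 2]) swap(a, first + 1, first + 2);
        if (a[first] > a[first + 1])     swap(a, first, first + 1);
    }
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}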
