QuickSort vs. MergeSort: what am I doing wrong?

Posted on 2024-10-14 07:43:22

I am trying to implement several sorting algorithms in Java to compare their performance. From what I've read, I was expecting quickSort to be faster than mergeSort, but in my code it is not, so I assume there must be a problem with my quickSort algorithm:

import java.util.ArrayList;
import java.util.Random;

public class quickSortExample{
public static void main(String[] args){
    Random gen = new Random();
    int n = 1000000;
    int max = 1500000;
    ArrayList<Integer> d = new ArrayList<Integer>();
    for(int i = 0; i < n; i++){
        d.add(gen.nextInt(max));
    }
    ArrayList<Integer> r;
    long start, end;

    start = System.currentTimeMillis();
    r = quickSort(d);
    end = System.currentTimeMillis();
    System.out.println("QuickSort:");
    System.out.println("Time: " + (end-start));
    //System.out.println(display(d));
    //System.out.println(display(r));
}

public static ArrayList<Integer> quickSort(ArrayList<Integer> data){
    if(data.size() > 1){
        int pivotIndex = getPivotIndex(data);
        int pivot = data.get(pivotIndex);
        data.remove(pivotIndex);
        ArrayList<Integer> smallers = new ArrayList<Integer>();
        ArrayList<Integer> largers = new ArrayList<Integer>();
        for(int i = 0; i < data.size(); i++){
            if(data.get(i) <= pivot){
                smallers.add(data.get(i));
            }else{
                largers.add(data.get(i));
            }
        }
        smallers = quickSort(smallers);
        largers = quickSort(largers);
        return concat(smallers, pivot, largers);
    }else{
        return data;
    }
}

public static int getPivotIndex(ArrayList<Integer> d){
    return (int)Math.floor(d.size()/2.0);
}

public static ArrayList<Integer> concat(ArrayList<Integer> s, int p, ArrayList<Integer> l){
    ArrayList<Integer> arr = new ArrayList<Integer>(s);
    arr.add(p);
    arr.addAll(l);

    return arr;
}

public static String display(ArrayList<Integer> data){
    String s = "[";
    for(int i=0; i < data.size(); i++){
        s += data.get(i) + ", ";
    }
    return (s+"]");
}

}

Results (on 1 million integers between 0 and 1,500,000):

mergeSort (also implemented with ArrayList): 1.3 sec average (0.7 sec with int[] instead)

quickSort: 3 sec average

Is it just my choice of pivot that is bad, or are there some flaws in the algorithm too?

Also, is there a faster way to code it with int[] instead of ArrayList? (How do you declare the size of the largers/smallers arrays?)

PS: I know it is possible to implement it in an in-place manner so it uses less memory, but that is not the point here.

EDIT 1: I gained 1 second by changing the concat method. Thanks!

6 Answers

扎心 2024-10-21 07:43:22


PS: I know it is possible to implement it in an in-place manner so it uses less memory, but this is not the point of this.

It's not just to use less memory. All that extra work you do in the "concat" routine instead of doing a proper in-place QuickSort is almost certainly what's costing so much. If you can use extra space anyway, you should always code up a merge sort because it'll tend to do fewer comparisons than a QuickSort will.

Think about it: in "concat()" you inevitably have to make another pass over the sub-lists, doing more comparisons. If you did the interchange in-place, all in a single array, then once you've made the decision to interchange two places, you don't make the decision again.
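The in-place exchange this answer describes can be sketched in Java as a minimal Hoare-style partition on an int[] (an illustrative rewrite, not the OP's code; class and method names are made up):

```java
import java.util.Arrays;

public class InPlaceQuickSort {
    // Hoare-style partition: elements are swapped across the pivot in place,
    // so no temporary lists and no extra concatenation pass are needed.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;   // scan right for an element >= pivot
            while (a[j] > pivot) j--;   // scan left for an element <= pivot
            if (i <= j) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        quickSort(a, lo, j);            // left part: a[lo..j]
        quickSort(a, i, hi);            // right part: a[i..hi]
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2, 7};
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 7, 8, 9]
    }
}
```

Once two positions are exchanged, they are never revisited at this level, which is exactly the saving the answer is pointing at.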

日久见人心 2024-10-21 07:43:22


I think the major problem with your quicksort, like you say, is that it's not done in place.

The two main culprits are smallers and largers. The default capacity of an ArrayList is 10. In the initial call to quickSort, a good pivot means smallers and largers each grow to about 500,000 elements. Since an ArrayList only grows when it reaches capacity, each will have to be resized around 19 times.

Since you make new smallers and largers lists at each level of recursion, you're going to perform approximately 2*(19+18+...+2+1) resizes. That's around 400 resizes the ArrayList objects have to perform before they are even concatenated. The concatenation process will probably perform a similar number of resizes.
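One cheap mitigation of the resizing cost described above is to give each list its worst-case capacity up front, so the backing array is allocated once (a small standalone sketch; the class name and values are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PresizeSketch {
    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(5, 3, 8, 1, 9));
        // Worst case, one side receives every element, so data.size() as the
        // initial capacity guarantees no resize ever happens during the split.
        ArrayList<Integer> smallers = new ArrayList<>(data.size());
        ArrayList<Integer> largers = new ArrayList<>(data.size());
        int pivot = data.get(data.size() / 2);
        for (int v : data) {
            if (v <= pivot) smallers.add(v); else largers.add(v);
        }
        System.out.println(smallers + " | " + largers);
    }
}
```

This trades a little extra memory per level for the elimination of the repeated grow-and-copy work.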

All in all, this is a lot of extra work.

Oops, just noticed data.remove(pivotIndex). The chosen pivot index (the middle of the array) is also going to cause additional memory operations (even though the middle is usually a better choice than the beginning or end of the array). That is, the ArrayList will copy the entire block of memory to the 'right' of the pivot one step to the left in the backing array.

A quick note on the chosen pivot: since the integers you are sorting are evenly distributed between 0 and max (if Random lives up to its name), you can use this to choose good pivots. That is, the first level of quicksort should choose max*0.5 as its pivot. The second level with smallers should choose max*0.25, and the second level with largers should choose max*0.75 (and so on).
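The value-based pivot idea can be sketched like this (a hypothetical helper, not the OP's code; it assumes the input really is roughly uniform on the half-open range [lo, hi), as the answer notes, and falls back to a library sort once the value range is exhausted):

```java
import java.util.ArrayList;
import java.util.List;

public class ValuePivotSketch {
    // Partitions around the midpoint of the known value range [lo, hi)
    // instead of picking an element from the list. This only works well
    // because the values are (roughly) uniformly distributed.
    static List<Integer> sort(List<Integer> data, int lo, int hi) {
        if (data.size() <= 1 || hi - lo <= 1) {
            List<Integer> copy = new ArrayList<>(data);
            copy.sort(null);   // fall back for tiny lists / exhausted ranges
            return copy;
        }
        int pivot = lo + (hi - lo) / 2;   // e.g. max*0.5 at the first level
        List<Integer> smallers = new ArrayList<>();
        List<Integer> largers = new ArrayList<>();
        for (int v : data) {
            if (v < pivot) smallers.add(v); else largers.add(v);
        }
        List<Integer> result = sort(smallers, lo, pivot);
        result.addAll(sort(largers, pivot, hi));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of(3, 9, 1, 7, 5), 0, 10)); // [1, 3, 5, 7, 9]
    }
}
```

Each level halves the value range, so the expected split stays balanced without ever inspecting the data to pick a pivot.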

吃→可爱长大的 2024-10-21 07:43:22


I think your algorithm is quite inefficient because you're using intermediate arrays, which means more memory plus more time for allocation and copying. Here is the code in C++, but the idea is the same: you have to swap the items, not copy them into other arrays.

template<class T> void quickSortR(T* a, long N) {
  // N is the index of the last element, not the length
  long i = 0, j = N;
  T temp, p;

  p = a[N/2];  // middle element as pivot

  do {
    while (a[i] < p) i++;
    while (a[j] > p) j--;

    if (i <= j) {
      temp = a[i]; a[i] = a[j]; a[j] = temp;
      i++; j--;
    }
  } while (i <= j);

  if (j > 0) quickSortR(a, j);
  if (N > i) quickSortR(a + i, N - i);
}

韬韬不绝 2024-10-21 07:43:22


Fundamentals of OOP and Data Structures in Java by Richard Wiener and Lewis J. Pinson lists quicksort as follows, which may or may not be faster (I suspect it is) than your implementation...

public static void quickSort (Comparable[] data, int low, int high) {
    int partitionIndex;
    if (high - low > 0) {
        partitionIndex = partition(data, low, high);
        quickSort(data, low, partitionIndex - 1);
        quickSort(data, partitionIndex + 1, high);
    }
}

private static int partition (Comparable[] data, int low, int high) {
    int k, j;
    Comparable temp, p;
    p = data[low]; // Partition element
    // Find partition index(j).
    k = low;
    j = high + 1;

    do {
        k++;
    } while (data[k].compareTo(p) <= 0 && k < high);

    do {
        j--;
    } while (data[j].compareTo(p) > 0);

    while (k < j) {
        temp = data[k];
        data[k] = data[j];
        data[j] = temp;

        do {
            k++;
        } while (data[k].compareTo(p) <= 0);

        do {
            j--;
        } while (data[j].compareTo(p) > 0);
    }
    // Move partition element(p) to partition index(j).
    if (low != j) {
        temp = data[low];
        data[low] = data[j];
        data[j] = temp;
    }
    return j; // Partition index
}
っ左 2024-10-21 07:43:22


I agree that the reason is unnecessary copying. Some more notes follow.

The choice of pivot index is bad, but it's not an issue here, because your numbers are random.

(int)Math.floor(d.size()/2.0) is equivalent to d.size()/2.

data.remove(pivotIndex); is unnecessary copying of about n/2 elements. Instead, you should check in the partition loop whether i == pivotIndex and skip that element. (Well, what you really need is an in-place sort, but I'm just suggesting straightforward improvements.)
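The skip-the-pivot change can be sketched against the OP's structure like this (an illustrative rewrite, not the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class SkipPivotSketch {
    // Same shape as the OP's quickSort, but instead of data.remove(pivotIndex)
    // -- which shifts ~n/2 elements in the backing array -- the partition loop
    // simply skips the pivot's own index.
    static ArrayList<Integer> quickSort(ArrayList<Integer> data) {
        if (data.size() <= 1) return data;
        int pivotIndex = data.size() / 2;      // no Math.floor needed
        int pivot = data.get(pivotIndex);
        ArrayList<Integer> smallers = new ArrayList<>();
        ArrayList<Integer> largers = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            if (i == pivotIndex) continue;     // skip the pivot element itself
            if (data.get(i) <= pivot) smallers.add(data.get(i));
            else largers.add(data.get(i));
        }
        ArrayList<Integer> result = quickSort(smallers);
        result.add(pivot);
        result.addAll(quickSort(largers));
        return result;
    }

    public static void main(String[] args) {
        ArrayList<Integer> d = new ArrayList<>(List.of(4, 2, 5, 1, 3));
        System.out.println(quickSort(d));      // [1, 2, 3, 4, 5]
    }
}
```

Appending directly onto the result of the smallers recursion also removes one of the copies the original concat performed.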

Putting all elements that are equal to pivot in the same ('smaller') part is a bad idea. Imagine what happens when all elements of the array are equal. (Again, not an issue in this case.)


for(i = 0; i < s.size(); i++){
    arr.add(s.get(i));
}

is equivalent to arr.addAll(s). And of course there is unnecessary copying here again. You could just add all elements from the right part to the left one instead of creating a new list.

(How do you declare the size of the array for largers/smallers arrays?)

I'm not sure if I got you right, but do you want array.length?

So, I think that even without implementing in-place sort you can significantly improve performance.

孤凫 2024-10-21 07:43:22


Technically, mergesort has better time behavior (Θ(n log n) in both the worst and average cases) than quicksort (Θ(n^2) worst case, Θ(n log n) average case). So it is quite possible to find inputs for which mergesort outperforms quicksort. Depending on how you pick your pivots, you can make the worst case rare. But for a simple version of quicksort, the "worst case" is sorted (or nearly sorted) data, which can be a rather common input.

Here's what Wikipedia says about the two:

On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays. On the other hand, merge sort is a stable sort, parallelizes better, and is more efficient at handling slow-to-access sequential media.[citation needed] Merge sort is often the best choice for sorting a linked list: in this situation it is relatively easy to implement a merge sort in such a way that it requires only Θ(1) extra space, and the slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.
