改进快速排序

发布于 2024-08-11 03:03:18 字数 1055 浏览 4 评论 0原文

如果可能,我如何改进以下快速排序(性能方面)。有什么建议吗?

void main()
    {
      quick(a,0,n-1);
    }

    void quick(int a[],int lower,int upper)
    {
       int loc;
       if(lower<upper)
       {
        loc=partition(a,lower,upper);
        quick(a,lower,loc-1);
        quick(a,loc+1,upper);

       }
    }

    /* Return type: int
      Parameters passed: Unsorted array and its lower and upper bounds */

    int partition(int a[],int lower,int upper)
    {
      int pivot,i,j,temp;
      pivot=a[lower];
      i=lower+1;
      j=upper;
      while(i<j)
        {
            while((i<upper)&&(a[i]<=pivot))
            i++;
            while((a[j]>pivot))
            j--;
            if(i<j)
                {
                    temp=a[i];
                    a[i]=a[j];
                    a[j]=temp;
                }

        }//end while

        if(pivot>a[j])
        {
             temp=a[j];
             a[j]=a[lower];
             a[lower]=temp;
        }

         return(j);

}//end partition

If possible, how can I improve the following quick sort(performance wise). Any suggestions?

void main()
    {
      quick(a,0,n-1);
    }

    void quick(int a[],int lower,int upper)
    {
       int loc;
       if(lower<upper)
       {
        loc=partition(a,lower,upper);
        quick(a,lower,loc-1);
        quick(a,loc+1,upper);

       }
    }

    /* Return type: int
      Parameters passed: Unsorted array and its lower and upper bounds */

    int partition(int a[],int lower,int upper)
    {
      int pivot,i,j,temp;
      pivot=a[lower];
      i=lower+1;
      j=upper;
      while(i<j)
        {
            while((i<upper)&&(a[i]<=pivot))
            i++;
            while((a[j]>pivot))
            j--;
            if(i<j)
                {
                    temp=a[i];
                    a[i]=a[j];
                    a[j]=temp;
                }

        }//end while

        if(pivot>a[j])
        {
             temp=a[j];
             a[j]=a[lower];
             a[lower]=temp;
        }

         return(j);

}//end partition

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

剩一世无双 2024-08-18 03:03:18

快速排序算法很容易并行化。如果您有多个核心可供使用,您可能会看到相当多的速度提升。根据您的数据集有多大,它可以轻松地为您提供比任何其他优化更高的速度。但是,如果您只有一个处理器或相对较小的数据集,则不会有太大的加速。

如果有可能的话,我可以详细说明。

The quicksort algorithm is easily parallelized. If you have multiple cores to work with, you could see quite a bit of speed up. Depending on how large your data set is, it could easily provide you with more speed up than any other optimization. However, if you have only one processor or a relatively small data set, there won't be much of a speed up.

I could elaborate more if this is a possibility.

我是有多爱你 2024-08-18 03:03:18
  1. 选择一个更好的枢轴:例如。在三中位数中,您选择 3 个(随机)元素并选择枢轴作为中位数元素
  2. 当 length(a[]) < M(实际中选择M=9)停止排序。 qsort() 完成后应用插入排序,这大约需要 M * N = O(N)。这避免了靠近分而治递归树的叶子的许多函数调用。
  1. Choose a better pivot: eg. in median-of-three you pick 3 (random) elements and choose the pivot as the median element
  2. When length(a[]) < M (in practice choose M = 9) stop sorting. After qsort() finished apply insert sort which would take roughly M * N = O(N). This avoids many function calls close to leaf of the divide-et-impera recursion tree.
装纯掩盖桑 2024-08-18 03:03:18

第一个建议是:用迭代替换其中一个递归调用。我的意思是真正的迭代,而不是手动实现的递归堆栈。即,不要从 quickquick 进行两次“新”调用,而是“回收”对 quick当前调用处理递归的一个分支,并递归调用quick来处理另一个分支。

现在,如果您确保始终对较短的分区进行真正的递归调用(并对较长的分区使用迭代),它将保证递归深度永远不会超过 log N 即使在最坏的情况下,即无论您选择中位数的效果如何。

这一切都是在 GCC 标准库附带的 qsort 算法中实现的。看一下源码,应该有用。

粗略地说,遵循上述建议的更实际的实现可能如下所示。

void quick(int a[], int lower, int upper)
{
  while (lower < upper)
  {
    int loc = partition(a, lower, upper);
    if (loc - lower < upper - loc)
    { /* Lower part is shorter... */
      quick(a, lower, loc - 1); /* ...process it recursively... */
      lower = loc + 1; /* ...and process the upper part on the next iteration */
    }
    else
    { /* Upper part is shorter... */
      quick(a, loc + 1, upper); /* ...process it recursively... */
      upper = loc - 1; /* ...and process the lower part on the next iteration */
    }
  }
}

当然,这只是这个想法的一个草图。未测试。再次看一下 GCC 的实现,了解同样的想法。他们还将剩余的递归调用替换为“手动”递归,但这并不是真正必要的。

The first suggestion would be: replace one of the recursive calls with iteration. And I mean real iteration, not a manually implemented stack for recursion. I.e. instead of making two "new" calls to quick from quick, "recycle" your current call to quick to process one branch of recursion, and call quick recursively to process another branch.

Now, if you make sure that you always make real recursive call for the shorter partition (and use iteration for the longer one), it will guarantee that the depth of recursion will never exceed log N even in the worst case, i.e. regardless of how well you choose your median.

This all is implemented in qsort algorithm that comes with GCC standard library. Take a look at the source, it should be useful.

Roughly, a more practical implementation that follows the above suggestion might look as follows

void quick(int a[], int lower, int upper)
{
  while (lower < upper)
  {
    int loc = partition(a, lower, upper);
    if (loc - lower < upper - loc)
    { /* Lower part is shorter... */
      quick(a, lower, loc - 1); /* ...process it recursively... */
      lower = loc + 1; /* ...and process the upper part on the next iteration */
    }
    else
    { /* Upper part is shorter... */
      quick(a, loc + 1, upper); /* ...process it recursively... */
      upper = loc - 1; /* ...and process the lower part on the next iteration */
    }
  }
}

This is just a sketch of the idea, of course. Not tested. Again, take a look at GCC implementation for the same idea. They also replace the remaining recursive call with "manual" recursion, but it is not really necessary.

花心好男孩 2024-08-18 03:03:18

使用无循环算法对小块(<5个元素)进行排序可以提高性能。我只找到了一个如何为 5 个元素编写无循环排序算法的示例: http://wiki.tcl.tk/ 4118

这个想法可用于为 C 中的 2,3,4,5 元素生成无循环排序算法。

编辑:我在一组数据上尝试过,与中的代码相比,我测量了 87% 的运行时间的问题。在 <20 个块上使用插入排序在同一数据集上的结果为 92%。此测量不具有代表性,但可能表明这是改进快速排序代码的一种方法。

编辑:此示例代码写出 2-6 个元素的无循环排序函数:

#include <stdio.h>

#define OUT(i,fmt,...)  printf("%*.s"fmt"\n",i,"",##__VA_ARGS__);

#define IF( a,b, i, block1, block2 ) \
  OUT(i,"if(t[%i]>t[%i]){",a,b) block1 OUT(i,"}else{") block2 OUT(i,"}")

#define AB2(i,a,b)         IF(a,b,i,P2(i+2,b,a),P2(i+2,a,b))
#define  P2(i,a,b)         print_perm(i,2,(int[2]){a,b});

#define AB3(i,a,b,c)       IF(a,b,i,BC3(i+2,b,a,c),BC3(i+2,a,b,c))
#define AC3(i,a,b,c)       IF(a,c,i, P3(i+2,c,a,b), P3(i+2,a,c,b))
#define BC3(i,a,b,c)       IF(b,c,i,AC3(i+2,a,b,c), P3(i+2,a,b,c))
#define  P3(i,a,b,c)       print_perm(i,3,(int[3]){a,b,c});

#define AB4(i,a,b,c,d)     IF(a,b,i,CD4(i+2,b,a,c,d),CD4(i+2,a,b,c,d))
#define AC4(i,a,b,c,d)     IF(a,c,i, P4(i+2,c,a,b,d), P4(i+2,a,c,b,d))
#define BC4(i,a,b,c,d)     IF(b,c,i,AC4(i+2,a,b,c,d), P4(i+2,a,b,c,d))
#define BD4(i,a,b,c,d)     IF(b,d,i,BC4(i+2,c,d,a,b),BC4(i+2,a,b,c,d))
#define CD4(i,a,b,c,d)     IF(c,d,i,BD4(i+2,a,b,d,c),BD4(i+2,a,b,c,d))
#define  P4(i,a,b,c,d)     print_perm(i,4,(int[4]){a,b,c,d});

#define AB5(i,a,b,c,d,e)   IF(a,b,i,CD5(i+2,b,a,c,d,e),CD5(i+2,a,b,c,d,e))
#define AC5(i,a,b,c,d,e)   IF(a,c,i, P5(i+2,c,a,b,d,e), P5(i+2,a,c,b,d,e))
#define AE5(i,a,b,c,d,e)   IF(a,e,i,CB5(i+2,e,a,c,b,d),CB5(i+2,a,e,c,b,d))
#define BE5(i,a,b,c,d,e)   IF(b,e,i,AE5(i+2,a,b,c,d,e),DE5(i+2,a,b,c,d,e))
#define BD5(i,a,b,c,d,e)   IF(b,d,i,BE5(i+2,c,d,a,b,e),BE5(i+2,a,b,c,d,e))
#define CB5(i,a,b,c,d,e)   IF(c,b,i,DC5(i+2,a,b,c,d,e),AC5(i+2,a,b,c,d,e))
#define CD5(i,a,b,c,d,e)   IF(c,d,i,BD5(i+2,a,b,d,c,e),BD5(i+2,a,b,c,d,e))
#define DC5(i,a,b,c,d,e)   IF(d,c,i, P5(i+2,a,b,c,d,e), P5(i+2,a,b,d,c,e))
#define DE5(i,a,b,c,d,e)   IF(d,e,i,CB5(i+2,a,b,c,e,d),CB5(i+2,a,b,c,d,e))
#define  P5(i,a,b,c,d,e)   print_perm(i,5,(int[5]){a,b,c,d,e});

#define AB6(i,a,b,c,d,e,f) IF(a,b,i,CD6(i+2,b,a,c,d,e,f),CD6(i+2,a,b,c,d,e,f))
#define AC6(i,a,b,c,d,e,f) IF(a,c,i, P6(i+2,c,a,b,d,e,f), P6(i+2,a,c,b,d,e,f))
#define AE6(i,a,b,c,d,e,f) IF(a,e,i,CB6(i+2,e,a,c,b,d,f),CB6(i+2,a,e,c,b,d,f))
#define BD6(i,a,b,c,d,e,f) IF(b,d,i,DF6(i+2,c,d,a,b,e,f),DF6(i+2,a,b,c,d,e,f))
#define BE6(i,a,b,c,d,e,f) IF(b,e,i,AE6(i+2,a,b,c,d,e,f),DE6(i+2,a,b,c,d,e,f))
#define CB6(i,a,b,c,d,e,f) IF(c,b,i,DC6(i+2,a,b,c,d,e,f),AC6(i+2,a,b,c,d,e,f))
#define CD6(i,a,b,c,d,e,f) IF(c,d,i,EF6(i+2,a,b,d,c,e,f),EF6(i+2,a,b,c,d,e,f))
#define DB6(i,a,b,c,d,e,f) IF(d,b,i,BE6(i+2,a,b,c,d,e,f),BE6(i+2,c,d,a,b,e,f))
#define DC6(i,a,b,c,d,e,f) IF(d,c,i, P6(i+2,a,b,c,d,e,f), P6(i+2,a,b,d,c,e,f))
#define DE6(i,a,b,c,d,e,f) IF(d,e,i,CB6(i+2,a,b,c,e,d,f),CB6(i+2,a,b,c,d,e,f))
#define DF6(i,a,b,c,d,e,f) IF(d,f,i,DB6(i+2,a,b,e,f,c,d),BE6(i+2,a,b,c,d,e,f))
#define EF6(i,a,b,c,d,e,f) IF(e,f,i,BD6(i+2,a,b,c,d,f,e),BD6(i+2,a,b,c,d,e,f))
#define  P6(i,a,b,c,d,e,f) print_perm(i,6,(int[6]){a,b,c,d,e,f});

#define SORT(ABn,n,...) \
  OUT( 0, ""); \
  OUT( 0, "inline void sort" #n "( int t[" #n "] ){" ) \
  OUT( 2, "int tmp;" ) \
  ABn( 2, __VA_ARGS__ ) \
  OUT( 0, "}" )

void print_perm( int ind, int n, int t[n] ){
  printf("%*.s", ind-1, "");
  for( int i=0; i<n; i++ )
    if( i != t[i] ){
      printf(" tmp=t[%i]; t[%i]=",i,i);
      for( int j=t[i],k; j!=i; k=j,j=t[j],t[k]=k)
        printf("t[%i]; t[%i]=",j,j);
      printf("tmp;");
    }
  printf("\n");
}

int main( void ){
  SORT( AB2, 2, 0,1 )
  SORT( AB3, 3, 0,1,2 )
  SORT( AB4, 4, 0,1,2,3 )
  SORT( AB5, 5, 0,1,2,3,4 )
  SORT( AB6, 6, 0,1,2,3,4,5 )
}

生成的代码长 3718 行:

sort2():    8
sort3():   24
sort4():   96
sort5():  512
sort6(): 3072

Sorting small blocks (<5 elements) with a loopless algorithm may improve performance. I found only one example how to write a loopless sorting algorithm for 5 elements: http://wiki.tcl.tk/4118

The idea can be used to generate loopless sorting algorithms for 2,3,4,5 elements in C.

EDIT: i tried it on one set of data, and i measured 87% running time compared to the code in the question. Using insertion sort on <20 blocks resulted 92% on the same data set. This measurement is not representative but may show that this is a way how can You improve Your quicksort code.

EDIT: this example code write out loopless sorting functions for 2-6 elements:

#include <stdio.h>

#define OUT(i,fmt,...)  printf("%*.s"fmt"\n",i,"",##__VA_ARGS__);

#define IF( a,b, i, block1, block2 ) \
  OUT(i,"if(t[%i]>t[%i]){",a,b) block1 OUT(i,"}else{") block2 OUT(i,"}")

#define AB2(i,a,b)         IF(a,b,i,P2(i+2,b,a),P2(i+2,a,b))
#define  P2(i,a,b)         print_perm(i,2,(int[2]){a,b});

#define AB3(i,a,b,c)       IF(a,b,i,BC3(i+2,b,a,c),BC3(i+2,a,b,c))
#define AC3(i,a,b,c)       IF(a,c,i, P3(i+2,c,a,b), P3(i+2,a,c,b))
#define BC3(i,a,b,c)       IF(b,c,i,AC3(i+2,a,b,c), P3(i+2,a,b,c))
#define  P3(i,a,b,c)       print_perm(i,3,(int[3]){a,b,c});

#define AB4(i,a,b,c,d)     IF(a,b,i,CD4(i+2,b,a,c,d),CD4(i+2,a,b,c,d))
#define AC4(i,a,b,c,d)     IF(a,c,i, P4(i+2,c,a,b,d), P4(i+2,a,c,b,d))
#define BC4(i,a,b,c,d)     IF(b,c,i,AC4(i+2,a,b,c,d), P4(i+2,a,b,c,d))
#define BD4(i,a,b,c,d)     IF(b,d,i,BC4(i+2,c,d,a,b),BC4(i+2,a,b,c,d))
#define CD4(i,a,b,c,d)     IF(c,d,i,BD4(i+2,a,b,d,c),BD4(i+2,a,b,c,d))
#define  P4(i,a,b,c,d)     print_perm(i,4,(int[4]){a,b,c,d});

#define AB5(i,a,b,c,d,e)   IF(a,b,i,CD5(i+2,b,a,c,d,e),CD5(i+2,a,b,c,d,e))
#define AC5(i,a,b,c,d,e)   IF(a,c,i, P5(i+2,c,a,b,d,e), P5(i+2,a,c,b,d,e))
#define AE5(i,a,b,c,d,e)   IF(a,e,i,CB5(i+2,e,a,c,b,d),CB5(i+2,a,e,c,b,d))
#define BE5(i,a,b,c,d,e)   IF(b,e,i,AE5(i+2,a,b,c,d,e),DE5(i+2,a,b,c,d,e))
#define BD5(i,a,b,c,d,e)   IF(b,d,i,BE5(i+2,c,d,a,b,e),BE5(i+2,a,b,c,d,e))
#define CB5(i,a,b,c,d,e)   IF(c,b,i,DC5(i+2,a,b,c,d,e),AC5(i+2,a,b,c,d,e))
#define CD5(i,a,b,c,d,e)   IF(c,d,i,BD5(i+2,a,b,d,c,e),BD5(i+2,a,b,c,d,e))
#define DC5(i,a,b,c,d,e)   IF(d,c,i, P5(i+2,a,b,c,d,e), P5(i+2,a,b,d,c,e))
#define DE5(i,a,b,c,d,e)   IF(d,e,i,CB5(i+2,a,b,c,e,d),CB5(i+2,a,b,c,d,e))
#define  P5(i,a,b,c,d,e)   print_perm(i,5,(int[5]){a,b,c,d,e});

#define AB6(i,a,b,c,d,e,f) IF(a,b,i,CD6(i+2,b,a,c,d,e,f),CD6(i+2,a,b,c,d,e,f))
#define AC6(i,a,b,c,d,e,f) IF(a,c,i, P6(i+2,c,a,b,d,e,f), P6(i+2,a,c,b,d,e,f))
#define AE6(i,a,b,c,d,e,f) IF(a,e,i,CB6(i+2,e,a,c,b,d,f),CB6(i+2,a,e,c,b,d,f))
#define BD6(i,a,b,c,d,e,f) IF(b,d,i,DF6(i+2,c,d,a,b,e,f),DF6(i+2,a,b,c,d,e,f))
#define BE6(i,a,b,c,d,e,f) IF(b,e,i,AE6(i+2,a,b,c,d,e,f),DE6(i+2,a,b,c,d,e,f))
#define CB6(i,a,b,c,d,e,f) IF(c,b,i,DC6(i+2,a,b,c,d,e,f),AC6(i+2,a,b,c,d,e,f))
#define CD6(i,a,b,c,d,e,f) IF(c,d,i,EF6(i+2,a,b,d,c,e,f),EF6(i+2,a,b,c,d,e,f))
#define DB6(i,a,b,c,d,e,f) IF(d,b,i,BE6(i+2,a,b,c,d,e,f),BE6(i+2,c,d,a,b,e,f))
#define DC6(i,a,b,c,d,e,f) IF(d,c,i, P6(i+2,a,b,c,d,e,f), P6(i+2,a,b,d,c,e,f))
#define DE6(i,a,b,c,d,e,f) IF(d,e,i,CB6(i+2,a,b,c,e,d,f),CB6(i+2,a,b,c,d,e,f))
#define DF6(i,a,b,c,d,e,f) IF(d,f,i,DB6(i+2,a,b,e,f,c,d),BE6(i+2,a,b,c,d,e,f))
#define EF6(i,a,b,c,d,e,f) IF(e,f,i,BD6(i+2,a,b,c,d,f,e),BD6(i+2,a,b,c,d,e,f))
#define  P6(i,a,b,c,d,e,f) print_perm(i,6,(int[6]){a,b,c,d,e,f});

#define SORT(ABn,n,...) \
  OUT( 0, ""); \
  OUT( 0, "inline void sort" #n "( int t[" #n "] ){" ) \
  OUT( 2, "int tmp;" ) \
  ABn( 2, __VA_ARGS__ ) \
  OUT( 0, "}" )

void print_perm( int ind, int n, int t[n] ){
  printf("%*.s", ind-1, "");
  for( int i=0; i<n; i++ )
    if( i != t[i] ){
      printf(" tmp=t[%i]; t[%i]=",i,i);
      for( int j=t[i],k; j!=i; k=j,j=t[j],t[k]=k)
        printf("t[%i]; t[%i]=",j,j);
      printf("tmp;");
    }
  printf("\n");
}

int main( void ){
  SORT( AB2, 2, 0,1 )
  SORT( AB3, 3, 0,1,2 )
  SORT( AB4, 4, 0,1,2,3 )
  SORT( AB5, 5, 0,1,2,3,4 )
  SORT( AB6, 6, 0,1,2,3,4,5 )
}

The generated code 3718 lines long:

sort2():    8
sort3():   24
sort4():   96
sort5():  512
sort6(): 3072
平安喜乐 2024-08-18 03:03:18

尝试另一种排序算法。

根据您的数据,您也许可以用内存来换取速度。

根据维基百科

  • 快速排序有一个最佳情况O(n log n) 性能和O(1)空间
  • 合并排序有一个固定 O(n log n) 性能和 O(n) 空间
  • 基数排序具有固定 O(n . <位数>) 性能和 O(n . <位数>) 空间

< em>编辑

显然你的数据是整数。对于 [0, 0x0fffffff] 范围内的 250 万个整数,我的基数排序实现速度大约是您的快速排序实现速度的 4 倍。

$ ./a.out
qsort time: 0.39 secs
radix time: 0.09 secs
good: 2000; evil: 0
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_SIZE 2560000
#define RANDOM_NUMBER (((rand() << 16) + rand()) & 0x0fffffff)

int partition(int a[], int lower, int upper) {
  int pivot, i, j, temp;
  pivot = a[lower];
  i = lower + 1;
  j = upper;
  while (i < j) {
    while((i < upper) && (a[i] <= pivot)) i++;
    while (a[j] > pivot) j--;
    if (i < j) {
      temp = a[i];
      a[i] = a[j];
      a[j] = temp;
    }
  }
  if (pivot > a[j]) {
    temp = a[j];
    a[j] = a[lower];
    a[lower] = temp;
  }
  return j;
}

void quick(int a[], int lower, int upper) {
  int loc;
  if (lower < upper) {
    loc = partition(a, lower, upper);
    quick(a, lower, loc-1);
    quick(a, loc+1, upper);
  }
}

#define NBUCKETS 256
#define BUCKET_SIZE (48 * (1 + ARRAY_SIZE / NBUCKETS))

/* "waste" memory */
int bucket[NBUCKETS][BUCKET_SIZE];

void radix(int *a, size_t siz) {
  unsigned shift = 0;
  for (int dummy=0; dummy<4; dummy++) {
    int bcount[NBUCKETS] = {0};
    int *aa = a;
    size_t s = siz;
    while (s--) {
      unsigned v = ((unsigned)*aa >> shift) & 0xff;
      if (bcount[v] == BUCKET_SIZE) {
        fprintf(stderr, "not enough memory.\n");
        fprintf(stderr, "v == %u; bcount[v] = %d.\n", v, bcount[v]);
        exit(EXIT_FAILURE);
      }
      bucket[v][bcount[v]++] = *aa++;
    }
    aa = a;
    for (int k=0; k<NBUCKETS; k++) {
      for (int j=0; j<bcount[k]; j++) *aa++ = bucket[k][j];
    }
    shift += 8;
  }
}

int ar1[ARRAY_SIZE];
int ar2[ARRAY_SIZE];

int main(void) {
  clock_t t0;

  srand(time(0));
  for (int k=0; k<ARRAY_SIZE; k++) ar1[k] = ar2[k] = RANDOM_NUMBER;

  t0 = clock(); while (clock() == t0) /* void */; t0 = clock();
  do {
    quick(ar1, 0, ARRAY_SIZE - 1);
  } while (0);
  double qsort_time = (double)(clock() - t0) / CLOCKS_PER_SEC;

  t0 = clock(); while (clock() == t0) /* void */; t0 = clock();
  do {
    radix(ar2, ARRAY_SIZE);
  } while (0);
  double radix_time = (double)(clock() - t0) / CLOCKS_PER_SEC;

  printf("qsort time: %.2f secs\n", qsort_time);
  printf("radix time: %.2f secs\n", radix_time);

  /* compare sorted arrays by sampling */
  int evil = 0;
  int good = 0;
  for (int chk=0; chk<2000; chk++) {
    size_t index = RANDOM_NUMBER % ARRAY_SIZE;
    if (ar1[index] == ar2[index]) good++;
    else evil++;
  }
  printf("good: %d; evil: %d\n", good, evil);

  return 0;
}

Try another sort algorithm.

Depending on your data, you may be able to trade memory for speed.

According to Wikipedia

  • Quick sort has a best case O(n log n) performance and O(1) space
  • Merge sort has a fixed O(n log n) performance and O(n) space
  • Radix sort has a fixed O(n . <number of digits>) perfomance and O(n . <number of digits>) space

Edit

Apparently your data is integers. With 2.5M integers in the range [0, 0x0fffffff], my implementation of radix-sort is about 4 times as fast as your implementation of quick-sort.

$ ./a.out
qsort time: 0.39 secs
radix time: 0.09 secs
good: 2000; evil: 0
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_SIZE 2560000
#define RANDOM_NUMBER (((rand() << 16) + rand()) & 0x0fffffff)

int partition(int a[], int lower, int upper) {
  int pivot, i, j, temp;
  pivot = a[lower];
  i = lower + 1;
  j = upper;
  while (i < j) {
    while((i < upper) && (a[i] <= pivot)) i++;
    while (a[j] > pivot) j--;
    if (i < j) {
      temp = a[i];
      a[i] = a[j];
      a[j] = temp;
    }
  }
  if (pivot > a[j]) {
    temp = a[j];
    a[j] = a[lower];
    a[lower] = temp;
  }
  return j;
}

void quick(int a[], int lower, int upper) {
  int loc;
  if (lower < upper) {
    loc = partition(a, lower, upper);
    quick(a, lower, loc-1);
    quick(a, loc+1, upper);
  }
}

#define NBUCKETS 256
#define BUCKET_SIZE (48 * (1 + ARRAY_SIZE / NBUCKETS))

/* "waste" memory */
int bucket[NBUCKETS][BUCKET_SIZE];

void radix(int *a, size_t siz) {
  unsigned shift = 0;
  for (int dummy=0; dummy<4; dummy++) {
    int bcount[NBUCKETS] = {0};
    int *aa = a;
    size_t s = siz;
    while (s--) {
      unsigned v = ((unsigned)*aa >> shift) & 0xff;
      if (bcount[v] == BUCKET_SIZE) {
        fprintf(stderr, "not enough memory.\n");
        fprintf(stderr, "v == %u; bcount[v] = %d.\n", v, bcount[v]);
        exit(EXIT_FAILURE);
      }
      bucket[v][bcount[v]++] = *aa++;
    }
    aa = a;
    for (int k=0; k<NBUCKETS; k++) {
      for (int j=0; j<bcount[k]; j++) *aa++ = bucket[k][j];
    }
    shift += 8;
  }
}

int ar1[ARRAY_SIZE];
int ar2[ARRAY_SIZE];

int main(void) {
  clock_t t0;

  srand(time(0));
  for (int k=0; k<ARRAY_SIZE; k++) ar1[k] = ar2[k] = RANDOM_NUMBER;

  t0 = clock(); while (clock() == t0) /* void */; t0 = clock();
  do {
    quick(ar1, 0, ARRAY_SIZE - 1);
  } while (0);
  double qsort_time = (double)(clock() - t0) / CLOCKS_PER_SEC;

  t0 = clock(); while (clock() == t0) /* void */; t0 = clock();
  do {
    radix(ar2, ARRAY_SIZE);
  } while (0);
  double radix_time = (double)(clock() - t0) / CLOCKS_PER_SEC;

  printf("qsort time: %.2f secs\n", qsort_time);
  printf("radix time: %.2f secs\n", radix_time);

  /* compare sorted arrays by sampling */
  int evil = 0;
  int good = 0;
  for (int chk=0; chk<2000; chk++) {
    size_t index = RANDOM_NUMBER % ARRAY_SIZE;
    if (ar1[index] == ar2[index]) good++;
    else evil++;
  }
  printf("good: %d; evil: %d\n", good, evil);

  return 0;
}
人│生佛魔见 2024-08-18 03:03:18
  1. 您可以通过使用带有显式堆栈的快速排序来消除递归开销

    void fastSort(int a[], int lower, int upper)
    {
        创建堆栈(); 
        推(下); 
        推(上);
    
        while (!isEmptyStack()) {
        上层=poptop();
        下=弹出窗口();
           而(下<上){
                     pivPos=分区(selectPivot(a, 大小), a, 下, 上);
                     推(下);
                     推(pivPos-1);
                     下=pivPos+1; //结束=结束;
           }
       }
    }
    
  2. 您可以使用更好的主元选择技术,例如:

    1. 3 的中位数
    2. 中位数的中位数
    3. 随机主元
  1. You can a eliminate recuriosn overhead by using QuickSort with Explicit Stack

    void quickSort(int a[], int lower, int upper)
    {
        createStack(); 
        push(lower); 
        push(upper);
    
        while (!isEmptyStack()) {
        upper=poptop();
        lower=poptop();
           while (lower<upper) {
                     pivPos=partition(selectPivot(a, size), a, lower, upper);
                     push(lower);
                     push(pivPos-1);
                     lower = pivPos+1; // end = end;
           }
       }
    }
    
  2. You can use better pivot selection technique such as:

    1. median of 3
    2. median of medians
    3. random pivot
时间海 2024-08-18 03:03:18

目前广泛使用的最先进的快速排序是在Java的DualPivotQuicksort.java
因此,您可以简单地遵循该方法,您将看到很好的性能改进:

  • 对小型数组使用插入排序(47 是 Java 中使用的数字)
  • 使用双枢轴快速排序,选择 5 的第二个和第四个元素作为两个枢轴
  • 考虑对带有排序数字的数组使用归并排序

或者如果您想增加一点复杂性,则编写 3-pivot 快速排序

Currently the most advanced quicksort widely used is implemented in Java's DualPivotQuicksort.java
So you can simply follow that approach and you will see a nice performance improvement:

  • Use Insertion sort for small arrays (47 is the number used in Java)
  • Use that dual-pivot quicksort choosing the 2nd and 4th elements of 5 as the two pivots
  • Consider using mergesort for arrays with runs of sorted numbers

Or if you want to add a bit more complexity then code a 3-pivot quicksort.

放肆 2024-08-18 03:03:18

如果这不仅仅是为了学习,请使用 stdlib.h 中的 qsort

If this is not just for learning use qsort from stdlib.h

晚雾 2024-08-18 03:03:18

根据您的代码,当排序长度为 10 时,最深的堆栈看起来像

#0  partition (a=0x7fff5ac42180, lower=3, upper=5) 
#1  0x000000000040062f in quick (a=0x7fff5ac42180, lower=3, upper=5) 
#2  0x0000000000400656 in quick (a=0x7fff5ac42180, lower=0, upper=5) 
#3  0x0000000000400644 in quick (a=0x7fff5ac42180, lower=0, upper=8) 
#4  0x0000000000400644 in quick (a=0x7fff5ac42180, lower=0, upper=9) 
#5  0x00000000004005c3 in main 

除了算法本身之外,如果我们还考虑堆栈活动等系统行为,除了正常调用堆栈成本(推送/弹出)之外的其他内容可能会降低性能很多,例如上下文切换,如果多任务系统,您知道大多数操作系统都是多任务的,并且堆栈越深,切换成本越高,加上缓存未命中或跨缓存线边界。

我相信在这种情况下,如果长度变大,与其他一些排序算法相比,你就会失败。

例如,当长度达到 40 时,堆栈可能看起来像(可能比下面显示的条目更多):

#0  partition (a=0x7fff24810cd0, lower=8, upper=9) 
#1  0x000000000040065d in quick (a=0x7fff24810cd0, lower=8, upper=9)  
#2  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=8, upper=10) 
#3  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=8, upper=14) 
#4  0x0000000000400684 in quick (a=0x7fff24810cd0, lower=4, upper=14) 
#5  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=4, upper=18) 
#6  0x0000000000400684 in quick (a=0x7fff24810cd0, lower=0, upper=18) 
#7  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=26) 
#8  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=38) 
#9  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=39) 
#10 0x00000000004005ee in main  

堆栈太深。

函数内联也很有帮助,但它增加了最终可执行文件的代码占用量。
我同意@Swiss,并行编程可能对你有很大帮助。

Per your code, when sorted lenght is 10, the deepest stack looks like

#0  partition (a=0x7fff5ac42180, lower=3, upper=5) 
#1  0x000000000040062f in quick (a=0x7fff5ac42180, lower=3, upper=5) 
#2  0x0000000000400656 in quick (a=0x7fff5ac42180, lower=0, upper=5) 
#3  0x0000000000400644 in quick (a=0x7fff5ac42180, lower=0, upper=8) 
#4  0x0000000000400644 in quick (a=0x7fff5ac42180, lower=0, upper=9) 
#5  0x00000000004005c3 in main 

Besides the algo itself, if we consider the system behavior such as stack activities as well, something else except normal call stack costs (push/pop) might downgrade the performance a lots, e.g. context switching if multi-task system, you know most OS are multi-task, and deeper the stack higher costs the switching, plus cache miss or cachline boundary across.

I believe in this case if the length become bigger, you'll lose when comparing to some other sorting algos.

for example, when length is up to 40, the stack may look like (may more entries than shown as below):

#0  partition (a=0x7fff24810cd0, lower=8, upper=9) 
#1  0x000000000040065d in quick (a=0x7fff24810cd0, lower=8, upper=9)  
#2  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=8, upper=10) 
#3  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=8, upper=14) 
#4  0x0000000000400684 in quick (a=0x7fff24810cd0, lower=4, upper=14) 
#5  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=4, upper=18) 
#6  0x0000000000400684 in quick (a=0x7fff24810cd0, lower=0, upper=18) 
#7  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=26) 
#8  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=38) 
#9  0x0000000000400672 in quick (a=0x7fff24810cd0, lower=0, upper=39) 
#10 0x00000000004005ee in main  

too deeper the stack.

Function inlining is helpful too, but it increases code footprint of the final executable.
I agree with @Swiss, parallel programming might help you so much.

穿越时光隧道 2024-08-18 03:03:18

完全愚蠢的答案,但是......在发布模式下编译代码并打开优化!

completely dumb answer, but... compile your code in release mode and turn on optimizations !

马蹄踏│碎落叶 2024-08-18 03:03:18

首先要做的就是对其进行基准测试。并将其与标准 qsort、c++ std::sort 和 std::stable_sort 进行基准测试。

如果您的数据集足够大,您的结果将表明 std::sort 在所有情况下都优于 qsort,除了 std::stable_sort 优于的情况。这是因为 std::sort 是模板化的,因此可以内联比较。

您的代码内联比较,因为它不是通用的。如果你的代码比 qsort 慢,那么你就有问题了。

更快的排序是并行对各个部分进行排序,例如 openmp,然后将它们合并在一起。

The first thing to do is benchmark it. And benchmark it against the standard qsort, and against the c++ std::sort and std::stable_sort.

Your results, should your data-set be big enough, will show that the std::sort is superior to qsort in all cases except those where the std::stable_sort is superior. This is because the std::sort is templated, and therefore the comparision can be inlined.

Your code inlines the comparison because its not generic. If your code is slower than qsort, you have a problem right there.

The faster sort would be to sort parts in parallel, e.g. openmp, and then merge them back together.

亚希 2024-08-18 03:03:18

从我的回答中复制来回答问题。

编辑:本文假设您已经做了一些显而易见的事情,例如利用尾递归来消除不必要的调用开销。

人们喜欢批评快速排序在某些输入上表现不佳,尤其是当用户可以控制输入时。以下方法产生与中点选择匹配的性能,但随着列表大小的增长,预期复杂度呈指数级接近 O(n log n)。根据我的经验,由于大多数情况下的开销要少得多,因此它的性能明显优于三选一的选择。它应该在“预期”输入的中点选择下表现均匀,但不易受到不良输入的影响。

  • 使用随机正整数I初始化快速排序。该值将在整个排序过程中使用(不必生成多个数字)。
  • 枢轴被选择为I mod SectionSize

为了获得额外的性能,您应该始终将“小”列表段的快速排序切换为希尔排序 - 我已经看到选择 15-100 的长度作为截止值。

Copied from my answer to answer question.

Edit: This post assumes you already do obvious things like take advantage of tail recursion to get rid of the unnecessary call overhead.

People like to criticize the quicksort for poor performance with certain inputs, especially when the user has control of the input. The following approach yields performance matching midpoint selection but expected complexity exponentially approaches O(n log n) as the list grows in size. In my experience it significantly outperforms best-of-3 selection due to much less overhead in the majority case. It should perform evenly with midpoint selection for "expected" inputs, but is not vulnerable to poor inputs.

  • Initialize the quicksort with a random positive integer I. That value will be used throughout the sorting process (don't have to generate multiple numbers).
  • Pivot is selected as I mod SectionSize.

For additional performance, you should always switch your quicksort to shell sort for "small" list segments - I've seen lengths from 15-100 chosen as the cutoff.

扭转时空 2024-08-18 03:03:18

多线程?

/*
 * multiple-thread quick-sort.
 * Works fine on uniprocessor machines as well.
 */

#include <unistd.h>
#include <stdlib.h>
#include <thread.h>

/* don't create more threads for less than this */
#define SLICE_THRESH   4096

/* how many threads per lwp */
#define THR_PER_LWP       4

/* cast the void to a one byte quanitity and compute the offset */
#define SUB(a, n)      ((void *) (((unsigned char *) (a)) + ((n) * width)))

typedef struct {
  void    *sa_base;
  int      sa_nel;
  size_t   sa_width;
  int    (*sa_compar)(const void *, const void *);
} sort_args_t;

/* for all instances of quicksort */
static int threads_avail;

#define SWAP(a, i, j, width)
{ 
  int n; 
  unsigned char uc; 
  unsigned short us; 
  unsigned long ul; 
  unsigned long long ull; 

  if (SUB(a, i) == pivot) 
    pivot = SUB(a, j); 
  else if (SUB(a, j) == pivot) 
    pivot = SUB(a, i); 

  /* one of the more convoluted swaps I've done */ 
  switch(width) { 
  case 1: 
    uc = *((unsigned char *) SUB(a, i)); 
    *((unsigned char *) SUB(a, i)) = *((unsigned char *) SUB(a, j)); 
    *((unsigned char *) SUB(a, j)) = uc; 
    break; 
  case 2: 
    us = *((unsigned short *) SUB(a, i)); 
    *((unsigned short *) SUB(a, i)) = *((unsigned short *) SUB(a, j)); 
    *((unsigned short *) SUB(a, j)) = us; 
    break; 
  case 4: 
    ul = *((unsigned long *) SUB(a, i)); 
    *((unsigned long *) SUB(a, i)) = *((unsigned long *) SUB(a, j)); 
    *((unsigned long *) SUB(a, j)) = ul; 
    break; 
  case 8: 
    ull = *((unsigned long long *) SUB(a, i)); 
    *((unsigned long long *) SUB(a,i)) = *((unsigned long long *) SUB(a,j)); 
    *((unsigned long long *) SUB(a, j)) = ull; 
    break; 
  default: 
    for(n=0; n<width; n++) { 
      uc = ((unsigned char *) SUB(a, i))[n]; 
      ((unsigned char *) SUB(a, i))[n] = ((unsigned char *) SUB(a, j))[n]; 
      ((unsigned char *) SUB(a, j))[n] = uc; 
    } 
    break; 
  } 
}

static void *
_quicksort(void *arg)
{
  sort_args_t *sargs = (sort_args_t *) arg;
  register void *a = sargs->sa_base;
  int n = sargs->sa_nel;
  int width = sargs->sa_width;
  int (*compar)(const void *, const void *) = sargs->sa_compar;
  register int i;
  register int j;
  int z;
  int thread_count = 0;
  void *t;
  void *b[3];
  void *pivot = 0;
  sort_args_t sort_args[2];
  thread_t tid;

  /* find the pivot point */
  switch(n) {
  case 0:
  case 1:
    return 0;
  case 2:
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    return 0;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, 2), SUB(a, 1)) < 0) {
      SWAP(a, 2, 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, 1), SUB(a, 0)) < 0) {
      SWAP(a, 1, 0, width);
    }
    return 0;
  default:
    if (n > 3) {
      b[0] = SUB(a, 0);
      b[1] = SUB(a, n / 2);
      b[2] = SUB(a, n - 1);
      /* three sort */
      if ((*compar)(b[0], b[1]) > 0) {
        t = b[0];
        b[0] = b[1];
        b[1] = t;
      }
      /* the first two are now ordered, now order the second two */
      if ((*compar)(b[2], b[1]) < 0) {
        t = b[1];
        b[1] = b[2];
        b[2] = t;
      }
      /* should the second be moved to the first? */
      if ((*compar)(b[1], b[0]) < 0) {
        t = b[0];
        b[0] = b[1];
        b[1] = t;
      }
      if ((*compar)(b[0], b[2]) != 0)
        if ((*compar)(b[0], b[1]) < 0)
          pivot = b[1];
        else
          pivot = b[2];
    }
    break;
  }
  if (pivot == 0)
    for(i=1; i<n; i++) {
      if (z = (*compar)(SUB(a, 0), SUB(a, i))) {
        pivot = (z > 0) ? SUB(a, 0) : SUB(a, i);
        break;
      }
    }
  if (pivot == 0)
    return;

  /* sort */
  i = 0;
  j = n - 1;
  while(i <= j) {
    while((*compar)(SUB(a, i), pivot) < 0)
      ++i;
    while((*compar)(SUB(a, j), pivot) >= 0)
      --j;
    if (i < j) {
      SWAP(a, i, j, width);
      ++i;
      --j;
    }
  }

  /* sort the sides judiciously */
  switch(i) {
  case 0:
  case 1:
    break;
  case 2:
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    break;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, 2), SUB(a, 1)) < 0) {
      SWAP(a, 2, 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, 1), SUB(a, 0)) < 0) {
      SWAP(a, 1, 0, width);
    }
    break;
  default:
    sort_args[0].sa_base          = a;
    sort_args[0].sa_nel           = i;
    sort_args[0].sa_width         = width;
    sort_args[0].sa_compar        = compar;
    if ((threads_avail > 0) && (i > SLICE_THRESH)) {
      threads_avail--;
      thr_create(0, 0, _quicksort, &sort_args[0], 0, &tid);
      thread_count = 1;
    } else
      _quicksort(&sort_args[0]);
    break;
  }
  j = n - i;
  switch(j) {
  case 1:
    break;
  case 2:
    if ((*compar)(SUB(a, i), SUB(a, i + 1)) > 0) {
      SWAP(a, i, i + 1, width);
    }
    break;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, i), SUB(a, i + 1)) > 0) {
      SWAP(a, i, i + 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, i + 2), SUB(a, i + 1)) < 0) {
      SWAP(a, i + 2, i + 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, i + 1), SUB(a, i)) < 0) {
      SWAP(a, i + 1, i, width);
    }
    break;
  default:
    sort_args[1].sa_base          = SUB(a, i);
    sort_args[1].sa_nel           = j;
    sort_args[1].sa_width         = width;
    sort_args[1].sa_compar        = compar;
    if ((thread_count == 0) && (threads_avail > 0) && (i > SLICE_THRESH)) {
      threads_avail--;
      thr_create(0, 0, _quicksort, &sort_args[1], 0, &tid);
      thread_count = 1;
    } else
      _quicksort(&sort_args[1]);
    break;
  }
  if (thread_count) {
    thr_join(tid, 0, 0);
    threads_avail++;
  }
  return 0;
}

void
quicksort(void *a, size_t n, size_t width,
          int (*compar)(const void *, const void *))
{
  static int ncpus = -1;
  sort_args_t sort_args;

  if (ncpus == -1) {
    ncpus = sysconf( _SC_NPROCESSORS_ONLN);

    /* lwp for each cpu */
    if ((ncpus > 1) && (thr_getconcurrency() < ncpus))
      thr_setconcurrency(ncpus);

    /* thread count not to exceed THR_PER_LWP per lwp */
    threads_avail = (ncpus == 1) ? 0 : (ncpus * THR_PER_LWP);
  }
  sort_args.sa_base = a;
  sort_args.sa_nel = n;
  sort_args.sa_width = width;
  sort_args.sa_compar = compar;
  (void) _quicksort(&sort_args);
}

multithreading ?

/*
 * multiple-thread quick-sort.
 * Works fine on uniprocessor machines as well.
 */

#include <unistd.h>
#include <stdlib.h>
#include <thread.h>

/* don't create more threads for less than this */
#define SLICE_THRESH   4096

/* how many threads per lwp */
#define THR_PER_LWP       4

/* cast the void to a one byte quanitity and compute the offset */
#define SUB(a, n)      ((void *) (((unsigned char *) (a)) + ((n) * width)))

typedef struct {
  void    *sa_base;
  int      sa_nel;
  size_t   sa_width;
  int    (*sa_compar)(const void *, const void *);
} sort_args_t;

/* for all instances of quicksort */
static int threads_avail;

#define SWAP(a, i, j, width)
{ 
  int n; 
  unsigned char uc; 
  unsigned short us; 
  unsigned long ul; 
  unsigned long long ull; 

  if (SUB(a, i) == pivot) 
    pivot = SUB(a, j); 
  else if (SUB(a, j) == pivot) 
    pivot = SUB(a, i); 

  /* one of the more convoluted swaps I've done */ 
  switch(width) { 
  case 1: 
    uc = *((unsigned char *) SUB(a, i)); 
    *((unsigned char *) SUB(a, i)) = *((unsigned char *) SUB(a, j)); 
    *((unsigned char *) SUB(a, j)) = uc; 
    break; 
  case 2: 
    us = *((unsigned short *) SUB(a, i)); 
    *((unsigned short *) SUB(a, i)) = *((unsigned short *) SUB(a, j)); 
    *((unsigned short *) SUB(a, j)) = us; 
    break; 
  case 4: 
    ul = *((unsigned long *) SUB(a, i)); 
    *((unsigned long *) SUB(a, i)) = *((unsigned long *) SUB(a, j)); 
    *((unsigned long *) SUB(a, j)) = ul; 
    break; 
  case 8: 
    ull = *((unsigned long long *) SUB(a, i)); 
    *((unsigned long long *) SUB(a,i)) = *((unsigned long long *) SUB(a,j)); 
    *((unsigned long long *) SUB(a, j)) = ull; 
    break; 
  default: 
    for(n=0; n<width; n++) { 
      uc = ((unsigned char *) SUB(a, i))[n]; 
      ((unsigned char *) SUB(a, i))[n] = ((unsigned char *) SUB(a, j))[n]; 
      ((unsigned char *) SUB(a, j))[n] = uc; 
    } 
    break; 
  } 
}

static void *
_quicksort(void *arg)
{
  sort_args_t *sargs = (sort_args_t *) arg;
  register void *a = sargs->sa_base;
  int n = sargs->sa_nel;
  int width = sargs->sa_width;
  int (*compar)(const void *, const void *) = sargs->sa_compar;
  register int i;
  register int j;
  int z;
  int thread_count = 0;
  void *t;
  void *b[3];
  void *pivot = 0;
  sort_args_t sort_args[2];
  thread_t tid;

  /* find the pivot point */
  switch(n) {
  case 0:
  case 1:
    return 0;
  case 2:
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    return 0;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, 2), SUB(a, 1)) < 0) {
      SWAP(a, 2, 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, 1), SUB(a, 0)) < 0) {
      SWAP(a, 1, 0, width);
    }
    return 0;
  default:
    if (n > 3) {
      b[0] = SUB(a, 0);
      b[1] = SUB(a, n / 2);
      b[2] = SUB(a, n - 1);
      /* three sort */
      if ((*compar)(b[0], b[1]) > 0) {
        t = b[0];
        b[0] = b[1];
        b[1] = t;
      }
      /* the first two are now ordered, now order the second two */
      if ((*compar)(b[2], b[1]) < 0) {
        t = b[1];
        b[1] = b[2];
        b[2] = t;
      }
      /* should the second be moved to the first? */
      if ((*compar)(b[1], b[0]) < 0) {
        t = b[0];
        b[0] = b[1];
        b[1] = t;
      }
      if ((*compar)(b[0], b[2]) != 0)
        if ((*compar)(b[0], b[1]) < 0)
          pivot = b[1];
        else
          pivot = b[2];
    }
    break;
  }
  if (pivot == 0)
    for(i=1; i<n; i++) {
      if (z = (*compar)(SUB(a, 0), SUB(a, i))) {
        pivot = (z > 0) ? SUB(a, 0) : SUB(a, i);
        break;
      }
    }
  if (pivot == 0)
    return;

  /* sort */
  i = 0;
  j = n - 1;
  while(i <= j) {
    while((*compar)(SUB(a, i), pivot) < 0)
      ++i;
    while((*compar)(SUB(a, j), pivot) >= 0)
      --j;
    if (i < j) {
      SWAP(a, i, j, width);
      ++i;
      --j;
    }
  }

  /* sort the sides judiciously */
  switch(i) {
  case 0:
  case 1:
    break;
  case 2:
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    break;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, 0), SUB(a, 1)) > 0) {
      SWAP(a, 0, 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, 2), SUB(a, 1)) < 0) {
      SWAP(a, 2, 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, 1), SUB(a, 0)) < 0) {
      SWAP(a, 1, 0, width);
    }
    break;
  default:
    sort_args[0].sa_base          = a;
    sort_args[0].sa_nel           = i;
    sort_args[0].sa_width         = width;
    sort_args[0].sa_compar        = compar;
    if ((threads_avail > 0) && (i > SLICE_THRESH)) {
      threads_avail--;
      thr_create(0, 0, _quicksort, &sort_args[0], 0, &tid);
      thread_count = 1;
    } else
      _quicksort(&sort_args[0]);
    break;
  }
  j = n - i;
  switch(j) {
  case 1:
    break;
  case 2:
    if ((*compar)(SUB(a, i), SUB(a, i + 1)) > 0) {
      SWAP(a, i, i + 1, width);
    }
    break;
  case 3:
    /* three sort */
    if ((*compar)(SUB(a, i), SUB(a, i + 1)) > 0) {
      SWAP(a, i, i + 1, width);
    }
    /* the first two are now ordered, now order the second two */
    if ((*compar)(SUB(a, i + 2), SUB(a, i + 1)) < 0) {
      SWAP(a, i + 2, i + 1, width);
    }
    /* should the second be moved to the first? */
    if ((*compar)(SUB(a, i + 1), SUB(a, i)) < 0) {
      SWAP(a, i + 1, i, width);
    }
    break;
  default:
    sort_args[1].sa_base          = SUB(a, i);
    sort_args[1].sa_nel           = j;
    sort_args[1].sa_width         = width;
    sort_args[1].sa_compar        = compar;
    if ((thread_count == 0) && (threads_avail > 0) && (i > SLICE_THRESH)) {
      threads_avail--;
      thr_create(0, 0, _quicksort, &sort_args[1], 0, &tid);
      thread_count = 1;
    } else
      _quicksort(&sort_args[1]);
    break;
  }
  if (thread_count) {
    thr_join(tid, 0, 0);
    threads_avail++;
  }
  return 0;
}

void
quicksort(void *a, size_t n, size_t width,
          int (*compar)(const void *, const void *))
{
  static int ncpus = -1;
  sort_args_t sort_args;

  if (ncpus == -1) {
    ncpus = sysconf( _SC_NPROCESSORS_ONLN);

    /* lwp for each cpu */
    if ((ncpus > 1) && (thr_getconcurrency() < ncpus))
      thr_setconcurrency(ncpus);

    /* thread count not to exceed THR_PER_LWP per lwp */
    threads_avail = (ncpus == 1) ? 0 : (ncpus * THR_PER_LWP);
  }
  sort_args.sa_base = a;
  sort_args.sa_nel = n;
  sort_args.sa_width = width;
  sort_args.sa_compar = compar;
  (void) _quicksort(&sort_args);
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文