When should we use Radix sort?

It seems Radix sort has a very good average-case performance, i.e. O(kN): http://en.wikipedia.org/wiki/Radix_sort

Yet it seems like most people are still using Quicksort - why is this?
Radix sort is harder to generalize than most other sorting algorithms: it requires fixed-size keys and some standard way of breaking the keys into pieces. That is why it rarely finds its way into libraries.
The other answers here fail to give examples of when radix sort is actually used.
An example is when creating a "suffix array" using the skew DC3 algorithm (Kärkkäinen-Sanders-Burkhardt). The algorithm is only linear-time if the sorting algorithm is linear-time, and radix sort is necessary and useful here because the keys are short by construction (3-tuples of integers).
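To make the DC3 use case concrete, here is a minimal sketch (not the answer's actual implementation) of radix sorting fixed-length integer tuples: one stable counting pass per position, least significant position first, which is exactly why short keys make the whole sort linear:

```python
def radix_sort_triples(triples, alphabet_size):
    """Sort fixed-length integer tuples in linear time by running one
    stable counting-sort pass per position, least significant first
    (the kind of key the DC3 suffix-array algorithm produces)."""
    def counting_pass(items, pos):
        buckets = [[] for _ in range(alphabet_size)]
        for t in items:
            buckets[t[pos]].append(t)   # appending preserves order: stable
        return [t for bucket in buckets for t in bucket]

    for pos in (2, 1, 0):               # last tuple position first
        triples = counting_pass(triples, pos)
    return triples
```

Each pass is O(n + alphabet_size), and only three passes are needed because the keys are 3-tuples by construction.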
Edited according to your comments:
Unless you have a huge list or extremely small keys, log(N) is usually smaller than k, and it is rarely much higher. So choosing a general-purpose sorting algorithm with O(N log N) average-case performance isn't necessarily worse than using radix sort.

Correction: As @Mehrdad pointed out in the comments, the argument above isn't sound: either the key size is constant, in which case radix sort is O(N), or the key size is k, in which case quicksort is O(k N log N). So in theory, radix sort really does have the better asymptotic runtime.

In practice, the runtimes will be dominated by terms like:

radix sort: c1 k N
quicksort:  c2 k N log(N)

where c1 >> c2, because "extracting" bits out of a longer key is usually an expensive operation involving bit shifts and logical operations (or at least unaligned memory accesses), while modern CPUs can compare keys of 64, 128 or even 256 bits in one operation. So for many common cases, unless N is gigantic, c1 will be larger than c2 log(N).
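To make the extraction cost concrete, here is a minimal LSD radix sort sketch (radix 256, names are illustrative) in which every pass pulls one byte out of each key with a shift and a mask -- the per-element work that drives the constant c1 up:

```python
def radix_sort_u32(a):
    """LSD radix sort for 32-bit unsigned ints, radix 256: four passes,
    each extracting one byte of the key with a shift and a mask."""
    for shift in (0, 8, 16, 24):        # least significant byte first
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)  # stable bucketing
        a = [x for b in buckets for x in b]
    return a
```

A comparison sort touches each key with a single compare per step; the radix sort above does a shift, a mask and a bucket append per key per pass, which is where c1 >> c2 comes from.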
Radix sort takes O(k*n) time. But you have to ask what k is: k is the "number of digits" (a bit simplistic, but basically something like that).

So, how many digits do you have? Quite simply: more than log(n) (log using the "digit size" as base), which makes the radix algorithm O(n log n).

Why is that? If you have fewer than log(n) digits, then you have fewer than n possible numbers, so you could simply use counting sort, which takes O(n) time (just count how many of each number you have). So we can assume you have k > log(n) digits...

That's why people don't use radix sort that much. Although there are cases where it's worthwhile, in most cases quicksort is much better.
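For reference, the counting sort this answer appeals to can be sketched in a few lines (a minimal version, assuming non-negative integer keys up to a known maximum):

```python
def counting_sort(a, max_value):
    """Counting sort: O(n + max_value). Practical exactly when the
    universe of possible values is small, which is the regime the
    answer above says makes a full radix sort unnecessary."""
    counts = [0] * (max_value + 1)
    for x in a:
        counts[x] += 1                 # tally each value
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)            # emit each value count times
    return out
```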
When n > 128, we should use radix sort.

When sorting int32s, I choose radix 256, so k = log(256, 2^32) = 4, which is significantly smaller than log(2, n).

And in my test, radix sort was 7 times faster than quicksort in the best case.
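The arithmetic in this comment checks out and is easy to verify: with radix 256, a 32-bit key needs exactly four passes, while quicksort already does about log2(n) comparison levels at the stated n > 128 threshold:

```python
import math

# Number of radix-256 passes needed for 32-bit keys:
k = math.log2(2**32) / math.log2(256)   # 32 / 8 = 4 passes
# Quicksort's comparison depth at the comment's threshold n = 128:
depth = math.log2(128)                  # 7 levels
print(k, depth)  # 4.0 7.0
```

So k = 4 is already below log2(n) = 7 at n = 128; the gap only widens as n grows, leaving headroom for radix sort's larger per-pass constant.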
Radix sort isn't a comparison-based sort and can only sort numeric types like integers (including pointer addresses) and floating-point, and it's a bit difficult to portably support floating-point.
It's probably because it has such a narrow range of applicability that many standard libraries choose to omit it. It can't even let you provide your own comparator, since some people might not want to sort integers directly so much as use the integers as indices into something else that serves as the sort key. Comparison-based sorts allow all that flexibility, so it's probably a case of preferring a generalized solution that fits 99% of people's daily needs instead of going out of the way to cater to that 1%.
That said, in spite of the narrow applicability, in my domain I find more use for radix sorts than introsorts or quicksorts. I'm in that 1% and barely ever work with, say, string keys, but often find use cases for numbers that benefit from being sorted. It's because my codebase revolves around indices to entities and components (entity-component system) as well as things like indexed meshes and there's a whole lot of numeric data.
As a result, radix sort becomes useful for all kinds of things in my case. One common example in my case is eliminating duplicate indices. In that case I don't really need the results to be sorted but often a radix sort can eliminate duplicates faster than the alternatives.
Another is finding, say, a median split for a kd-tree along a given dimension. There radix sorting the floating-point values of the point for a given dimension gives me a median position rapidly in linear time to split the tree node.
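The floating-point case depends on mapping floats to integers whose ordering matches the float ordering. A common sketch of that trick (not necessarily what this answer's code does, and assuming no NaNs) is to flip all bits of negative values and only the sign bit of non-negative ones:

```python
import struct

def float_key(f):
    """Map an IEEE-754 double to an unsigned 64-bit integer whose
    ordering matches the float ordering, so an integer radix sort can
    sort floats. Assumes no NaNs in the input."""
    bits = struct.unpack('<Q', struct.pack('<d', f))[0]
    if bits & (1 << 63):                    # negative: flip every bit
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)                 # non-negative: set the sign bit
```

Radix sorting these keys byte by byte then yields the floats in numeric order.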
Another is depth-sorting higher-level primitives by z for semi-proper alpha transparency, if we aren't going to be doing it in a frag shader. That also applies to GUIs and vector graphics software for z-ordering elements.

Another is cache-friendly sequential access using a list of indices. If the indices are traversed many times, it often improves performance if I radix sort them in advance so that the traversal is done in sequential order instead of random order. The latter could zig-zag back and forth in memory, evicting data from cache lines only to reload the same memory region repeatedly within the same loop. When I radix sort the indices first, prior to accessing them repeatedly, that ceases to happen and I can reduce cache misses considerably. This is actually my most common use for radix sorts, and it's the key to my ECS being cache-friendly when systems want to access entities with two or more components.
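The duplicate-elimination use mentioned above can be sketched with a single counting pass over the index universe (a simplification of what a full radix-sort-based dedup would do, assuming indices fall in a known small range):

```python
def dedup_indices(indices, max_index):
    """Eliminate duplicate indices with one counting pass over a small
    universe, emitting each seen index once. The output comes back
    sorted as a side effect, matching the answer's observation that
    sorted order isn't the goal but falls out for free."""
    seen = [False] * (max_index + 1)
    for i in indices:
        seen[i] = True
    return [i for i, s in enumerate(seen) if s]
```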
In my case I have a multithreaded radix sort which I use quite often. Some benchmarks:
I can average something like 6-7 ms to sort a million numbers one time on my dinky hardware, which isn't as fast as I would like, since 6-7 milliseconds can still be noticed by users sometimes in interactive contexts, but it's still a whole lot better than the 55-85 ms I get with C++'s std::sort or C's qsort, which would definitely lead to very obvious hiccups in frame rates. I've even heard of people implementing radix sorts using SIMD, though I have no idea how they managed that. I'm not smart enough to come up with such a solution, though even my naive little radix sort does quite well compared to the standard libraries.
k = "length of the longest value in the array to be sorted"
n = "length of the array"
O(k*n) = "worst-case running time"

k * n = n^2 (if k = n)

So when using radix sort, make sure the longest integer is shorter than the array size, or vice versa. Then you're going to beat quicksort!

The drawback is: most of the time you cannot be sure how big the integers will become, but if you have a fixed range of numbers, radix sort should be the way to go.
Here's a link which compares quicksort and radix sort:
Is radix sort faster than quicksort for integer arrays? (yes it is, 2-3x)

Here's another link which analyzes the running times of several algorithms:
A Question of Sorts:
Which is faster on the same data, an O(n) sort or an O(n log(n)) sort?

Answer: it depends. It depends on the amount of data being sorted, on the hardware it's being run on, and on the implementation of the algorithms.
One example would be when you are sorting a very large set or array of integers. A radix sort, and any other type of distribution sort, is extremely fast, since data elements are mainly enqueued into an array of queues (at most 10 queues for an LSD radix sort) and remapped to a different index location of the same input data to be sorted. There are no nested loops, so the algorithm tends to behave more linearly as the number of input integers to be sorted becomes significantly larger. Unlike other sorting methods, such as the extremely inefficient bubble sort, radix sort does not perform comparison operations to sort; it's just a simple process of remapping integers to different index positions until the input is finally sorted. If you would like to test out an LSD radix sort for yourself, I have written one and stored it on GitHub, where it can be easily tested in an online JS IDE such as Eloquent JavaScript's coding sandbox. Feel free to play around with it and watch how it behaves with differing values of n. I've tested with up to 900,000 unsorted integers with a runtime < 300 ms. Here is the link if you wish to play around with it.
https://gist.github.com/StBean/4af58d09021899f14dfa585df6c86df6
In cases with small k, counting sort will be way faster. In cases with large k, quicksort will be preferable, which narrows radix sort's usage area; you also can't generalize it.
Quicksort has an average of O(N log N), but it also has a worst case of O(N^2), so even though in most practical cases it won't get to N^2, there is always the risk that the input will be in a "bad order" for you. This risk doesn't exist in radix sort. I think this gives a great advantage to radix sort.