什么时候适合使用基数排序?
能够使用基数排序的数据有哪些限制?
如果我要对一个大的整数列表进行排序,使用基数排序是否合适?为什么基数排序不被更多地使用?
What are the constraints on your data for you to be able to use Radix sort?
If I'm sorting a large list of integers, would it be appropriate to use Radix sort? Why is Radix sort not used more?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
当您拥有大量数据且其键受到某种限制时,这非常有用。例如,当您需要对 100 万个 64 位数字的数组进行排序时,可以使用它按 8 个最低有效位排序,然后按接下来的 8 位排序,依此类推(应用 8 次)。这样这个数组就可以通过 8*1M 的操作来排序,而不是 1M*log(1M)。
It's great when you have a large set of data with keys that are somehow constrained. For example, when you need to order a 1-million array of 64-bit numbers, it can be used to sort by 8 least significant bits, then by the next 8, and so on (applied 8 times). That way this array can be sorted in 8*1M operations, rather than 1M*log(1M).
如果您知道整数值的范围,并且它不是太大,
对于您的情况,也许 计数排序 会是更好的选择。
If you know the range of the integer values, and it's not too large,
maybe counting sort would be a better choice in your case.
您可能不会像您想象的那样经常看到它的原因之一是基数排序不像基于比较的排序(快速排序/合并排序/堆排序)那么通用。它要求您可以将要排序的项目表示为整数或类似整数的东西。使用标准库时,很容易定义比较任意对象的比较函数。定义一种将任意数据类型正确映射为整数的编码可能会更困难。
One reason you might not see it as often as you'd think you would is that Radix sort is not as general purpose as comparison based sorts (quicksort/mergesort/heapsort). It requires that you can represent the items to be sorted as an integer, or something like an integer. When using a standard library, it is easy to define a comparison function that compares arbitrary objects. It might be harder to define an encoding that properly maps your arbitrary data type into an integer.
桶排序在离散键值的数量相对于数据项的数量较小的情况下非常有用,并且目标是在不干扰原始列表的情况下生成列表的重新排序副本(因此需要维护旧的列表)与新版本的列表同步并不是一个负担)。如果可能的键的数量太大而无法在一次传递中处理,则可以通过多次传递将桶排序扩展为基数排序,但会失去桶排序为小键提供的速度优势。
在一些外部排序场景中,特别是当不同键值的数量非常少(例如两个),需要稳定的排序,并且I/O设备只能有效地处理一个顺序数据流时,可能会很有用让K穿过源数据流,其中K是键值的数量。在第一遍中,复制键是最小合法值的所有项目并跳过其余部分,然后复制键是下一个更高值的所有项目,跳过其余部分,等等。这种方法显然非常高效如果有很多不同的键值,但如果有两个就很好了。
Bucket sorting is useful in situations where the number of discrete key values is small relative to the number of data items, and where the goal is to produce a re-sorted copy of a list without disturbing the original (so needing to maintain both the old and new versions of the list simultaneously is not a burden). If the number of possible keys is too large to handle in a single pass, one can extend bucket sort into radix sort by making multiple passes, but one loses much of the speed advantage that bucket sort could offer for small keys.
In some external-sorting scenarios, especially when the number of different key values is very small (e.g. two), a stable sort is required, and the I/O device can only operate efficiently with one sequential data stream, it may be useful to make K passes through the source data stream, where K is the number of key values. On the first pass, one copies all the items where the key is the minimum legitimate value and skips the rest, then copies all the items where the key is the next higher value, skipping the rest, etc. This approach will obviously be horribly efficient if there are very many different key values, but will be quite good if there are two.