Why is quicksort more popular than radix sort?
Why is quicksort (or introsort), or any comparison-based sorting algorithm, more common than radix sort, especially for sorting numbers?
Radix sort is not comparison-based, so it can be faster than O(n log n). In fact, it is O(kn), where k is the number of bits used to represent each item. The memory overhead is not critical either, since you can choose the number of buckets to use, and the memory required may be less than what mergesort needs.
Does it have to do with caching? Or maybe with accessing arbitrary bytes of the integers in the array?
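For readers who want something concrete to reason about, here is a minimal sketch of the kind of radix sort the question describes: a least-significant-digit variant built on counting sort. The function name `radix_sort` and the parameters `bits_per_pass` and `key_bits` are made up for this illustration, and it assumes non-negative integers that fit in `key_bits` bits.

```python
def radix_sort(values, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits (illustrative sketch)."""
    buckets = 1 << bits_per_pass      # the bucket count is a tunable choice
    mask = buckets - 1
    src = list(values)
    dst = [0] * len(src)              # secondary buffer, same size as the input

    for shift in range(0, key_bits, bits_per_pass):
        # Counting pass: histogram of the current digit.
        counts = [0] * buckets
        for v in src:
            counts[(v >> shift) & mask] += 1

        # Exclusive prefix sums turn the counts into starting offsets.
        total = 0
        for d in range(buckets):
            counts[d], total = total, total + counts[d]

        # Stable scatter into the buffer, then swap the roles of the two arrays.
        for v in src:
            d = (v >> shift) & mask
            dst[counts[d]] = v
            counts[d] += 1
        src, dst = dst, src

    return src
```

Calling `radix_sort([170, 45, 75, 90, 802, 24, 2, 66])` returns `[2, 24, 45, 66, 75, 90, 170, 802]`. Each key is touched once per pass, so the number of passes (`key_bits / bits_per_pass`, rounded up) plays the role of k. Note that every pass scatters writes across the whole buffer, which is relevant to the caching part of the question.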
Comments (6)
Radix sort's efficiency = O(c·n)
where c = the largest number of digits among the input keys,
n = the number of keys in the input.
Quicksort's best case = O(n log n)
where n = the number of keys in the input.
Assume 16 numbers to be sorted, with 6 digits each:
Radix sort = 16 * 6 = 96 time units.
Quicksort = 16 * log2(16) = 16 * 4 = 64 time units.
Lesson:
When c is small, radix sort does win. When c is large, it loses. Quicksort's comparison count does not depend on the number of digits in a key, which makes it somewhat better and more acceptable in practice.
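The arithmetic above can be reproduced with a tiny cost model. This is only a sketch of the answer's own simplification: it counts abstract "time units" and ignores constant factors, and the function names are invented for the example.

```python
import math

def radix_cost(n, c):
    """Abstract cost of radix sort: one unit per key per digit."""
    return c * n

def quicksort_best_cost(n):
    """Abstract best-case cost of quicksort: n * log2(n) units."""
    return n * math.log2(n)

print(radix_cost(16, 6))        # 96
print(quicksort_best_cost(16))  # 64.0
print(radix_cost(16, 3))        # 48, radix wins once c drops below log2(16) = 4
```

Under this model the break-even point is c = log2(n): radix sort comes out ahead only when the keys have fewer digits than that.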
One obvious answer is that you can sort arbitrary types with quicksort (i.e. anything that is comparable), while radix sort restricts you to keys that can be broken into digits, such as numbers. And IMO quicksort is a lot more intuitive.
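To illustrate the flexibility point, here is a small sketch of a quicksort that only ever uses `<` on its items, so it works on any mutually comparable values. It is a simple, not-in-place variant written for clarity; the name `quicksort` and the sample data are just for this example.

```python
def quicksort(items):
    """Return a new sorted list; the items only need to support `<`."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if not (x < pivot) and not (pivot < x)]
    larger = [x for x in items if pivot < x]
    return quicksort(smaller) + equal + quicksort(larger)

words = quicksort(["pear", "apple", "fig"])         # strings
pairs = quicksort([(2, "b"), (1, "z"), (1, "a")])   # tuples, compared lexicographically
```

Nothing in the function knows how the keys are represented; it relies only on the comparison, which is exactly what a radix sort cannot do.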
As mentioned on Wikipedia
The points made in other answers are valid, but as far as the concern you mentioned in several comments goes:
Quicksort is the 'safe' choice.
The potential runtime of a radix sort based on counting sort is very attractive, yes, but radix sort is susceptible to performing poorly on malicious or unfortunate datasets. If the number of digits in the keys approaches the number of keys being sorted, radix sort runs in O(n^2) time with a non-negligible space complexity, and it tends to have fairly high built-in constant factors on top of the number of digits in the keys.
Mergesort is attractive because its behavior is, in some ways, analogous to a quicksort that picks the optimal pivot (the median) at every opportunity. However, it comes with an appreciable space complexity. It is not as susceptible to malicious or unfortunate data as radix sort, but it also does not offer radix sort's attractive potential runtime.
A basic quicksort performs very well on most datasets except nearly (or completely) sorted ones, and comes with a tiny space complexity.
Quicksort's vulnerability is easily dealt with by converting it to a randomized quicksort. Radix sort's vulnerability is resolved by placing restrictions on the keys being sorted, which would inherently limit the library's users. Quicksort is faster than mergesort on small datasets, and still performs reasonably in the cases where mergesort might be faster.
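As a rough illustration of what "converting it to a randomized quicksort" can look like, here is a minimal in-place sketch. It assumes a Lomuto partition and a uniformly random pivot; the name `randomized_quicksort` is hypothetical, and production implementations (introsort, for instance) add further safeguards. One caveat of this particular partition scheme is that inputs with very many equal keys can still be slow.

```python
import random

def randomized_quicksort(a, lo=0, hi=None):
    """Sort list `a` in place, choosing each pivot uniformly at random."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Random pivot: no fixed input order (e.g. already sorted) is reliably bad.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]

        # Lomuto partition: keys smaller than the pivot end up in a[lo:i].
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]

        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic in len(a).
        if i - lo < hi - i:
            randomized_quicksort(a, lo, i - 1)
            lo = i + 1
        else:
            randomized_quicksort(a, i + 1, hi)
            hi = i - 1
```

The only extra memory is the recursion stack, which is the "tiny space complexity" the answer refers to.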
When implementing a library, you want to make it generically useful. Take two examples: a web application, and a small device with an extremely constrained microcontroller.
Web applications need to deal with malicious data on a regular basis, and also have a wide variety of needs. A library that imposes preconditions on its input is less likely to be useful. The microcontroller, in turn, may be so limited on space that it cannot give up even the smallest amount that could be saved. Quicksort saves space, and if a situation arises in which it is slower, it is slower only by a constant factor.
In sum:
1.) Libraries are often coded for as much generic usability as possible
2.) Good all-around performance is acceptable, especially if it is, in many cases, the best performance
3.) Space is not always a primary issue, but when it is, it is often an explicit, hard constraint
Two arguments come to my mind:
Quicksort/Introsort is more flexible:
Quicksort and Introsort work well with all kinds of data. All you need for sorting is the ability to compare items. This is trivial with numbers, but you can sort other data as well.
Radix sort on the other hand just sorts things by their binary representation. It never compares items against each other.
Radix sort needs more memory.
All radix sort implementations that I've seen use a secondary buffer to store partial sorting results. This increases the memory requirements of the sorting algorithm. That may not be a problem if you only sort a couple of kilobytes, but if you go into the gigabyte range it makes a huge difference.
If I remember correctly, an in-place radix sort algorithm does exist on paper, though.
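One in-place variant that does exist is the binary MSD radix sort, often called radix exchange sort. The sketch below is my own illustration of it, not necessarily the algorithm this answer had in mind; it assumes non-negative integers below `2**key_bits`, is not stable, and the function name is made up.

```python
def radix_exchange_sort(a, key_bits=32):
    """Sort non-negative integers below 2**key_bits in place, one bit at a time."""
    # Explicit stack of (lo, hi, bit) ranges instead of recursion.
    stack = [(0, len(a) - 1, key_bits - 1)]
    while stack:
        lo, hi, bit = stack.pop()
        if lo >= hi or bit < 0:
            continue
        i, j = lo, hi
        while i <= j:
            if ((a[i] >> bit) & 1) == 0:
                i += 1                    # key already belongs on the 0-side
            else:
                a[i], a[j] = a[j], a[i]   # move a key with the bit set to the right end
                j -= 1
        # Keys in a[lo..j] have this bit clear, keys in a[i..hi] have it set.
        stack.append((lo, j, bit - 1))
        stack.append((i, hi, bit - 1))
```

It partitions on the most significant bit first and then handles each half on the next bit, so it needs no secondary buffer, only a small stack of pending ranges. The price is that it gives up the stable, sequential passes of the buffered version.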
Radix sort is slower for (most) real world use cases.
One reason is the complexity of the algorithm:
If the items are unique, then k >= log2(n), because n distinct keys need at least log2(n) bits to tell them apart. Even with duplicate items, the set of problems where k < log2(n) is small.
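The bound is easy to check numerically. A small sketch, with a helper name invented for this example, that computes the fewest bits able to represent n distinct keys:

```python
def min_bits_for_distinct_keys(n):
    """Smallest k such that 2**k >= n, i.e. ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()

for n in (16, 1000, 1_000_000):
    print(n, min_bits_for_distinct_keys(n))   # 4, 10 and 20 bits respectively
```

So for a million distinct keys, k is at least 20, and O(kn) is no better than O(n log n) in the asymptotic sense.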
Another is the implementation:
The additional memory requirement (which is a disadvantage in itself) hurts cache performance.
I think it is safe to say that many libraries, like the standard library, use Quicksort because it performs better in the majority of cases.
I don't think that "difficult implementation" or "less intuitive" are major factors.