Why is quicksort more popular than radix sort?

Posted 2024-09-15 06:31:29 · 211 characters · 10 views · 0 comments

Why is quicksort (or introsort), or any comparison-based sorting algorithm, more common than radix sort? Especially for sorting numbers.

Radix sort is not comparison based, hence it may be faster than O(n·log n). In fact, it is O(k·n), where k is the number of bits used to represent each item. And the memory overhead is not critical, since you can choose the number of buckets to use, and the required memory may be less than what mergesort requires.

Does it have to do with caching? Or maybe accessing random bytes of integers in the array?
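
To make the O(k·n) claim concrete, here is a minimal sketch (mine, not from the original question) of a byte-wise LSD radix sort for 32-bit unsigned keys: it makes four counting passes, one per byte, each linear in n, and it needs one auxiliary buffer of size n.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Byte-wise LSD radix sort: 32/8 = 4 linear passes, one auxiliary buffer.
    void radix_sort_u32(std::vector<uint32_t>& a) {
        std::vector<uint32_t> buf(a.size());
        for (int shift = 0; shift < 32; shift += 8) {
            std::size_t count[257] = {0};
            for (uint32_t x : a)                      // histogram of the current byte
                ++count[((x >> shift) & 0xFF) + 1];
            for (int i = 0; i < 256; ++i)             // prefix sums -> bucket start offsets
                count[i + 1] += count[i];
            for (uint32_t x : a)                      // stable scatter into the buffer
                buf[count[(x >> shift) & 0xFF]++] = x;
            a.swap(buf);
        }
    }

Each pass reads the whole array twice and writes n elements into a separate buffer, which is the kind of memory traffic the caching question above is hinting at.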

Comments (6)

眉黛浅 2024-09-22 06:31:33

Radix sort's efficiency = O(c·n)
where c = highest number of digits among the input key set.
n = number of keys in the input key set.

Quick sort's best case = O(n·log n)
where n = number of keys in the input key set.

Assume 16 numbers to be sorted, each with 6 digits:

Radix sort = 16 * 6 = 96 time units.
Quick sort = 16 * log2(16) = 16 * 4 = 64 time units.

Lesson:
When 'c' is small, radix sort does win. When it is large, it loses. Quick sort is independent of the number of digits in a key, and that makes it somewhat better and more practically acceptable.
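
Sticking with the same unit-cost model as this answer (one "time unit" per digit pass or per comparison), the break-even point falls out directly: radix sort wins when c·n < n·log2(n), i.e. when c < log2(n), i.e. when n > 2^c. With the 6-digit keys above that means n would have to exceed 2^6 = 64 before radix sort pulls ahead, which is why the small n = 16 example favours quicksort.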

中性美 2024-09-22 06:31:32

One obvious answer is that you can sort arbitrary types using quicksort (i.e. anything that's comparable), while you are restricted to numbers with radix sort. And IMO quicksort is a lot more intuitive.
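
A minimal illustration of that flexibility (my own sketch, not part of the original answer): std::sort, typically an introsort in mainstream standard libraries, needs nothing but a comparator, so it sorts arbitrary record types just as easily as integers, while a radix sort would first need some way of turning each record into an integer key.

    #include <algorithm>
    #include <string>
    #include <vector>

    struct Person {              // an arbitrary, non-numeric record type
        std::string name;
        int age;
    };

    int main() {
        std::vector<Person> people = {{"Ada", 36}, {"Bob", 29}, {"Cleo", 41}};
        // A comparison sort only needs a strict weak ordering; any rule will do.
        std::sort(people.begin(), people.end(),
                  [](const Person& a, const Person& b) { return a.name < b.name; });
    }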

七分※倦醒 2024-09-22 06:31:32

As mentioned on Wikipedia

The topic of the efficiency of radix sort compared to other sorting algorithms is somewhat tricky and subject to quite a lot of misunderstandings. Whether radix sort is equally efficient, less efficient or more efficient than the best comparison-based algorithms depends on the details of the assumptions made. Radix sort efficiency is O(d·n) for n keys which have d or fewer digits. Sometimes d is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which all need O(n·log(n)) comparisons. However, in general d cannot be considered a constant. In particular, under the common (but sometimes implicit) assumption that all keys are distinct, d must be at least of the order of log(n), which gives at best (with densely packed keys) a time complexity of O(n·log(n)). That would seem to make radix sort at most as efficient as the best comparison-based sorts (and worse if keys are much longer than log(n)).

The counterargument is that comparison-based algorithms are measured in number of comparisons, not actual time complexity. Under some assumptions the comparisons will be constant time on average, under others they will not. Comparisons of randomly generated keys take constant time on average, as keys differ on the very first bit in half the cases, and differ on the second bit in half of the remaining half, and so on, resulting in an average of two bits that need to be compared. In a sorting algorithm the first comparisons made satisfy the randomness condition, but as the sort progresses the keys compared are clearly not randomly chosen anymore. For example, consider a bottom-up merge sort. The first pass will compare pairs of random keys, but the last pass will compare keys that are very close in the sorting order.

The deciding factor is how the keys are distributed. The best case for radix sort is that they are taken as consecutive bit patterns. This will make the keys as short as they can be, still assuming they are distinct. This makes radix sort O(n·log(n)), but the comparison based sorts will not be as efficient, as the comparisons will not be constant time under this assumption. If we instead assume that the keys are bit patterns of length k·log(n) for a constant k > 1 and base 2 log, and that they are uniformly random, then radix sort will still be O(n·log(n)), but so will the comparison based sorts, as the "extra" length makes even the keys that are consecutive in the sorted result differ enough that comparisons are constant time on average. If keys are longer than O(log(n)), but random, then radix sort will be inferior. There are many other assumptions that can be made as well, and most require careful study to make a correct comparison.
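
To put numbers on the densely packed case from the quote: with n = 2^20 distinct keys, the shortest possible keys are 20 bits, so a bit-at-a-time radix sort does on the order of 20·n = n·log2(n) work, the same order as the roughly n·log2(n) comparisons of a good comparison sort; radix sort only falls clearly behind when the keys are much longer than log2(n) bits yet still random.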

没有你我更好 2024-09-22 06:31:32

Points made in other answers are valid, but as far as the concern of yours mentioned in several comments goes:

...the fact that the default sorting algorithms for numbers are implemented using quicksort. Especially the implementations in libraries...

Quicksort is the 'safe' choice.
The potential runtime of a radix sort based on a counting sort is very attractive, yes, but radix sort is susceptible to performing poorly on malicious/unfortunate datasets. If the number of digits of the keys being sorted approaches the number of keys being sorted, radix sort's O(n·d) work approaches n^2, along with a non-negligible space complexity, and it tends to have fairly high built-in runtime constants beyond the number of digits of the keys being sorted.
Mergesort is attractive because its behavior is, in some ways, analogous to a quicksort that picks an optimal pivot at each opportunity (the median). However, it comes with an appreciable space complexity. It is not as susceptible to malicious/unfortunate data as radix sort, but also does not offer the attractive possible runtime.
A basic quicksort performs very well on most datasets except nearly (or completely) sorted ones, and comes with a tiny space complexity.
Quicksort's vulnerability is easily dealt with by converting it to a randomized quicksort. Radix sort's vulnerability is resolved by placing restrictions on the keys being sorted, which would inherently limit the library's users. Quicksort is more performant than mergesort on small datasets, and performs reasonably when mergesort might be faster.
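As a sketch of that fix (my own illustration, using a Lomuto partition rather than any particular library's implementation): choosing the pivot uniformly at random gives an expected O(n·log n) running time regardless of how the input is arranged.

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    // Randomized quicksort on a[lo..hi] (inclusive): a random pivot defuses
    // adversarial or already-sorted inputs that hurt a fixed-pivot quicksort.
    void quicksort(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi, std::mt19937& rng) {
        if (lo >= hi) return;
        std::uniform_int_distribution<std::ptrdiff_t> pick(lo, hi);
        std::swap(a[pick(rng)], a[hi]);            // move a random pivot to the end
        const int pivot = a[hi];
        std::ptrdiff_t i = lo;
        for (std::ptrdiff_t j = lo; j < hi; ++j)   // Lomuto partition
            if (a[j] < pivot) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);
        quicksort(a, lo, i - 1, rng);
        quicksort(a, i + 1, hi, rng);
    }

A production introsort goes one step further and falls back to heapsort when the recursion gets too deep, which is how library sorts get a hard O(n·log n) worst case.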
When implementing a library, you want to make it generically useful. Take these examples: a web application and a small device with an extremely restricted microcontroller.
Web applications need to deal with malicious data on a regular basis, and also have a wide variety of needs. A library with preconditioned restrictions is less likely to be useful. In the case of the microcontroller, it may be restrictively limited on space and unable to relinquish the slightest bit where one can be saved. Quicksort saves space, and will complete only a constant factor slower if a situation arises in which it is slower.
In sum -
1.) Libraries are often coded for as much generic usability as possible.
2.) Good performance all around is acceptable, especially if it is, in many cases, the best performance.
3.) Space is not always a primary issue, but when it is, it is often an explicit, hard restriction.

無心 2024-09-22 06:31:31

Two arguments come to my mind:

  1. Quicksort/Introsort is more flexible:

    Quicksort and Introsort work well with all kinds of data. All you need for sorting is the possibility to compare items. This is trivial with numbers but you can sort other data as well.

    Radix sort on the other hand just sorts things by their binary representation. It never compares items against each other.

  2. Radix sort needs more memory.

    All radix sort implementations that I've seen use a secondary buffer to store partial sorting results. This increases the memory requirements of the sorting algorithm. That may not be a problem if you only sort a couple of kilobytes, but if you go into the gigabyte range it makes a huge difference.

If I remember right, an in-place radix-sort algorithm exists on paper, though.
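
For reference, a rough sketch (mine, not from the original answer) of one such in-place variant, the classic binary MSD "radix exchange" sort: partition on the current bit, then recurse on both halves with the next lower bit. It needs no secondary buffer, but it gives up stability and still makes up to one pass per bit.

    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // In-place binary MSD radix sort ("radix exchange") on a[lo, hi).
    // Call as radix_exchange(a, 0, a.size(), 31) for 32-bit keys.
    void radix_exchange(std::vector<uint32_t>& a, std::size_t lo, std::size_t hi, int bit) {
        if (bit < 0 || hi - lo < 2) return;
        std::size_t i = lo, j = hi;                // zeros go left, ones go right
        while (i < j) {
            if ((a[i] >> bit) & 1u) std::swap(a[i], a[--j]);
            else ++i;
        }
        radix_exchange(a, lo, i, bit - 1);         // keys with the current bit = 0
        radix_exchange(a, i, hi, bit - 1);         // keys with the current bit = 1
    }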

岁月打碎记忆 2024-09-22 06:31:31

Radix sort is slower for (most) real world use cases.

One reason is the complexity of the algorithm:

If items are unique, k >= log(n). Even with duplicate items, the set of problems where k < log(n) is small.

Another is the implementation:

The additional memory requirement (which in itself is a disadvantage) affects cache performance negatively.

I think it is safe to say that many libraries, like the standard library, use Quicksort because it performs better in the majority of cases.
I don't think that "difficult implementation" or "less intuitive" are major factors.
