Java 7 是否使用 Tim Sort 作为方法 Arrays.Sort?
我找不到Java 7的文档,我只能找到关于Java 6的信息,它仍然是快速或合并的。有谁知道如何找到Java 7中Arrays.sort
方法的文档吗?
I can't find the documentation of Java 7, I can only find about the Java 6, which is still quick or merge. Does anyone know how to find the documentation of the method Arrays.sort
in Java 7?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
Java 7 到 Java 22 使用双轴快速排序 (Dual-Pivot Quicksort) 来处理基元,使用 TimSort 来处理对象。
根据 原语的 Java 7 API 文档:
根据 对象的 Java 7 API 文档:
Timsort 是一种混合“合并排序和插入排序” ”。
同上 在 Java 22 中。
不确定这是否与 Java 6 中的情况有很大不同,对于 Arrays.sort JDK6:
对于 Object[] 或集合 (Collections.sort()),使用合并排序。
Java 7 through Java 22 uses Dual-Pivot Quicksort for primitives and TimSort for objects.
According to the Java 7 API doc for primitives:
According to the Java 7 API doc for objects:
Timsort is a hybrid "merge sort and insertion sort."
Ditto in Java 22.
Not sure if this is much different from what it was in Java 6, for Arrays.sort JDK6:
For Object[] or collections (Collections.sort()) merge sort is used.
是的! ... 也没有。
总结
在撰写本文时(2016 年),在当前的 Open JDK 0 实现中,Tim Sort 通常用于对对象数组进行排序(即
数组。 sort(Object[])
和朋友) - 但对于原始数组(Arrays.sort
方法的其余部分)还有各种其他方法被使用。对于基元,启发式方法会在各种排序方法中进行选择,例如快速排序、合并排序、计数排序3。取决于正在排序的数据。大多数这些决策都是根据要排序的数组的类型和大小预先做出的,但对于
int
和long
元素,该决策实际上是根据测量数组的有序性。因此,在许多情况下,在适应/内省(TimSort 或类似的合并排序)之上,您还需要适应/内省(选择算法的启发式方法)!详细信息
Tim Sort 用于大多数类型的对象,例如 Arrays.sort(Object[] a) ,除非用户通过设置系统属性 java.util 专门请求旧行为。 Arrays.useLegacyMergeSort 为 true。
对于基元来说,情况更为复杂。至少从 JDK 8(版本
1.8.0_111
)开始,根据要排序的数组的大小、原始类型和测量的数组“排序度”,使用各种启发式方法。概述如下:DualPivotQuicksort.INSERTION_SORT_THRESHOLD
)。在对使用合并或快速排序时出现的子数组进行排序并且子数组的大小低于阈值时,也会使用此阈值。因此,某种形式的插入排序将用于所有排序,并且对于小型数组,它是唯一使用的算法。byte
、short
和char
,计数排序用于较大的数组。这是一种简单的排序,需要O(n + range)
时间,其中range
是字节 (256) 或短/字符 (65536) 值的总数。排序涉及分配范围
值的底层数组,因此仅当要排序的元素数量占总范围的很大一部分时才使用它。特别是,它用于大于 29 个元素的字节数组(即范围的 11%)和大于 3200 个元素的短/字符数组(范围的 5%)。int
和long
数组以及高于插入排序阈值的short
/char
数组并且低于计数排序阈值,可以使用两种算法之一:双主元快速排序或合并排序。使用哪一个取决于数组排序的度量:输入被分为升序或降序元素的运行。如果此类运行的次数大于 66,则该数组被认为大部分未排序,并使用双枢轴快速排序进行排序。否则,数组被认为大部分已排序,并使用归并排序(使用已枚举的运行作为起点)。查找运行然后使用归并排序对它们进行排序的想法实际上与 TimSort 非常相似,尽管存在一些差异。因此,至少对于某些参数,JDK 使用运行感知归并排序,但对于许多其他参数组合,它使用不同的算法,并且总共至少使用 5 种不同的算法!
基本原理
Object[]
与基元的不同排序行为背后的原因可能至少有两个方面:1)
Object[]
的排序必须是 stable:同等排序的对象将以与输入相同的顺序出现。对于原始数组,不存在这样的概念:原始数组完全由它们的值定义,因此稳定排序和不稳定排序之间没有区别。这使得原始排序不需要稳定的算法来提高速度。2) 各种
Object[]
需要涉及Object.compare()
方法,该方法可能非常复杂且昂贵。即使compare()
方法很简单,通常也会存在方法调用开销,除非可以内联整个排序方法2。因此,Object[]
通常会偏向于最小化总比较,即使以一些额外的算法复杂性为代价。另一方面,原始数组的排序只是直接比较原始值,通常需要一个或两个周期的数量级。在这种情况下,应该考虑比较成本和周围算法来优化算法,因为它们可能具有相同的量级。
0 至少对于 Java 7 和 Java 9 之间的版本来说是这样,当然这也包括 Oracle 的 JDK,因为它基于 Open JDK。其他实现很可能使用类似的方法,但我没有检查过。
1 对于字节数组,插入排序阈值实际上是 29 个元素,因为这是使用计数排序的下限。
2 这似乎不太可能,因为它相当大。
3 计数排序仅适用于 16 位或更少范围相对有限的值:
byte
、short
或char
代码>.Yes! ... and also no.
Summary
At the time of writing (2016), in current Open JDK 0 implementations Tim Sort is generally used for sorting arrays of objects (i.e.,
Arrays.sort(Object[])
and friends) - but for primitive arrays (the remainder of theArrays.sort
methods) a variety of other methods are used.For primitives, the heuristics choose among a variety of sorting methods such as quicksort, merge sort, counting sort3. depending on the data being sorted. Most of these decisions are simply made up-front based on the type and size of the array being sorted, but for
int
andlong
elements the decision is actually adaptive based on the measured sortedness of the array. So you have adaptation/introspection (the heuristics to pick an algorithm) on top of adaptation/introspection (TimSort or similar merge sort) in many cases!Details
Tim Sort is used for most sorts of objects, such as
Arrays.sort(Object[] a)
, unless the user has specifically requested the legacy behavior by setting system propertyjava.util.Arrays.useLegacyMergeSort
to true.For primitives, the situation is more complex. At least as of JDK 8 (version
1.8.0_111
) a variety of heurstics are used depending on the size of the arrays being sorted, the primitive type and the measured "sortedness" of the array. Here's an overview:DualPivotQuicksort.INSERTION_SORT_THRESHOLD
). This threshold is also used when sorting sub-arrays that arise when merge or quicksort are used and the size of the subarray falls below the threshold. So some form of insertion sort will be used in all sorts, and for small arrays it is the only algorithm used.byte
,short
andchar
, a counting sort is used for largish arrays. This is a simple sort that takesO(n + range)
time, whererange
is the total number of byte (256) or short/char (65536) values. The sort involves allocating an underlying array ofrange
values, so it is only used when the number of elements to sort is a significant fraction of the total range. In particular, it is used for byte arrays greater than 29 elements (i.e. ~11% of the range), and short/char arrays greater than 3200 elements (~5% of the range).int
andlong
arrays above the insertion sort threshold and forshort
/char
arrays both above the insertion sort threshold and below the counting sort threshold, one of two algorithms may be used: dual pivot quicksort, or merge sort. Which one is used depends on a measure of the sortedness of the array: the input is divided into runs of ascending or descending elements. If the number of such runs is greater than 66, then the array is considered mostly unsorted and is sorted with dual-pivot quicksort. Otherwise, the array is considered mostly sorted, and mergesort is used (using the already enumerated runs as a starting point).The idea of finding runs and then using mergesort to sort them is in fact very similar to TimSort, although there are some differences. So at least for some parameters, the JDK is using a run-aware mergesort, but for many other combinations of parameters it is using a different algorithm, and at least 5 distinct algorithms are used in total!
Rationale
The reasoning behind the different sort behavior of
Object[]
versus primitive is probably at least two-fold:1) Sorts of
Object[]
are required to be stable: objects which sort equally will appear in the same order as the input. For primitive arrays, no such concept exists: primitives are fully defined by their value, so there is no distinction between a stable and an unstable sort. This allows primitive sorts to dispense with the need for stable algorithms in favor of speed.2) Sorts of
Object[]
need to involve theObject.compare()
method, which may be arbitrarily complex and expensive. Even if thecompare()
method is simple, there will generally be method call overhead unless the entire sort method can be inlined2. So sorts ofObject[]
will generally be biased towards minimizing total comparisons, even at the cost of some additional algorithmic complexity.Sorts of primitive arrays, on the other hand, just directly compare primitive values which typically takes on the order of a cycle or two. In this case, the algorithm should be optimized considering both the cost of comparisons and the surrounding algorithm, since the are likely to be of the same magnitude.
0 At least for versions between Java 7 and Java 9, and of course this also includes Oracle's JDK as it is based on Open JDK. It is likely that other implementations use a similar approach, but I haven't checked.
1 For byte arrays, the insertion sort threshold is effectively 29 elements since that's the lower cutoff above which counting sort is used.
2 This seems unlikely, since it is quite large.
3 Counting sort is only used for values with relatively limited range of 16-bits or less:
byte
,short
orchar
.是的,Java 7 将使用 Timsort 进行 Arrays.sort。这是提交:
http://hg.openjdk.java.net/jdk7/jdk7/jdk /rev/bfd7abda8f79
Yes, Java 7 will use Timsort for Arrays.sort. Here is the commit:
http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/bfd7abda8f79