Why did Java 6's Arrays#sort(Object[]) change from merge sort to insertion sort for small arrays?
Java 6's mergesort implementation in Arrays.java uses an insertion sort if the array length is less than some threshold. This value is hard-coded to 7. As the algorithm is recursive, this eventually happens many times for a large array. The canonical merge-sort algorithm does not do this, just using merge sort all the way down until there is only 1 element in the list.

Is this an optimisation? If so, how is it supposed to help? And why 7? The insertion sort (even of <=7 things) increases the number of comparisons required to sort a large array dramatically - so it will add cost to a sort where compareTo() calls are slow.
(Chart: x-axis is size of array, y-axis is # of comparisons, for different values of INSERTIONSORT_THRESHOLD)
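For context, the structure in question looks roughly like this. This is a simplified sketch of the legacy Java 6 merge sort, not the actual JDK source, but it shows where the INSERTIONSORT_THRESHOLD cut-off sits:

```java
import java.util.Arrays;

// Simplified sketch of the Java 6 legacy merge sort (illustrative only; the
// real Arrays.java implementation double-buffers between src/dest arrays).
class LegacyMergeSortSketch {
    private static final int INSERTIONSORT_THRESHOLD = 7;

    @SuppressWarnings({"rawtypes", "unchecked"})
    static void mergeSort(Object[] a, int low, int high) {
        int length = high - low;

        // Small sub-ranges are finished with insertion sort instead of recursing further.
        if (length < INSERTIONSORT_THRESHOLD) {
            for (int i = low; i < high; i++)
                for (int j = i; j > low && ((Comparable) a[j - 1]).compareTo(a[j]) > 0; j--) {
                    Object t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
                }
            return;
        }

        // Otherwise recurse exactly as a canonical merge sort would.
        int mid = (low + high) >>> 1;
        mergeSort(a, low, mid);
        mergeSort(a, mid, high);

        // Merge the two sorted halves via a temporary copy.
        Object[] copy = Arrays.copyOfRange(a, low, high);
        int i = 0, j = mid - low, k = low;
        while (i < mid - low && j < copy.length)
            a[k++] = ((Comparable) copy[i]).compareTo(copy[j]) <= 0 ? copy[i++] : copy[j++];
        while (i < mid - low) a[k++] = copy[i++];
        while (j < copy.length) a[k++] = copy[j++];
    }
}
```

In the real JDK, the public Arrays.sort(Object[]) first clones the array into an auxiliary buffer and recurses between the two, but the threshold check is the same idea.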
3 Answers
Yes this is intentional. While the Big-O of mergesort is less than that of quadratic sorts such as insertion sort, the operations it does are more complex and thus slower.
Consider sorting an array of length 8. Merge sort makes ~14 recursive calls to itself in addition to 7 merge operations. Each recursive call contributes some non-trivial overhead to the run-time. Each merge operation involves a loop where index variables must be initialized, incremented, and compared, temporary arrays must be copied, etc. All in all, you can expect well over 300 "simple" operations.
On the other hand, insertion sort is inherently simple and uses about 8^2 = 64 operations, which is much faster.
Think about it this way. When you sort a list of 10 numbers by hand, do you use merge sort? No, because your brain is much better at doing simple things like insertion sort. However, if I gave you a year to sort a list of 100,000 numbers, you might be more inclined to merge sort it.
As for the magic number 7, it is empirically derived to be optimal.
EDIT: In a standard insertion sort of 8 elements, the worst case scenario leads to ~36 comparisons. In a canonical merge sort, you have ~24 comparisons. Adding in the overhead from the method calls and complexity of operations, insertion sort should be faster. Additionally if you look at the average case, insertion sort would make far fewer comparisons than 36.
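If you want to check comparison counts for yourself, a rough harness along these lines will do (illustrative code, not the JDK implementation; it counts only element comparisons, not the bookkeeping overhead described above):

```java
import java.util.Arrays;

// Counts element comparisons for a plain insertion sort vs a canonical
// (threshold-free) merge sort on a small worst-case input.
public class ComparisonCount {

    static int insertionSort(int[] a) {
        int cmps = 0;
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            while (j >= 0) {
                cmps++;                      // one comparison per test against the element being inserted
                if (a[j] <= v) break;
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
        return cmps;
    }

    static int mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return 0;
        int mid = (lo + hi) >>> 1;
        int cmps = mergeSort(a, lo, mid) + mergeSort(a, mid, hi);
        int[] copy = Arrays.copyOfRange(a, lo, hi);
        int i = 0, j = mid - lo, k = lo;
        while (i < mid - lo && j < copy.length) {
            cmps++;                          // one comparison per merged element until a run empties
            a[k++] = copy[i] <= copy[j] ? copy[i++] : copy[j++];
        }
        while (i < mid - lo) a[k++] = copy[i++];
        while (j < copy.length) a[k++] = copy[j++];
        return cmps;
    }

    public static void main(String[] args) {
        int[] worst = {8, 7, 6, 5, 4, 3, 2, 1};   // reverse order: insertion sort's worst case
        System.out.println("insertion sort comparisons: " + insertionSort(worst.clone()));
        System.out.println("merge sort comparisons:     " + mergeSort(worst.clone(), 0, worst.length));
    }
}
```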
Insertion sort is n(n-1)/2 comparisons and merge sort is n*log2(n).
Considering this -
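(A minimal sketch to reproduce the figures this answer relies on, simply evaluating the two formulas above for small lengths:)

```java
// Evaluate the two comparison estimates above for small array lengths.
public class CrossoverSketch {
    public static void main(String[] args) {
        System.out.println(" n   n(n-1)/2   n*log2(n)");
        for (int n = 1; n <= 10; n++) {
            double insertion = n * (n - 1) / 2.0;
            double merge = n * (Math.log(n) / Math.log(2));
            System.out.printf("%2d   %8.1f   %9.2f%n", n, insertion, merge);
        }
    }
}
```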
From this data it is clear that up to length 6 insertion sort is faster, and from 7 onwards merge sort is more efficient.
That explains why 7 is used.
My understanding is that this is an empirically derived value, where the time required for an insertion sort is actually lower, despite a (possible) higher number of comparisons required. This is so because near the end of a mergesort, the data is likely to be almost sorted, which makes insertion sort perform well.
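As a rough illustration of that point (illustrative code, not the JDK's): insertion sort adapts to nearly-sorted input, doing close to one comparison per element.

```java
// Counts insertion-sort comparisons on already-sorted vs reversed input,
// illustrating why nearly-sorted data suits insertion sort.
public class NearlySortedSketch {

    static int insertionSortComparisons(int[] a) {
        int cmps = 0;
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            while (j >= 0) {
                cmps++;
                if (a[j] <= v) break;
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
        return cmps;
    }

    public static void main(String[] args) {
        int[] sorted   = {1, 2, 3, 4, 5, 6, 7};
        int[] reversed = {7, 6, 5, 4, 3, 2, 1};
        System.out.println("already sorted: " + insertionSortComparisons(sorted.clone()));   // n-1 = 6
        System.out.println("reverse order:  " + insertionSortComparisons(reversed.clone())); // n(n-1)/2 = 21
    }
}
```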