What sort does Java Collections.sort(nodes) use?
I think it is MergeSort, which is O(n log n).
However, the following output disagrees:
-1,0000000099000391,0000000099000427
1,0000000099000427,0000000099000346
5,0000000099000391,0000000099000346
1,0000000099000427,0000000099000345
5,0000000099000391,0000000099000345
1,0000000099000346,0000000099000345
I am sorting a nodelist of 4 nodes by sequence number, and the sort is doing 6 comparisons.
I am puzzled because 6 > (4 log(4)). Can someone explain this to me?
P.S. It is mergesort, but I still don't understand my results.
Thanks for the answers everyone. Thank you Tom for correcting my math.
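For reference, here is a minimal sketch of the kind of instrumented comparator that could produce a trace like the one above; the Node class and its sequenceNumber field are hypothetical stand-ins for the poster's actual types:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the poster's node type.
class Node {
    final String sequenceNumber;
    Node(String sequenceNumber) { this.sequenceNumber = sequenceNumber; }
}

public class SortTrace {
    public static void main(String[] args) {
        // Same four sequence numbers, in the order implied by the trace.
        List<Node> nodes = new ArrayList<Node>();
        nodes.add(new Node("0000000099000391"));
        nodes.add(new Node("0000000099000427"));
        nodes.add(new Node("0000000099000346"));
        nodes.add(new Node("0000000099000345"));

        Collections.sort(nodes, new Comparator<Node>() {
            public int compare(Node a, Node b) {
                int result = a.sequenceNumber.compareTo(b.sequenceNumber);
                // Print one line per comparison, as in the output above.
                System.out.println(result + "," + a.sequenceNumber + "," + b.sequenceNumber);
                return result;
            }
        });
    }
}

On a pre-Java-7 JDK this reproduces the six lines above exactly: String.compareTo returns the difference of the first pair of differing characters, which is where the -1, 1 and 5 values come from. Java 7's timsort uses a binary insertion sort for short lists, so its trace would differ.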
Comments (4)
O(n log n) doesn't mean that the number of comparisons will be equal to or less than n log n, just that the time taken will scale proportionally to n log n. Try doing tests with 8 nodes, or 16 nodes, or 32 nodes, and checking out the timing.
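Here is a minimal sketch of that experiment, counting comparisons rather than wall-clock time (timings at these sizes would be mostly noise); the sizes and the use of Integer are just for illustration:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Counts how many times the comparator is invoked for a few input sizes,
// and prints n*log2(n) alongside for comparison.
public class ComparisonCounts {
    public static void main(String[] args) {
        for (int n : new int[] {8, 16, 32}) {
            List<Integer> data = new ArrayList<Integer>();
            for (int i = 0; i < n; i++) data.add(i);
            Collections.shuffle(data);

            AtomicInteger count = new AtomicInteger();
            Comparator<Integer> counting = (a, b) -> {
                count.incrementAndGet();   // one comparison
                return a.compareTo(b);
            };
            Collections.sort(data, counting);

            double nLogN = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%2d  comparisons=%4d  n*log2(n)=%6.1f%n",
                              n, count.get(), nLogN);
        }
    }
}

The counts will not match n*log2(n) exactly for any particular n, and can exceed it, as in the question; it is the overall growth that O(n log n) describes.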
You sorted four nodes, so you didn't get merge sort; the sort switched to insertion sort. A plain insertion sort on four elements makes up to 4*3/2 = 6 comparisons, which is exactly what your output shows.

Per the Wikipedia article on merge sort (emphasis added):

"In Java, the Arrays.sort() methods use merge sort or a tuned quicksort depending on the datatypes, and for implementation efficiency switch to insertion sort when fewer than seven array elements are being sorted."

Arrays.sort is used indirectly by the collections classes.

Starting with Java 7, the Oracle implementation of Java switched to use the timsort algorithm used by Python: JDK-6804124.

(The timsort monograph, linked above, is well worth reading.)
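For illustration, here is a simplified sketch in the spirit of the pre-Java-7 merge sort in java.util.Arrays; the real implementation differs in detail (it also carries an offset parameter for subrange sorts), but it shows the insertion-sort cutoff being discussed:

import java.util.Comparator;

// Simplified sketch of a tuned merge sort that falls back to insertion
// sort for small ranges, in the style of the pre-Java-7 java.util.Arrays
// implementation (whose INSERTIONSORT_THRESHOLD was 7).
public class TunedMergeSort {
    private static final int INSERTIONSORT_THRESHOLD = 7;

    public static <T> void sort(T[] a, Comparator<? super T> c) {
        T[] aux = a.clone();
        mergeSort(aux, a, 0, a.length, c);
    }

    private static <T> void mergeSort(T[] src, T[] dest,
                                      int low, int high,
                                      Comparator<? super T> c) {
        int length = high - low;

        // Small ranges: plain insertion sort, up to length*(length-1)/2
        // comparisons.
        if (length < INSERTIONSORT_THRESHOLD) {
            for (int i = low; i < high; i++)
                for (int j = i; j > low && c.compare(dest[j - 1], dest[j]) > 0; j--) {
                    T t = dest[j]; dest[j] = dest[j - 1]; dest[j - 1] = t;
                }
            return;
        }

        // Recursively sort both halves into src.
        int mid = (low + high) >>> 1;
        mergeSort(dest, src, low, mid, c);
        mergeSort(dest, src, mid, high, c);

        // If the halves are already in order, one copy suffices.
        if (c.compare(src[mid - 1], src[mid]) <= 0) {
            System.arraycopy(src, low, dest, low, length);
            return;
        }

        // Merge the sorted halves from src back into dest.
        for (int i = low, p = low, q = mid; i < high; i++) {
            if (q >= high || (p < mid && c.compare(src[p], src[q]) <= 0))
                dest[i] = src[p++];
            else
                dest[i] = src[q++];
        }
    }
}

A four-element list falls straight into the insertion-sort branch here, without any merging ever happening.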
An algorithm A(n) that processes an amount of data n is in O(f(n)), for some function f, if there exist two strictly positive constants C_inf and C_sup such that:

C_inf · f(n) < ExpectedValue(OperationCount(A(n))) < C_sup · f(n)

Two things to note:

The actual constants C could be anything, and do depend on the relative costs of operations (depending on the language, the VM, the architecture, or your actual definition of an operation). On some platforms, for instance, + and * have the same cost, while on others the latter is an order of magnitude slower.

The quantity described as "in O(f(n))" is an expected operation count, based on some possibly arbitrary model of the data you are dealing with. For instance, if your data is almost completely sorted, a merge-sort algorithm is going to be mostly O(n), not O(n log n).
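To see that last point concretely, here is a small demo (the size is arbitrary) counting comparisons on already-sorted versus shuffled input; the exact numbers depend on the JDK version, but sorted input comes out near n while shuffled input comes out near n·log2(n):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// The library merge sort detects already-ordered runs, so sorted input
// needs far fewer comparisons than shuffled input of the same size.
public class AdaptivityDemo {
    static int countComparisons(List<Integer> data) {
        AtomicInteger count = new AtomicInteger();
        Collections.sort(data, (a, b) -> {
            count.incrementAndGet();
            return a.compareTo(b);
        });
        return count.get();
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Integer> sorted = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) sorted.add(i);

        List<Integer> shuffled = new ArrayList<Integer>(sorted);
        Collections.shuffle(shuffled);

        System.out.println("already sorted: " + countComparisons(sorted));
        System.out.println("shuffled:       " + countComparisons(shuffled));
    }
}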
I've written some stuff you may be interested in about the Java sort algorithm and taken some performance measurements of Collections.sort(). The algorithm at present is a mergesort with an insertion sort once you get down to a certain size of sublists (N.B. this algorithm is very probably going to change in Java 7).
You should really take the Big O notation as an indication of how the algorithm will scale overall; for a particular sort, the precise time will deviate from the time predicted by this calculation (as you'll see on my graph, the two sort algorithms that are combined each have different performance characteristics, and so the overall time for a sort is a bit more complex).
That said, as a rough guide, for every time you double the number of elements, if you multiply the expected time by 2.2, you won't be far out. (It doesn't make much sense really to do this for very small lists of a few elements, though.)
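For what it's worth, that 2.2 rule of thumb falls straight out of the n log n model: going from n to 2n multiplies n·log2(n) by 2·(log2(n) + 1)/log2(n), and for n around a thousand (log2(n) ≈ 10) that factor is 2 × 11/10 = 2.2.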