Java - Collections.sort() performance
I'm using Collections.sort() to sort a LinkedList whose elements implement the Comparable interface, so they are sorted in their natural order. The javadoc says this method uses the mergesort algorithm, which has n*log(n) performance.
My question is whether there is a more efficient algorithm to sort my LinkedList?
The list could be very large, and it will be sorted very frequently.
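For reference, the setup looks roughly like the following sketch (the String element type is just an illustration; any element implementing Comparable behaves the same way):

```java
import java.util.Collections;
import java.util.LinkedList;

public class SortExample {
    public static void main(String[] args) {
        // A LinkedList of Comparable elements (String implements Comparable<String>).
        LinkedList<String> list = new LinkedList<>();
        Collections.addAll(list, "pear", "apple", "orange");

        // Collections.sort() runs a stable mergesort: O(n log n) comparisons.
        // It copies the list into an array, sorts the array, and writes the
        // elements back through a ListIterator, so the LinkedList's slow
        // random access does not affect the sort itself.
        Collections.sort(list);

        System.out.println(list); // [apple, orange, pear]
    }
}
```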
Answers (5)
O(N log N) is very good asymptotically. That said, there are linear-time O(N) non-comparison-based sorts, e.g. counting sort and bucket sort. These are useful when, for example, you're sorting millions and millions of integers but they're all between 1..10. Also, if the list is "almost sorted", the otherwise quadratic insertion sort is reported to actually be better in some scenarios.
Whether this is applicable, or even worth implementing, depends on your profiling results. I'd say that unless profiling shows the sort to be a bottleneck, don't worry about it.
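For illustration, a minimal counting-sort sketch for the case mentioned above, assuming the values are known ints in a small range such as 1..10 (the range bounds and element type are assumptions for the example):

```java
import java.util.LinkedList;
import java.util.List;

public class CountingSortSketch {
    /** Sorts values known to lie in [min, max] in O(n + range) time, with no comparisons. */
    static List<Integer> countingSort(List<Integer> values, int min, int max) {
        int[] counts = new int[max - min + 1];
        for (int v : values) {
            counts[v - min]++;           // tally each value
        }
        List<Integer> sorted = new LinkedList<>();
        for (int i = 0; i < counts.length; i++) {
            for (int c = 0; c < counts[i]; c++) {
                sorted.add(i + min);     // emit each value as many times as it was seen
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(3, 1, 10, 7, 1, 3, 3);
        System.out.println(countingSort(data, 1, 10)); // [1, 1, 3, 3, 3, 7, 10]
    }
}
```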
If you say the list will be sorted "very frequently", you should consider keeping the list in a sorted state all the time, for example by using a tree instead of a LinkedList. Maybe you can even use a SortedSet instead of a List, if you don't have any duplicate values and don't need any List operations (since you are sorting them all the time anyway). Check the TreeSet class, an implementation of SortedSet. If you want to iterate over this "list" (which is actually a Set), you can use the class's Iterator.
If you have duplicate values inside the List, you have to use some tricks (like wrapping each value in a new class that also carries some delta for ordering equal objects), as sketched below.
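One way to read that "trick" is to wrap each value together with a unique sequence number and let the TreeSet's Comparator use it as a tie-breaker, so equal values are not collapsed by the set. The Item wrapper below is hypothetical:

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

public class SortedWithDuplicates {
    // Hypothetical wrapper: the real value plus a unique sequence number,
    // so two equal values are still distinct elements for the TreeSet.
    static final class Item {
        static final AtomicLong SEQ = new AtomicLong();
        final int value;
        final long seq = SEQ.getAndIncrement();
        Item(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        // Order by value first, then by sequence number as the tie-breaker.
        TreeSet<Item> items = new TreeSet<>(
                Comparator.<Item>comparingInt(i -> i.value)
                          .thenComparingLong(i -> i.seq));

        items.add(new Item(5));
        items.add(new Item(3));
        items.add(new Item(5));   // duplicate value survives thanks to the tie-breaker

        items.forEach(i -> System.out.println(i.value)); // 3, 5, 5 in sorted order
    }
}
```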
There is no general sort algorithm better than n*log(n). And this is quite fast. By "general" I mean your data doesn't have special properties.
I am experimenting with large data sets (GBs of data) and have implemented a merge sort (there is a good example @ googlecode). However, I am using Collections.sort() to pre-sort my temporary buffers, and in my experience Collections.sort() gets ridiculously slow beyond a certain threshold of data. With an auxiliary buffer of 96MB I can sort one of those buffers in about 30 seconds (note: this heavily depends on the comparator you use - I use a custom column layout with a quite complex column parser), but increasing the chunk size to 128MB makes the time jump to over 3 minutes. This bears no relation to the linear (or near-linear) behavior I can observe for smaller chunks. The impact is so large that a merge sort with smaller buffers is in almost (?) all cases faster than an in-memory sort using a 128MB buffer. To make this short: merge sort is the way to go for large data sets beyond the 100MB boundary. I cannot really answer why that is, and those numbers might even be machine dependent (mine is OS X on a 2.6GHz i7 with 16GB of memory).
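A rough, in-memory sketch of that chunk-and-merge approach (the chunk size and element type are placeholders; a real external sort would spill each sorted chunk to a temporary file instead of keeping it in memory):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkedMergeSortSketch {
    /** Sorts a large list by pre-sorting fixed-size chunks and k-way merging them. */
    static List<Integer> chunkedSort(List<Integer> input, int chunkSize) {
        // 1. Split the input into chunks and pre-sort each chunk with Collections.sort().
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += chunkSize) {
            List<Integer> chunk = new ArrayList<>(
                    input.subList(i, Math.min(i + chunkSize, input.size())));
            Collections.sort(chunk);   // in an external sort this buffer would go to disk
            chunks.add(chunk);
        }

        // 2. K-way merge: the heap always holds the smallest unread element of each chunk.
        PriorityQueue<ChunkCursor> heap = new PriorityQueue<>();
        for (List<Integer> chunk : chunks) {
            Iterator<Integer> it = chunk.iterator();
            if (it.hasNext()) heap.add(new ChunkCursor(it.next(), it));
        }

        List<Integer> result = new ArrayList<>(input.size());
        while (!heap.isEmpty()) {
            ChunkCursor c = heap.poll();
            result.add(c.head);
            if (c.rest.hasNext()) heap.add(new ChunkCursor(c.rest.next(), c.rest));
        }
        return result;
    }

    // A cursor into one sorted chunk, ordered by its current head element.
    private static final class ChunkCursor implements Comparable<ChunkCursor> {
        final int head;
        final Iterator<Integer> rest;
        ChunkCursor(int head, Iterator<Integer> rest) { this.head = head; this.rest = rest; }
        public int compareTo(ChunkCursor o) { return Integer.compare(head, o.head); }
    }

    public static void main(String[] args) {
        System.out.println(chunkedSort(List.of(9, 4, 7, 1, 8, 3, 2), 3)); // [1, 2, 3, 4, 7, 8, 9]
    }
}
```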
In terms of sorting the list, no: all comparison-based sorts on general data are O(N log(N)).
If your re-sorting is due to insertions, then you can try to batch your insertions and then merge them into the main list - if you have B new items, you sort them in O(B log(B)) and then do a single-pass merge of the two lists, which is O(N+B); see the sketch below.
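A minimal sketch of that batching idea, assuming the main list is already in natural order and neither list contains nulls:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class BatchedInsertSketch {
    /** Merges a sorted main list with a sorted batch in O(N + B), using iterators only. */
    static <T extends Comparable<? super T>> List<T> mergeSorted(List<T> main, List<T> batch) {
        List<T> merged = new LinkedList<>();
        Iterator<T> a = main.iterator(), b = batch.iterator();
        T x = a.hasNext() ? a.next() : null;   // current head of the main list
        T y = b.hasNext() ? b.next() : null;   // current head of the batch
        while (x != null && y != null) {
            // <= keeps the merge stable with respect to the main list.
            if (x.compareTo(y) <= 0) { merged.add(x); x = a.hasNext() ? a.next() : null; }
            else                     { merged.add(y); y = b.hasNext() ? b.next() : null; }
        }
        while (x != null) { merged.add(x); x = a.hasNext() ? a.next() : null; }  // drain main
        while (y != null) { merged.add(y); y = b.hasNext() ? b.next() : null; }  // drain batch
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> mainList = new LinkedList<>(List.of(1, 4, 9, 12)); // size N, sorted
        List<Integer> batch    = new ArrayList<>(List.of(7, 2));         // B new items
        Collections.sort(batch);                                         // O(B log B)
        System.out.println(mergeSorted(mainList, batch));                // [1, 2, 4, 7, 9, 12]
    }
}
```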
If your re-sorting is due to changes in the values of the items, you might be able to do similar batching if you change the mutable values into immutable ones and treat the changes as a batch of insertions and deletions. Otherwise, you won't be able to avoid sorting the whole list.
If your requirements allow it, there are various non-linked-list structures, such as TreeSet, which maintain a sorted order more efficiently, but they will fail if the values are mutable.
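If the values do have to change, one common workaround (sketched here with a hypothetical Task class) is to remove the element, mutate it, and re-insert it, so the TreeSet never sees a sort key change in place:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class MutableInSortedSet {
    // Hypothetical mutable element whose sort key (priority) can change.
    static final class Task {
        final String name;
        int priority;
        Task(String name, int priority) { this.name = name; this.priority = priority; }
    }

    public static void main(String[] args) {
        TreeSet<Task> tasks = new TreeSet<>(
                Comparator.comparingInt((Task t) -> t.priority)
                          .thenComparing(t -> t.name));

        Task build = new Task("build", 5);
        tasks.add(build);
        tasks.add(new Task("deploy", 3));

        // Mutating the key while the element is in the set would silently
        // corrupt its ordering. Remove, change, re-add instead:
        tasks.remove(build);
        build.priority = 1;
        tasks.add(build);

        tasks.forEach(t -> System.out.println(t.priority + " " + t.name)); // 1 build, 3 deploy
    }
}
```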