Performance of built-in .NET collection sorters
There was a question about how to sort a List. Several methods were given, from the basic List.Sort() to List.OrderBy(). The most laughable was a roll-your-own selection sort. I promptly voted that down, but it made me think: wouldn't Linq's OrderBy(), applied to a list, do the same thing? myList.OrderBy(x=>x.Property).ToList() would produce an iterator that basically finds the minimum value of the projection in what's left of the collection and yield returns it. Doing that across the entire list is a selection sort.
Which made me think: what algorithms do the built-in sorters for Lists, SortedLists, Enumerables, etc. use, and, by extension, should any of them be avoided for large collections? A SortedList, since it stays sorted by key, would probably use a single-pass insertion sort on each Add: find the first index whose value is greater than the new one, and insert before it. Lists and Arrays probably merge sort themselves pretty efficiently, but I don't know the actual algorithm behind Sort(). We've already discussed OrderBy.
What I know so far seems to indicate that List.Sort() or Array.Sort() are the best options for a list of known size, and that using Linq to sort an in-memory list or array should be discouraged. For a stream, there really isn't any other way than to OrderBy() the enumerable; the performance loss is mitigated by the fact that you can keep the data as a stream instead of having to have it all before sorting it.
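Whether OrderBy can produce output while its source is still streaming can be checked by counting how many items a lazy source has handed out at the moment OrderBy yields its first element. A minimal sketch; CountingSource and pulled are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int N = 1000;
int pulled = 0;   // how many items the source has handed out so far

// Hypothetical lazy "stream": yields N, N-1, ..., 1 and counts each item it produces.
IEnumerable<int> CountingSource()
{
    for (int value = N; value > 0; value--)
    {
        pulled++;
        yield return value;
    }
}

using var sorted = CountingSource().OrderBy(x => x).GetEnumerator();

// Ask OrderBy for its first (smallest) element only.
sorted.MoveNext();
int first = sorted.Current;
int pulledBeforeFirst = pulled;

Console.WriteLine($"first = {first}, items pulled before it = {pulledBeforeFirst}");
// prints "first = 1, items pulled before it = 1000"
```

Any sort has to see every element before it can know which one is smallest, so OrderBy reads the entire source before yielding anything. The calling code stays lazy, but the whole stream still ends up buffered in memory.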
EDIT:
The general consensus is that Sort() is faster given a concrete List or Array. OrderBy is reasonable but slower because it adds the O(N) cost of copying the passed enumerable into an array. Initializing a SortedList from unsorted data ends up being O(N^2) because of what's under the hood. Moral of the story: use List.Sort() instead of List.OrderBy() when you have an actual List.
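The practical difference behind that moral can be sketched with a small list of (Name, Age) tuples standing in for objects with a property; the data is made up for illustration. OrderBy(...).ToList() builds a new sorted list and leaves the source alone, while List<T>.Sort with a Comparison<T> reorders the existing list in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<(string Name, int Age)> { ("Ann", 41), ("Bob", 27), ("Cid", 35) };

// OrderBy + ToList: allocates a new sorted list; 'people' keeps its original order.
var byAgeCopy = people.OrderBy(p => p.Age).ToList();
string afterOrderBy = string.Join(",", people.Select(p => p.Name));

// List<T>.Sort with a Comparison<T>: reorders 'people' itself, in place.
people.Sort((a, b) => a.Age.CompareTo(b.Age));
string afterSort = string.Join(",", people.Select(p => p.Name));

Console.WriteLine(afterOrderBy);                                     // Ann,Bob,Cid
Console.WriteLine(string.Join(",", byAgeCopy.Select(p => p.Name)));  // Bob,Cid,Ann
Console.WriteLine(afterSort);                                        // Bob,Cid,Ann
```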
Enumerable.OrderBy() slurps the IEnumerable<> into an array and uses quick sort, which means O(n) storage requirements. It's done by an internal class in System.Core.dll, EnumerableSort<TElement>.QuickSort(). The storage cost makes it uncompetitive with simply sorting the list, if you have one, since List<> sorts in place. Linq often optimizes by checking the true capabilities of the IEnumerable with the is operator. That won't work here, since List<>.Sort is destructive: it would reorder the caller's list, while OrderBy must leave its source untouched. List<>.Sort and Array.Sort use in-place quick sort (since .NET Framework 4.5, an introspective sort: a quicksort variant that falls back to heapsort when recursion gets too deep).
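The buffer-then-sort approach described above can be sketched in plain C#. This is a simplified illustration, not the actual System.Core code, and the helper name StableOrderBy is hypothetical: it copies the source into an array, evaluates the key selector once per element, and sorts an array of indices, breaking ties on the original index so that equal keys keep their input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a buffered, stable key sort in the spirit of OrderBy.
List<T> StableOrderBy<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
{
    // 1. Buffer the whole sequence: O(n) extra storage.
    T[] items = source.ToArray();

    // 2. Evaluate the key selector exactly once per element.
    var keys = new TKey[items.Length];
    for (int i = 0; i < items.Length; i++) keys[i] = keySelector(items[i]);

    // 3. Sort an index map instead of the items; equal keys fall back to the
    //    original index, so the result is stable even though Array.Sort is not.
    int[] map = Enumerable.Range(0, items.Length).ToArray();
    Comparer<TKey> comparer = Comparer<TKey>.Default;
    Array.Sort(map, (a, b) =>
    {
        int c = comparer.Compare(keys[a], keys[b]);
        return c != 0 ? c : a.CompareTo(b);
    });

    // 4. Materialize the items in sorted order.
    var result = new List<T>(items.Length);
    foreach (int i in map) result.Add(items[i]);
    return result;
}

var words = new[] { "pear", "fig", "apple", "kiwi", "plum" };
var byLength = StableOrderBy(words, w => w.Length);
Console.WriteLine(string.Join(", ", byLength)); // fig, pear, kiwi, plum, apple
```

Enumerable.OrderBy is documented as a stable sort, while List<T>.Sort and Array.Sort are documented as unstable, which is one behavioral difference to keep in mind when swapping one for the other.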
SortedList<> has O(n) complexity for an insertion, because every element after the insertion point has to be shifted; this dominates the O(log(n)) cost of finding the insertion point. So putting n unsorted items into it will cost O(n^2). SortedDictionary<> uses a red-black tree, giving O(log(n)) insertion, so filling it costs O(n log(n)), the same as quick sort's average case.
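The SortedList<> behaviour described above (a binary search for the position, then an insert that shifts everything after it) can be mimicked with List<T>.BinarySearch and List<T>.Insert. InsertSorted is a hypothetical helper, not SortedList's source; SortedDictionary<TKey, TValue> is included for contrast as the tree-based alternative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keep a List<int> sorted using binary search + insert.
// Unlike SortedList<TKey, TValue>, this sketch allows duplicate values.
void InsertSorted(List<int> list, int value)
{
    // O(log n): BinarySearch returns the index if found, otherwise the bitwise
    // complement of the index of the next larger element (or of Count).
    int index = list.BinarySearch(value);
    if (index < 0) index = ~index;

    // O(n): every element from 'index' onward is shifted one slot to the right.
    list.Insert(index, value);
}

var sorted = new List<int>();
foreach (int x in new[] { 42, 7, 19, 3, 25 })
    InsertSorted(sorted, x);
Console.WriteLine(string.Join(", ", sorted));    // 3, 7, 19, 25, 42

// For contrast: SortedDictionary is tree-based, so each Add is O(log n).
var tree = new SortedDictionary<int, string>();
foreach (int x in new[] { 42, 7, 19, 3, 25 })
    tree.Add(x, x.ToString());
Console.WriteLine(string.Join(", ", tree.Keys)); // 3, 7, 19, 25, 42
```

Feeding n items in descending order into InsertSorted is the worst case: every insert lands at index 0 and shifts the whole list, which is where the O(n^2) total comes from.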
A quick gander through Reflector tells me that the List Sort methods use quicksort (http://en.wikipedia.org/wiki/Quicksort) through System.Collections.Generic.GenericArraySortHelper.
SortedList uses Array.BinarySearch to figure out where to insert stuff on each Add
Enumerators don't have sorting logic
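For reference, an in-place quicksort can be sketched as follows. This is a textbook version, not the code in GenericArraySortHelper: the pivot is the median of the first, middle, and last elements (a common guard against already-sorted input), the partition is Hoare-style, and the loop recurses only into the smaller side so the recursion depth stays at O(log n).

```csharp
using System;

// Hypothetical textbook quicksort (not the BCL implementation): sorts a[lo..hi] in place.
void QuickSort(int[] a, int lo, int hi)
{
    while (lo < hi)
    {
        // Median-of-three: order a[lo], a[mid], a[hi] and use the middle value as pivot.
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) Swap(a, mid, lo);
        if (a[hi] < a[lo]) Swap(a, hi, lo);
        if (a[hi] < a[mid]) Swap(a, hi, mid);
        int pivot = a[mid];

        // Hoare-style partition: afterwards a[lo..j] <= pivot <= a[i..hi].
        int i = lo, j = hi;
        while (i <= j)
        {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { Swap(a, i, j); i++; j--; }
        }

        // Recurse into the smaller side and loop on the larger one.
        if (j - lo < hi - i) { QuickSort(a, lo, j); lo = i; }
        else { QuickSort(a, i, hi); hi = j; }
    }
}

void Swap(int[] a, int x, int y) => (a[x], a[y]) = (a[y], a[x]);

var data = new int[] { 9, 4, 7, 1, 8, 2, 2, 6 };
QuickSort(data, 0, data.Length - 1);
Console.WriteLine(string.Join(", ", data)); // 1, 2, 2, 4, 6, 7, 8, 9
```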
Quicksort is a good sorting choice for most situations, though it can approach O(n^2) if you're really unlucky with the input data.
If you suspect your input is a huge pile of data in an order that is unlucky for quicksort (for example, already sorted), one trick is to randomize the data first (a single O(n) shuffle, which is cheap) and then sort the randomized data. There are a few tricks a quicksort implementation can use to mitigate the problem of already sorted (or nearly sorted) input; I don't know whether the BCL implementation does any of these.
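The shuffle-first trick can be sketched with a Fisher-Yates shuffle. With a quicksort that always takes the first or last element as its pivot, sorted input produces maximally unbalanced partitions and O(n^2) time; shuffling first makes that case very unlikely. Array.Sort stands in for the sorting step here. Since .NET Framework 4.5 it uses introsort, which already bounds the worst case, so with the built-in sorter the shuffle is illustrative rather than necessary.

```csharp
using System;

// Fisher-Yates shuffle: O(n) time, every permutation equally likely.
void Shuffle<T>(T[] items, Random rng)
{
    for (int i = items.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);                      // 0 <= j <= i
        (items[i], items[j]) = (items[j], items[i]);  // swap
    }
}

// Already-sorted input: the unlucky case for a naive first- or last-element pivot.
int[] data = new int[1_000_000];
for (int k = 0; k < data.Length; k++) data[k] = k;

Shuffle(data, new Random(12345));  // randomize first (one O(n) pass)
Array.Sort(data);                  // then sort the randomized data

Console.WriteLine($"{data[0]} .. {data[data.Length - 1]}"); // prints "0 .. 999999"
```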
Yes, your assumptions sound right. I did a little test to confirm it.
On 5000000 integers,
One way to find out the performance of each method is to measure it:
Result:
This shows that the performance of OrderBy is reasonable even for very large lists, but it's not quite as fast as using the built-in Sort method on a list. This is probably because OrderBy's code is slightly more flexible: it takes a key selector that must be evaluated for each element.
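A rough measurement along these lines can be sketched with Stopwatch. This is a hypothetical harness, not the answerer's code; the list size is an arbitrary assumption, and no reference timings are given because they depend on hardware and runtime version.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int N = 1_000_000;   // arbitrary size chosen for this sketch
var rng = new Random(42);
var source = new List<int>(N);
for (int i = 0; i < N; i++) source.Add(rng.Next());

// Sort() gets its own copy so the OrderBy run below still sees unsorted data.
var forSort = new List<int>(source);
var sw = Stopwatch.StartNew();
forSort.Sort();
sw.Stop();
Console.WriteLine($"List<int>.Sort():          {sw.ElapsedMilliseconds} ms");

sw.Restart();
List<int> viaOrderBy = source.OrderBy(x => x).ToList();
sw.Stop();
Console.WriteLine($"OrderBy(x => x).ToList():  {sw.ElapsedMilliseconds} ms");
```

A single run without warm-up is noisy (the first call also pays JIT costs); a dedicated harness such as BenchmarkDotNet gives more trustworthy numbers.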