When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?
This may appear to be a duplicate of this question, which asks "What's the difference between SortedList and SortedDictionary?" Unfortunately, the answers do nothing more than quote the MSDN documentation (which clearly states that there are performance and memory use differences between the two) but don't actually answer the question.

In fact (and so this question doesn't get the same answers), according to MSDN:

"The SortedList<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.

SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList<TKey, TValue>.

If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>."

So, clearly this would indicate that SortedList<TKey, TValue> is the better choice unless you need faster insert and remove operations for unsorted data.

The question still remains: given the information above, what are the practical (real-world, business case, etc.) reasons for using a SortedDictionary<TKey, TValue>? Based on the performance information, it would imply that there really is no need to have SortedDictionary<TKey, TValue> at all.
Comments (6)
I'm not sure how accurate the MSDN documentation is on SortedList and SortedDictionary. It seems to be saying both are implemented using a binary search tree. But if SortedList uses a binary search tree, why would it be much slower on additions than SortedDictionary?

Anyway, here are some performance test results.

Each test operates on a SortedList / SortedDictionary containing 10,000 int32 keys. Each test is repeated 1,000 times (Release build, Start without Debugging).

The first group of tests adds keys in sequence from 0 to 9,999. The second group of tests adds randomly shuffled keys from 0 to 9,999 (every number is added exactly once).

As with any profiling, the important thing is the relative performance, not the actual numbers.

As you can see, on sorted data the SortedList is faster than the SortedDictionary. On unsorted data the SortedList is slightly quicker on retrieval, but about 9 times slower on adding.

If both are using binary trees internally, it is quite surprising that the Add operation on unsorted data is so much slower for SortedList. It is possible that SortedList may also be adding items to a sorted linear data structure at the same time, which would slow it down.

In that case, however, you would expect the memory usage of a SortedList to be greater than or at least equal to that of a SortedDictionary. But this contradicts what the MSDN documentation says.
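A test along the lines described above could be sketched roughly as follows. This is not the original poster's test harness: the `TimeAdds` helper, the fixed seed, and the single (rather than 1,000-fold) run are assumptions made to keep the sketch short, and the timings it prints will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int Count = 10_000;

// Keys 0..9,999 in order, plus the same keys shuffled (Fisher-Yates, fixed seed).
int[] sortedKeys = Enumerable.Range(0, Count).ToArray();
int[] shuffledKeys = (int[])sortedKeys.Clone();
var rng = new Random(42);
for (int i = shuffledKeys.Length - 1; i > 0; i--)
{
    int j = rng.Next(i + 1);
    (shuffledKeys[i], shuffledKeys[j]) = (shuffledKeys[j], shuffledKeys[i]);
}

// Hypothetical helper: times adding every key to a freshly created collection.
static TimeSpan TimeAdds(Func<IDictionary<int, int>> create, int[] keys)
{
    IDictionary<int, int> dict = create();
    var sw = Stopwatch.StartNew();
    foreach (int key in keys)
        dict.Add(key, key);
    sw.Stop();
    return sw.Elapsed;
}

Console.WriteLine($"SortedList, sorted keys:         {TimeAdds(() => new SortedList<int, int>(), sortedKeys)}");
Console.WriteLine($"SortedDictionary, sorted keys:   {TimeAdds(() => new SortedDictionary<int, int>(), sortedKeys)}");
Console.WriteLine($"SortedList, shuffled keys:       {TimeAdds(() => new SortedList<int, int>(), shuffledKeys)}");
Console.WriteLine($"SortedDictionary, shuffled keys: {TimeAdds(() => new SortedDictionary<int, int>(), shuffledKeys)}");
```

For numbers worth comparing, each measurement would need to be repeated many times in a Release build, as the answer describes.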
I don't know why MSDN says that SortedList<TKey, TValue> uses a binary tree for its implementation, because if you look at the code with a decompiler like Reflector you realize it's not true.

SortedList<TKey, TValue> is simply an array that grows over time.

Every time you insert an element, it first checks whether the array has enough capacity; if not, a bigger array is created and the old elements are copied into it (like List<T>).

After that, it searches for where to insert the element, using a binary search (this is possible since the array is indexable and already sorted).

To keep the array sorted, it moves (or pushes) all the elements situated after the insertion position by one position (using Array.Copy()).

E.g.: [example not preserved]

That explains why the performance of SortedList is so bad when you insert unsorted elements. It has to re-copy some elements on almost every insertion. The only case where this is not needed is when the element has to be inserted at the end of the array.

SortedDictionary<TKey, TValue> is different and uses a binary tree to insert and retrieve elements. It also has some cost on insert because sometimes the tree needs to be rebalanced (but not on every insertion).

Performance is quite similar when searching for an element with SortedList or SortedDictionary because they both use a binary search.

In my opinion, you should never use SortedList just to sort an array. Unless you have very few elements, it will always be faster to insert the values into a list (or array) and then call the Sort() method.

SortedList is mostly useful when you have a list of values that is already sorted (e.g. from a database), you want to keep it sorted, and you want to perform operations that take advantage of it being sorted (e.g. the Contains() method of SortedList performs a binary search instead of a linear search).

SortedDictionary offers the same advantages as SortedList but performs better if the values to insert are not already sorted.

EDIT: If you are using .NET Framework 4.5, an alternative to SortedDictionary<TKey, TValue> is SortedSet<T>. It works the same way as SortedDictionary, using a binary tree, but keys and values are the same here.
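The insertion steps described in this answer (check capacity, binary search for the position, shift the tail with Array.Copy()) can be illustrated with a small sketch. `InsertSorted` is a hypothetical helper written for this illustration, not the actual SortedList<TKey, TValue> source; unlike the real class it stores bare int keys and does not reject duplicates.

```csharp
using System;
using System.Linq;

// Hypothetical helper: inserts `key` into the first `count` (sorted) slots of
// `items`, growing the array when it is full. Returns the new element count.
static int InsertSorted(ref int[] items, int count, int key)
{
    // 1. Ensure capacity by doubling the array, similar to List<T>.
    if (count == items.Length)
        Array.Resize(ref items, Math.Max(4, items.Length * 2));

    // 2. Binary search for the insertion point. When the key is not found,
    //    the result is the bitwise complement of the index of the first
    //    larger element (or of `count` if there is none).
    int index = Array.BinarySearch(items, 0, count, key);
    if (index < 0)
        index = ~index;

    // 3. Shift the tail one slot to the right to open a gap, then store the key.
    //    Nothing is copied when the key belongs at the end.
    if (index < count)
        Array.Copy(items, index, items, index + 1, count - index);
    items[index] = key;
    return count + 1;
}

int[] data = new int[4];
int n = 0;
foreach (int key in new[] { 50, 10, 40, 20, 30 })
    n = InsertSorted(ref data, n, key);

Console.WriteLine(string.Join(", ", data.Take(n)));  // prints "10, 20, 30, 40, 50"
```

Step 3 is the O(n) part: an unsorted input stream lands in the middle of the array most of the time, so almost every insert pays for a copy, whereas already sorted input always appends at the end.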
Are they meant for two different purposes?

There is not much semantic difference between these two collection types in .NET. They both offer keyed lookup and keep the entries in the sort order of their keys. In most cases you will be OK with either of them. Perhaps the only differentiator is the indexed retrieval that SortedList permits.

But performance?

However, there is a performance difference, which might be a stronger factor when choosing between them. Here is a tabular view of their asymptotic complexity.

[Table: asymptotic complexity of SortedList and SortedDictionary; not preserved]

Summary

To roughly summarize, you want a SortedList<K, V> when: [list not preserved]

You would instead prefer a SortedDictionary<K, V> when: [list not preserved]

Writing code

Both SortedList<K, V> and SortedDictionary<K, V> implement IDictionary<K, V>, so in your code you can return IDictionary<K, V> from the method or declare the variable as IDictionary<K, V>. Basically, hide the implementation detail and code against the interface.

In future, it's easier to switch from either one in case you're not happy with the performance characteristics of one collection.

For more info on the two collection types, see the original question linked.
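As a minimal sketch of the "code against the interface" advice above: `CreateIndex` and its `expectUnsortedInserts` parameter are hypothetical names invented for this illustration, and the rule used to pick a collection is only an assumption based on the trade-offs discussed in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical factory: callers only see IDictionary<string, int>, so the
// concrete collection can be swapped here without touching them.
static IDictionary<string, int> CreateIndex(bool expectUnsortedInserts)
{
    if (expectUnsortedInserts)
        return new SortedDictionary<string, int>();
    return new SortedList<string, int>();
}

IDictionary<string, int> stock = CreateIndex(expectUnsortedInserts: true);
stock["pear"] = 3;
stock["apple"] = 7;
stock["fig"] = 1;

// Both implementations enumerate their entries in key order.
foreach (KeyValuePair<string, int> entry in stock)
    Console.WriteLine($"{entry.Key}: {entry.Value}");

// Indexed retrieval needs the concrete SortedList type, so code that relies
// on it cannot stay behind the IDictionary abstraction.
var byPosition = new SortedList<string, int>(stock);
Console.WriteLine($"First key by position: {byPosition.Keys[0]}");
```

The last two lines show the one feature that leaks through the abstraction: positional access exists only on SortedList, so code that depends on it has to commit to that type.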
Visual representation of performance differences.
That's all there is to it. Retrieval of keys is comparable, but addition is much faster with dictionaries.
I try to use SortedList as much as possible because it allows me to iterate over the keys and values collections. This is not possible with SortedDictionary as far as I know.
I'm not sure about this, but as far as I know dictionaries store data in tree structures, whereas lists store data in linear arrays. That explains why insertion and removal are much faster with dictionaries, since less memory has to be shifted around. It also explains why you can iterate over SortedLists but not SortedDictionary.
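As a point of reference for the iteration claim above, here is a short sketch of what each collection's Keys and Values properties allow. As far as I am aware, both classes expose these properties and both can be enumerated; the difference is that SortedList returns them as IList<T>, which also supports access by position. The sample data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

var list = new SortedList<int, string> { [2] = "two", [1] = "one", [3] = "three" };
var dict = new SortedDictionary<int, string>(list);

// Both collections expose Keys and Values, enumerated in key order.
foreach (int key in dict.Keys)
    Console.WriteLine(key);
foreach (string value in dict.Values)
    Console.WriteLine(value);

// SortedList returns IList<T>, so keys and values can also be read by position.
IList<int> listKeys = list.Keys;
IList<string> listValues = list.Values;
Console.WriteLine($"{listKeys[0]} -> {listValues[0]}");

// SortedDictionary returns KeyCollection / ValueCollection, which implement
// ICollection<T> and therefore offer enumeration but no indexer.
ICollection<int> dictKeys = dict.Keys;
Console.WriteLine(dictKeys.Count);
```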
An important consideration for us is the fact that we often have small dictionaries (<100 elements), and current processors are much faster at accessing sequential memory while executing few hard-to-predict branches (i.e. iterating over a linear array rather than traversing a tree).
So when you have fewer than about 60 elements in your dictionary, SortedList<> is often the fastest and most memory-efficient dictionary in many use cases.