When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?

Posted 2024-08-03 18:41:33

This may appear to be a duplicate of this question, which asks "What’s the difference between SortedList and SortedDictionary?" Unfortunately, the answers do nothing more than quote the MSDN documentation (which clearly states that there are performance and memory use differences between the two) but don't actually answer the question.

In fact (and so this question doesn't get the same answers), according to MSDN:

The SortedList<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

  • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.

  • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList<TKey, TValue>.

  • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

So, clearly this would indicate that SortedList<TKey, TValue> is the better choice unless you need faster insert and remove operations for unsorted data.

The question still remains: given the information above, what are the practical (real-world, business case, etc.) reasons for using a SortedDictionary<TKey, TValue>? Based on the performance information, it would seem that there is really no need to have SortedDictionary<TKey, TValue> at all.

Comments (6)

丢了幸福的猪 2024-08-10 18:41:33

I'm not sure how accurate the MSDN documentation is on SortedList and SortedDictionary. It seems to be saying both are implemented using a binary search tree. But if the SortedList uses a binary search tree, why would it be much slower on additions than SortedDictionary?

Anyway, here are some performance test results.

Each test operates on a SortedList / SortedDictionary containing 10,000 int32 keys. Each test is repeated 1,000 times (Release build, Start without Debugging).

The first group of tests adds keys in sequence from 0 to 9,999. The second group of tests adds randomly shuffled keys between 0 and 9,999 (every number is added exactly once).

***** Tests.PerformanceTests.SortedTest

SortedDictionary Add sorted: 4411 ms
SortedDictionary Get sorted: 2374 ms


SortedList Add sorted: 1422 ms
SortedList Get sorted: 1843 ms

***** Tests.PerformanceTests.UnsortedTest

SortedDictionary Add unsorted: 4640 ms
SortedDictionary Get unsorted: 2903 ms


SortedList Add unsorted: 36559 ms
SortedList Get unsorted: 2243 ms

As with any profiling, the important thing is the relative performance, not the actual numbers.

As you can see, on sorted data the SortedList is faster than the SortedDictionary. On unsorted data the SortedList is slightly quicker on retrieval, but about 8 times slower on adding (36559 ms versus 4640 ms).

If both are using binary trees internally, it is quite surprising that the Add operation on unsorted data is so much slower for SortedList. It is possible that SortedList is also adding items to a sorted linear data structure at the same time, which would slow it down.

However, you would then expect the memory usage of a SortedList to be greater than, or at least equal to, that of a SortedDictionary. That contradicts what the MSDN documentation says.
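
For anyone who wants to re-run a comparison like the one above, here is a minimal sketch of a timing harness. It is not the code that produced the numbers in this answer: the names AddBenchmark, TimeAdds and ShuffledKeys are made up for illustration, the seed and repeat count are arbitrary, and going through IDictionary<int, int> adds a little interface-call overhead to both collections.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class AddBenchmark
{
    // Times how long it takes to add every key to a fresh collection, 'repeats' times.
    public static TimeSpan TimeAdds(Func<IDictionary<int, int>> create, int[] keys, int repeats)
    {
        var watch = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++)
        {
            IDictionary<int, int> target = create();
            foreach (int key in keys) target.Add(key, key);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Returns 0..count-1 in Fisher-Yates shuffled order (fixed seed for repeatability).
    public static int[] ShuffledKeys(int count, int seed)
    {
        var keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = i;
        var rng = new Random(seed);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (keys[i], keys[j]) = (keys[j], keys[i]);
        }
        return keys;
    }

    public static void Main()
    {
        int[] keys = ShuffledKeys(10000, seed: 42);
        Console.WriteLine("SortedDictionary: " +
            TimeAdds(() => new SortedDictionary<int, int>(), keys, 100).TotalMilliseconds + " ms");
        Console.WriteLine("SortedList:       " +
            TimeAdds(() => new SortedList<int, int>(), keys, 100).TotalMilliseconds + " ms");
    }
}
```

Run it from a Release build without a debugger attached, as in the tests above, and compare the two lines relative to each other rather than reading anything into the absolute values.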

橘虞初梦 2024-08-10 18:41:33

I don't know why MSDN says that SortedList<TKey, TValue> uses a binary tree for its implementation, because if you look at the code with a decompiler like Reflector you realize it's not true.

SortedList<TKey, TValue> is simply an array that grows over time.

Every time you insert an element, it first checks whether the array has enough capacity. If not, a bigger array is created and the old elements are copied into it (like List<T>).

After that, it searches for the position at which to insert the element, using a binary search (this is possible since the array is indexable and already sorted).

To keep the array sorted, it moves (or pushes) all the elements located after the insertion position by one position (using Array.Copy()).

E.g.:

// we want to insert "3" 

2  
4  <= 3
5
8
9
.      
.      
.  

// we have to move some elements first

2
.  <= 3
4 
5  |
8  v
9
.
.

That explains why the performance of SortedList is so bad when you insert unsorted elements. It has to re-copy some elements on almost every insertion. The only case where no copying is needed is when the element is inserted at the end of the array.
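
The steps described above can be sketched as a tiny sorted array of ints. This is only an illustration of the mechanism, not the actual SortedList<TKey, TValue> source; TinySortedArray is a hypothetical name, and unlike SortedList it accepts duplicates.

```csharp
using System;

// Illustrative sketch only: a sorted int array that inserts the way the
// answer describes. Not the framework implementation.
public sealed class TinySortedArray
{
    private int[] _items = new int[4];

    public int Count { get; private set; }

    public int this[int index] => _items[index];

    // Inserts the value in sorted position and returns the index it was stored at.
    public int Insert(int value)
    {
        // 1. Grow the backing array when it is full (doubling, like List<T>).
        if (Count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);

        // 2. Binary search over the used part; a negative result is the
        //    bitwise complement of the index where the value belongs.
        int pos = Array.BinarySearch(_items, 0, Count, value);
        if (pos < 0) pos = ~pos;

        // 3. Shift everything from pos onward one slot to the right: O(n).
        //    Nothing is moved when pos == Count (append at the end).
        Array.Copy(_items, pos, _items, pos + 1, Count - pos);

        _items[pos] = value;
        Count++;
        return pos;
    }
}

public static class Program
{
    public static void Main()
    {
        var array = new TinySortedArray();
        foreach (int v in new[] { 2, 4, 5, 8, 9 }) array.Insert(v);

        // 3 belongs between 2 and 4, so 4, 5, 8 and 9 are shifted first.
        Console.WriteLine(array.Insert(3)); // prints 1
    }
}
```

Appending keys that arrive in order skips the shift in step 3 entirely, which is why pre-sorted input is the cheap case.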

SortedDictionary<TKey, TValue> is different and uses a binary tree to insert and retrieve elements. It also has some cost on insert, because sometimes the tree needs to be re-balanced (but not on every insertion).

Performance is quite similar while searching an element with SortedList or SortedDictionary because they both use a binary search.


In my opinion, you should never use SortedList just to sort an array. Unless you have very few elements, it will always be faster to insert the values into a list (or array) and then call the Sort() method.
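
A minimal sketch of that "collect first, sort once" approach, using List<T>. The names BulkLoad and SortOnce are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class BulkLoad
{
    // Copies the input into a list and sorts it in one pass,
    // so no elements are shifted on individual inserts.
    public static List<int> SortOnce(IEnumerable<int> source)
    {
        var list = new List<int>(source);
        list.Sort();
        return list;
    }

    public static void Main()
    {
        List<int> sorted = SortOnce(new[] { 5, 1, 4, 2, 3 });
        Console.WriteLine(string.Join(",", sorted)); // prints 1,2,3,4,5

        // A binary search is still available on the sorted list afterwards.
        Console.WriteLine(sorted.BinarySearch(4));   // prints 3
    }
}
```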

SortedList is mostly useful when you have a list of values that is already sorted (e.g. from a database), you want to keep it sorted, and you want to perform operations that take advantage of the ordering (e.g. the ContainsKey() method of SortedList<TKey, TValue> performs a binary search instead of a linear search).

SortedDictionary offers the same advantages as SortedList but performs better if the values to insert are not already sorted.


EDIT: If you are using .NET Framework 4.5, an alternative to SortedDictionary<TKey, TValue> is SortedSet<T>. It works the same way as SortedDictionary, using a binary tree, but here the keys and the values are the same.
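
A short sketch of SortedSet<T> in use, with made-up sample values; SortedSetDemo and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetDemo
{
    public static SortedSet<int> Build()
    {
        var set = new SortedSet<int>();
        foreach (int v in new[] { 8, 2, 9, 4, 5, 4 })
            set.Add(v); // Add returns false for the second 4 and leaves the set unchanged
        return set;
    }

    public static void Main()
    {
        SortedSet<int> set = Build();
        Console.WriteLine(string.Join(",", set));   // prints 2,4,5,8,9
        Console.WriteLine(set.Min + " " + set.Max); // prints 2 9
    }
}
```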

情魔剑神 2024-08-10 18:41:33

Are they meant for two different purposes?

There is not much semantic difference between these two collection types in .NET. They both offer keyed lookup and keep their entries sorted by key. In most cases you will be fine with either of them. Perhaps the only differentiator is the indexed retrieval that SortedList permits.
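
As a rough illustration of that indexed retrieval (the keys and values here are made up, and IndexedLookupDemo is a hypothetical name): SortedList<TKey, TValue> exposes Keys and Values as IList<T>, so entries can be read by position, while SortedDictionary<TKey, TValue> can be enumerated in key order but has no indexer by position.

```csharp
using System;
using System.Collections.Generic;

public static class IndexedLookupDemo
{
    public static SortedList<string, int> Build()
    {
        return new SortedList<string, int>
        {
            { "pear", 3 }, { "apple", 1 }, { "fig", 2 }
        };
    }

    public static void Main()
    {
        SortedList<string, int> list = Build();

        // Position-based access through the Keys and Values lists.
        Console.WriteLine(list.Keys[0]);           // prints apple
        Console.WriteLine(list.Values[2]);         // prints 3
        Console.WriteLine(list.IndexOfKey("fig")); // prints 1

        // SortedDictionary: same key order when enumerated, but no positional access.
        var dict = new SortedDictionary<string, int>(list);
        foreach (KeyValuePair<string, int> pair in dict)
            Console.WriteLine(pair.Key + "=" + pair.Value);
    }
}
```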

But performance?

However, there is a performance difference, which might be a stronger factor when choosing between them. Here is a tabular view of their asymptotic complexity.

+------------------+---------+----------+--------+----------+----------+---------+
| Collection       | Indexed | Keyed    | Value  | Addition |  Removal | Memory  |
|                  | lookup  | lookup   | lookup |          |          |         |
+------------------+---------+----------+--------+----------+----------+---------+
| SortedList       | O(1)    | O(log n) | O(n)   | O(n)*    | O(n)     | Lesser  |
| SortedDictionary | n/a     | O(log n) | O(n)   | O(log n) | O(log n) | Greater |
+------------------+---------+----------+--------+----------+----------+---------+

* Insertion is O(1) for data that are already in sort order, so that each 
  element is added to the end of the list (assuming no resize is required).

Summary

To roughly summarize, you want a SortedList<K, V> when:

  1. you require indexed look-up.
  2. it's desirable to have lesser memory overhead.
  3. your input data is already sorted (say you get it already ordered from db).

You would instead want to prefer a SortedDictionary<K, V> when:

  1. relative overall performance matters (with respect to scaling).
  2. your input data is unordered.

Writing code

Both SortedList<K, V> and SortedDictionary<K, V> implement IDictionary<K, V>, so in your code you can return IDictionary<K, V> from a method or declare a variable as IDictionary<K, V>. In other words, hide the implementation detail and code against the interface.

IDictionary<K, V> x = new SortedDictionary<K, V>(); //for eg. 

In the future, it's then easier to switch from one to the other if you're not happy with the performance characteristics of the collection you picked.
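
A slightly fuller sketch of the same idea. The names Repository and CreateIndex and the inputIsSorted flag are hypothetical, chosen only to show that callers never see the concrete type.

```csharp
using System;
using System.Collections.Generic;

public static class Repository
{
    // Hypothetical factory: callers only see IDictionary<string, int>,
    // so the concrete collection can be swapped without touching them.
    public static IDictionary<string, int> CreateIndex(bool inputIsSorted)
    {
        if (inputIsSorted)
            return new SortedList<string, int>();
        return new SortedDictionary<string, int>();
    }

    public static void Main()
    {
        IDictionary<string, int> index = CreateIndex(inputIsSorted: false);
        index["b"] = 2;
        index["a"] = 1;

        // Both implementations enumerate in key order: a=1, then b=2.
        foreach (KeyValuePair<string, int> pair in index)
            Console.WriteLine(pair.Key + "=" + pair.Value);
    }
}
```

The trade-off is that code written against IDictionary<K, V> cannot use the positional access that only SortedList<K, V> offers.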


For more info on the two collection types see the original question linked.

静水深流 2024-08-10 18:41:33

Visual representation of performance differences.


安静被遗忘 2024-08-10 18:41:33

That's all there is to it. Retrieval by key is comparable, but addition is much faster with SortedDictionary.

I try to use SortedList as much as possible because it allows me to access the Keys and Values collections by index. This is not possible with SortedDictionary as far as I know.

I'm not sure about this, but as far as I know SortedDictionary stores its data in a tree structure, whereas SortedList stores its data in linear arrays. That explains why insertion and removal are much faster with the dictionary, since less memory has to be shifted around. It also explains why you can index into a SortedList but not into a SortedDictionary.

你又不是我 2024-08-10 18:41:33

An important consideration for us is that we often have small dictionaries (<100 elements), and current processors are much faster at accessing sequential memory while executing few hard-to-predict branches (i.e. iterating over a linear array rather than traversing a tree).
So when you have fewer than about 60 elements in your dictionary, SortedList<> is often the fastest and most memory-efficient dictionary in many use cases.
