HashSet vs. Dictionary: search time to find whether an item exists
HashSet<T> t = new HashSet<T>();
// add 10 million items
Dictionary<K, V> t = new Dictionary<K, V>();
// add 10 million items.
Whose .Contains method will return quicker?
Just to clarify, my requirement is I have 10 million objects (well, strings really) that I need to check if they exist in the data structure. I will NEVER iterate.
HashSet vs List vs Dictionary performance test, taken from here.
Add 1000000 objects (without checking duplicates)
Contains check for half the objects of a collection of 10000
Remove half the objects of a collection of 10000
I assume you mean Dictionary<TKey, TValue> in the second case? Hashtable is a non-generic class.
You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.
I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to perform basically the same - they're using the same algorithm, fundamentally. I guess that with the entries in Dictionary<,> being larger, you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.
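To make the set-versus-map distinction concrete, here is a minimal sketch written with C# 9+ top-level statements. The keys and the variable names (seen, lengths) are hypothetical and chosen only for illustration; the point is simply that HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey are the comparable membership checks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical keys, for illustration only.
string[] keys = { "alpha", "beta", "gamma" };

// Membership only: a HashSet<string> stores just the keys.
var seen = new HashSet<string>(keys);

// Key-to-value mapping: a Dictionary<string, int> stores one value per key.
var lengths = new Dictionary<string, int>();
foreach (string key in keys)
{
    lengths[key] = key.Length;
}

bool inSet = seen.Contains("beta");       // hash lookup on the set
bool inMap = lengths.ContainsKey("beta"); // the comparable lookup on the dictionary
bool missing = seen.Contains("delta");    // a key that was never added

Console.WriteLine($"{inSet} {inMap} {missing}"); // prints "True True False"
```

If all you ever ask is "have I seen this string?", the set expresses that intent directly and there is no placeholder value to invent.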
From the MSDN documentation for Dictionary<TKey,TValue>:
With a note:
The accepted answer to this question does NOT validly answer the question! It happens to give the correct answer, but that answer isn't shown by the evidence they provided.
What that answer shows is that key lookups on a Dictionary or HashSet are vastly quicker than looking up in a List. That is true, but it is neither interesting nor surprising, and it is not proof that the two have the same speed.
I've run the code below to compare the lookup times, and my conclusion is that they ARE in fact the same speed. (Or at least, if there is any difference, then the difference is well within the standard deviation of that speed.)
Specifically, 100,000,000 lookups were taking between 10 and 11.5 seconds for both, for me, in this test.
Test Code:
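The answerer's own test code is not preserved here. As a stand-in, the following is a minimal sketch of how such a comparison might be set up with Stopwatch, using C# 9+ top-level statements. The sizes, key format, and names (ItemCount, LookupCount, TimeLookups) are assumptions chosen to keep the run short, not the values used in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Assumed sizes, far smaller than the 100,000,000 lookups mentioned above.
const int ItemCount = 100_000;
const int LookupCount = 1_000_000;

// Fill both collections with the same hypothetical string keys.
var keys = new string[ItemCount];
var hashSet = new HashSet<string>();
var map = new Dictionary<string, bool>();
for (int i = 0; i < ItemCount; i++)
{
    keys[i] = "key" + i;
    hashSet.Add(keys[i]);
    map[keys[i]] = true; // the value is a placeholder; only the key matters here
}

// Runs LookupCount membership checks and returns the elapsed time and hit count.
// Both collections pay the same delegate-call overhead, so the comparison stays fair.
(TimeSpan Elapsed, int Hits) TimeLookups(Func<string, bool> contains)
{
    int hits = 0;
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < LookupCount; i++)
    {
        if (contains(keys[i % ItemCount])) hits++;
    }
    watch.Stop();
    return (watch.Elapsed, hits);
}

var setResult = TimeLookups(hashSet.Contains);
var mapResult = TimeLookups(map.ContainsKey);

Console.WriteLine($"HashSet.Contains:       {setResult.Elapsed.TotalMilliseconds:F1} ms, {setResult.Hits} hits");
Console.WriteLine($"Dictionary.ContainsKey: {mapResult.Elapsed.TotalMilliseconds:F1} ms, {mapResult.Hits} hits");
```

Timings from a single run like this are noisy; repeating the measurement several times and comparing the spread is what lets you judge whether any difference exceeds the run-to-run variation.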
These are different data structures. Also, there is no generic version of Hashtable. HashSet<T> contains values of type T, whereas Hashtable (or Dictionary) contains key-value pairs. So you should choose the collection based on what data you need to store.