Is a hash table really that fast?
Java's string hash function is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. I assume other languages use the same or a similar implementation.
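For reference, the formula above can be evaluated with Horner's rule, which needs one multiplication and one addition per character. The following is only a minimal sketch: the class name PolyHash is made up for illustration, and the code is a re-implementation of the documented formula, not the JDK source.

```java
// Hypothetical helper illustrating the polynomial hash from the question.
final class PolyHash {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's rule.
    // int arithmetic wraps on overflow, which matches Java's int-valued hash codes.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one multiply and one add per character
        }
        return h;
    }
}
```

For a 7-character key such as "ABCDEF1" this loop runs 7 times, so hashing costs about 14 arithmetic operations regardless of how many elements the table holds.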
Suppose we have a hash table and a list, each holding 50 elements. Every element is a 7-character string: ABCDEF1, ABCDEF2, ABCDEF3, ..., ABCDEFn.
Suppose each bucket of the hash table contains 5 strings (I think this function would actually give one string per bucket, but let us assume 5).
Now we call col.Contains("ABCDEFn"); // each failed comparison matches 6 characters and finds the difference on the 7th
The hash table takes around 70 operations (multiplications and additions) to compute the hash code and compare against the 5 strings in the bucket, and bang, it is found.
The list takes around 300 comparisons to find it.
With only 10 elements, the list takes around 70 operations and the hash table around 50. Note that the hash table's operations are more expensive, since they are multiplications.
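The estimates above can be written down as a rough cost model. This is a sketch under the post's own assumptions (7-character keys that differ only in the last character, 5 strings per bucket, worst-case scan of the list); the class and method names are hypothetical, and the exact totals depend on how operations are counted, so they will not match the rounded figures in the post exactly.

```java
// Hypothetical cost model for the lookup described in the question.
final class LookupCost {
    // Worst case for a linear scan: every element is compared character by
    // character, and each comparison reaches the last character.
    static int listWorstCase(int elements, int keyLength) {
        return elements * keyLength;
    }

    // Hashing costs one multiply and one add per character; afterwards the key
    // is compared in full against every string in the bucket.
    static int hashTableCost(int keyLength, int bucketSize) {
        int hashing = 2 * keyLength;
        int comparing = bucketSize * keyLength;
        return hashing + comparing;
    }
}
```

With these assumptions, LookupCost.listWorstCase(50, 7) is 350 and LookupCost.listWorstCase(10, 7) is 70, while LookupCost.hashTableCost(7, 5) is 49 in both cases, because the hash table's cost in this model does not depend on the number of elements.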
I conclude that HybridDictionary in .NET is probably the best choice in most cases that need a hash table of unknown size, because it lets me use a list until the collection grows past 10 elements. I would still like something like a HashSet rather than a dictionary of keys and values, and I wonder why there is no HybridSet!
So what do you think?
Thanks
Comments (2)
Does this really matter? You usually care about the performance impact on large collections of data. If the collection is small, 20-30 additional operations won't make any difference.
I think you raise a good point. Lists may be quicker than hash tables for small collections, and this is well documented in the literature.
However, you can easily create your own data structure that, depending on its size (count()), uses either a list or a hash table.
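A minimal sketch of such a structure in Java might look like the following. Everything here is an assumption made for illustration: the name HybridSet, the cut-over threshold of 10 (taken from the question's estimate, not from any measurement), and the choice to switch one way only, from list to hash set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical set that scans a list while small and hashes once it grows.
final class HybridSet<T> {
    private static final int THRESHOLD = 10; // assumed cut-over size

    private List<T> small = new ArrayList<>();
    private Set<T> large; // stays null until the set outgrows THRESHOLD

    // Adds an item; returns false if it was already present.
    boolean add(T item) {
        if (large != null) {
            return large.add(item);
        }
        if (small.contains(item)) {
            return false;
        }
        small.add(item);
        if (small.size() > THRESHOLD) {
            // Too big for linear scans: move everything into a hash set.
            large = new HashSet<>(small);
            small = null;
        }
        return true;
    }

    boolean contains(T item) {
        return large != null ? large.contains(item) : small.contains(item);
    }

    int size() {
        return large != null ? large.size() : small.size();
    }

    // Exposed only so the switch can be observed in a test.
    boolean usesHashing() {
        return large != null;
    }
}
```

Whether 10 is the right threshold depends on the key type and the cost of its hashCode and equals methods, so it would need to be measured rather than assumed.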