Hash table runtime complexity (insert, search and delete)
Why do I keep seeing different runtime complexities for these functions on a hash table?
On Wikipedia, search and delete are O(n) (I thought the point of hash tables was to have constant lookup, so what's the point if search is O(n)?).
In some course notes from a while ago, I see a wide range of complexities depending on certain details, including one where everything is O(1). Why would any other implementation be used if I can get all O(1)?
If I'm using standard hash tables in a language like C++ or Java, what can I expect the time complexity to be?
5 Answers
Hash tables have O(1) average and amortized case complexity, but they suffer from O(n) worst-case time complexity. [And I think this is where your confusion is.]

Hash tables suffer from O(n) worst-case time complexity for two reasons:

1. If too many elements hash to the same key, looking inside that key can take O(n) time.
2. Once the hash table exceeds its load factor, it has to rehash [create a new, bigger table and re-insert every element into it], which takes O(n) time.

However, it is said to be O(1) average and amortized case because:

1. It is very rare that many items will hash to the same key [assuming you chose a good hash function and the load factor is not too high].
2. The rehash operation, which is O(n), can happen at most once every n/2 ops, which are all assumed to be O(1). Thus, when you sum the average time per op, you get (n * O(1) + O(n)) / n = O(1).
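To make the amortized argument concrete, here is a minimal Java sketch of a separate-chaining table with rehashing. It is a toy illustration under this answer's assumptions (the ToyHashTable name and the 0.75 load factor are made up for the example; this is not how java.util.HashMap is actually implemented):

```java
import java.util.LinkedList;

// Toy separate-chaining hash table showing both O(n) sources:
// a long chain (collisions) and the O(n) rehash when the load factor is exceeded.
class ToyHashTable<K, V> {
    private static final double MAX_LOAD = 0.75;
    private LinkedList<Entry<K, V>>[] buckets = newBuckets(8);
    private int size = 0;

    private record Entry<K, V>(K key, V value) {}

    @SuppressWarnings("unchecked")
    private static <K, V> LinkedList<Entry<K, V>>[] newBuckets(int n) {
        LinkedList<Entry<K, V>>[] b = new LinkedList[n];
        for (int i = 0; i < n; i++) b[i] = new LinkedList<>();
        return b;
    }

    private int indexFor(K key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public void put(K key, V value) {
        if ((double) (size + 1) / buckets.length > MAX_LOAD) {
            rehash(); // reason 2: the O(n) step, every entry moves to a bigger table
        }
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key, buckets.length)];
        boolean replaced = chain.removeIf(e -> e.key().equals(key));
        chain.add(new Entry<>(key, value));
        if (!replaced) size++;
    }

    public V get(K key) {
        // reason 1: if every key landed in this one chain, this scan is O(n)
        for (Entry<K, V> e : buckets[indexFor(key, buckets.length)]) {
            if (e.key().equals(key)) return e.value();
        }
        return null;
    }

    private void rehash() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = newBuckets(old.length * 2);
        for (LinkedList<Entry<K, V>> chain : old) {
            for (Entry<K, V> e : chain) {
                buckets[indexFor(e.key(), buckets.length)].add(e);
            }
        }
    }
}
```

Because the table doubles on each rehash, the expensive O(n) rebuild can only happen after a linear number of cheap inserts, which is exactly the averaging in the formula above.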
Note that because of the rehashing issue, real-time applications and applications that need low latency should not use a hash table as their data structure.

EDIT: Another issue with hash tables: cache

Another place where you might see a performance loss with large hash tables is cache performance. Hash tables suffer from bad cache locality, and thus for large collections the access time might be longer, since you need to reload the relevant part of the table from memory back into the cache.
Ideally, a hashtable is O(1). The problem is if two keys are not equal, but they result in the same hash.

For example, imagine the strings "it was the best of times it was the worst of times" and "Green Eggs and Ham" both resulted in a hash value of 123.

When the first string is inserted, it's put in bucket 123. When the second string is inserted, it would see that a value already exists for bucket 123. It would then compare the new value to the existing value and see they are not equal. In this case, an array or linked list is created for that key. At this point, retrieving this value becomes O(n), as the hashtable needs to iterate through each value in that bucket to find the desired one.

For this reason, when using a hash table, it's important to use a key with a really good hash function that's both fast and doesn't often result in duplicate values for different objects.
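To see the bucket behavior in code, here is a small Java sketch; the BadKey class is hypothetical, made up for this example. Its constant hashCode() forces every entry into the same bucket, so lookups degrade into a scan of that one bucket. (Modern java.util.HashMap softens this by converting long chains into balanced trees, but the principle stands.)

```java
import java.util.HashMap;
import java.util.Map;

// Every BadKey reports the same hash, so all entries collide into one bucket.
final class BadKey {
    private final String name;

    BadKey(String name) { this.name = name; }

    @Override
    public int hashCode() { return 123; } // deliberately terrible: one bucket for everything

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(name);
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey("key" + i), i); // all 100,000 entries share bucket 123
        }
        // Instead of jumping straight to the value, get() must search
        // the single overloaded bucket.
        System.out.println(map.get(new BadKey("key99999")));
    }
}
```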
Make sense?
Some hash tables (cuckoo hashing) have guaranteed O(1) lookup
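The guarantee comes from the invariant that each key may live in exactly one of two slots, one per hash function, so a lookup probes at most two positions. Here is a minimal Java sketch of that idea (toy code with made-up hash functions; a real implementation would pick new hash functions and rehash when the eviction loop cycles):

```java
// Cuckoo hashing: two tables, two hash functions, at most two probes per lookup.
class CuckooSet {
    private static final int CAPACITY = 16;
    private static final int MAX_EVICTIONS = 32;
    private final Integer[] table1 = new Integer[CAPACITY];
    private final Integer[] table2 = new Integer[CAPACITY];

    private int h1(int key) { return Math.floorMod(key, CAPACITY); }
    private int h2(int key) { return Math.floorMod(key * 31 + 7, CAPACITY); }

    // Guaranteed O(1): exactly two probes, never a chain to walk.
    boolean contains(int key) {
        Integer a = table1[h1(key)];
        Integer b = table2[h2(key)];
        return (a != null && a == key) || (b != null && b == key);
    }

    // Insertion does the hard work: it may evict and relocate existing keys.
    void insert(int key) {
        if (contains(key)) return;
        Integer cur = key;
        for (int i = 0; i < MAX_EVICTIONS; i++) {
            Integer evicted = table1[h1(cur)];   // claim cur's slot in table1,
            table1[h1(cur)] = cur;               // kicking out any occupant
            if (evicted == null) return;
            cur = evicted;
            evicted = table2[h2(cur)];           // the evicted key moves to its
            table2[h2(cur)] = cur;               // alternative slot in table2
            if (evicted == null) return;
            cur = evicted;
        }
        throw new IllegalStateException("eviction cycle: a real table would rehash here");
    }
}
```

The trade-off is visible in insert(): lookups are worst-case O(1), but an insertion can trigger a chain of evictions and, on a cycle, a full rehash.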
Perhaps you were looking at the space complexity? That is O(n). The other complexities are as expected in the hash table entry [on Wikipedia]. The search complexity approaches O(1) as the number of buckets increases. In the worst case, if you have only one bucket in the hash table, then the search complexity is O(n).

Edit in response to comment: I don't think it is correct to say O(1) is the average case. It really is (as the Wikipedia page says) O(1 + n/k), where k is the hash table size. If k is large enough, then the result is effectively O(1). But suppose k is 10 and n is 100. In that case each bucket will have on average 10 entries, so the search time is definitely not O(1); it is a linear search through up to 10 entries.
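In Java you can steer both terms of O(1 + n/k) directly, since java.util.HashMap accepts an initial capacity (k) and a load factor (the n/k ratio at which it grows). Below is a small sketch of pre-sizing a map so the ratio stays bounded; the numbers and the capacity = expectedEntries / loadFactor arithmetic are illustrative assumptions, not a required recipe:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Default HashMap: 16 buckets, grows once n/k exceeds 0.75.
        Map<String, Integer> defaults = new HashMap<>();

        // Pre-size for ~100 entries so n/k stays below 0.75 and no rehash occurs.
        int expectedEntries = 100;
        float loadFactor = 0.75f;
        Map<String, Integer> preSized =
                new HashMap<>((int) Math.ceil(expectedEntries / loadFactor), loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            preSized.put("key" + i, i);
        }
        // With n/k bounded by the load factor, each bucket holds O(1)
        // entries on average, which is what keeps expected search O(1).
        System.out.println(preSized.get("key42"));
    }
}
```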
Depends on how you implement hashing; in the worst case it can go to O(n), in the best case it is O(1) (which you can generally achieve if your data structure is not that big).