Hash table runtime complexity (insert, search and delete)

Posted 2025-01-03 16:29:11

Why do I keep seeing different runtime complexities for these functions on a hash table?

On wiki, search and delete are O(n) (I thought the point of hash tables was to have constant-time lookup, so what's the point if search is O(n)?).

In some course notes from a while ago, I see a wide range of complexities depending on certain details, including one with all O(1). Why would any other implementation be used if I can get all O(1)?

If I'm using standard hash tables in a language like C++ or Java, what can I expect the time complexity to be?


Comments (5)

陌路终见情 2025-01-10 16:29:11

Hash tables have O(1) average and amortized case complexity, but they suffer from O(n) worst-case time complexity. [And I think this is where your confusion is.]

Hash tables suffer from O(n) worst-case time complexity for two reasons:

  1. If too many elements hash to the same key, looking through that key's bucket may take O(n) time.
  2. Once a hash table exceeds its load factor, it has to rehash [create a new, bigger table and re-insert each element into it].

However, it is said to have O(1) average and amortized case complexity because:

  1. It is very rare that many items will hash to the same key [if you chose a good hash function and you don't have too high a load factor].
  2. The rehash operation, which is O(n), can happen at most once per n/2 operations, all of which are assumed to be O(1). Thus, when you average the time per operation, you get (n · O(1) + O(n)) / n = O(1).
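To make the amortized argument concrete, here is a minimal sketch of a separate-chaining table that doubles its capacity once a 0.75 load factor is exceeded. The class name ChainedHashTable and all implementation details are my own illustration, not any standard library's API:

```java
import java.util.LinkedList;

// Minimal separate-chaining hash table with doubling (illustrative sketch only).
class ChainedHashTable<K, V> {
    private static final double MAX_LOAD_FACTOR = 0.75;

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashTable() { buckets = new LinkedList[8]; }

    private int indexFor(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(K key, V value) {
        // Reason 2 above: past the load factor, rebuild everything -- O(n),
        // but rare enough that it averages out to O(1) per insertion.
        if ((size + 1) > MAX_LOAD_FACTOR * buckets.length) rehash();
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {      // reason 1 above: O(chain length)
            if (e.key.equals(key)) { e.value = value; return; }
        }
        buckets[i].add(new Entry<>(key, value));
        size++;
    }

    public V get(K key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain != null) {
            for (Entry<K, V> e : chain) {       // usually a very short scan
                if (e.key.equals(key)) return e.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void rehash() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = new LinkedList[old.length * 2]; // double, so rehashes stay rare
        size = 0;
        for (LinkedList<Entry<K, V>> chain : old) {
            if (chain == null) continue;
            for (Entry<K, V> e : chain) put(e.key, e.value); // re-insert: O(n) total
        }
    }
}
```

Doubling is the key design choice here: because capacity doubles on each rehash, the O(n) rebuild can only occur after roughly n/2 cheap O(1) insertions, which is exactly the averaging in point 2 above.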

Note that because of the rehashing issue, real-time applications and applications that need low latency should not use a hash table as their data structure.

EDIT: Another issue with hash tables: cache

Another issue where you might see a performance loss in large hash tables is cache performance. Hash tables suffer from bad cache performance, and thus for large collections the access time might be longer, since you need to reload the relevant part of the table from memory back into the cache.

南风几经秋 2025-01-10 16:29:11

Ideally, a hashtable is O(1). The problem arises when two keys are not equal but result in the same hash.

For example, imagine the strings "it was the best of times it was the worst of times" and "Green Eggs and Ham" both resulted in a hash value of 123.

When the first string is inserted, it's put in bucket 123. When the second string is inserted, it would see that a value already exists for bucket 123. It would then compare the new value to the existing value, and see they are not equal. In this case, an array or linked list is created for that key. At this point, retrieving this value becomes O(n) as the hashtable needs to iterate through each value in that bucket to find the desired one.

For this reason, when using a hash table, it's important to use a key with a really good hash function that's both fast and doesn't often result in duplicate values for different objects.
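A small illustration of that failure mode, using a hypothetical BadKey class (my own, not from any library) whose hashCode deliberately returns 123 for every key, mirroring the strings above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type with a deliberately terrible hash function:
// every key hashes to 123, so all entries collide into one bucket.
class BadKey {
    private final String s;
    BadKey(String s) { this.s = s; }

    @Override public int hashCode() { return 123; }             // all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).s.equals(s); // equals() breaks ties
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("it was the best of times it was the worst of times"), 1);
        map.put(new BadKey("Green Eggs and Ham"), 2);

        // Both lookups still return the right value -- equals() disambiguates --
        // but every get() now walks the same bucket, degrading toward O(n).
        System.out.println(map.get(new BadKey("Green Eggs and Ham"))); // prints 2
    }
}
```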

Make sense?

暖阳 2025-01-10 16:29:11

Some hash tables (cuckoo hashing) have guaranteed O(1) lookup.
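To make that concrete, here is a toy sketch of the idea for int keys (my own illustration, not a library API): every key has exactly two possible slots, so a lookup probes at most two positions no matter how big the table is, and the collision-handling cost is pushed entirely onto insertion, which may evict and relocate keys:

```java
// Toy cuckoo-hashing set: a key can only ever live in one of two slots,
// so contains() is worst-case O(1) -- at most two probes.
class CuckooSet {
    private static final int MAX_KICKS = 32;   // give up and rebuild after this many evictions
    private Integer[] t1 = new Integer[11], t2 = new Integer[11];

    private int h1(int k) { return Math.floorMod(k, t1.length); }
    private int h2(int k) { return Math.floorMod(Integer.reverse(k), t2.length); }

    public boolean contains(int k) {           // worst case: exactly two probes
        return Integer.valueOf(k).equals(t1[h1(k)]) || Integer.valueOf(k).equals(t2[h2(k)]);
    }

    public void insert(int k) {
        if (contains(k)) return;
        Integer cur = k;
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            Integer evicted = t1[h1(cur)];
            t1[h1(cur)] = cur;                 // place in table 1, evicting any occupant
            if (evicted == null) return;
            cur = evicted;
            evicted = t2[h2(cur)];
            t2[h2(cur)] = cur;                 // bounce the evictee to its table-2 slot
            if (evicted == null) return;
            cur = evicted;
        }
        grow(cur);                             // eviction cycle: rebuild with bigger tables
    }

    private void grow(Integer pending) {
        Integer[] o1 = t1, o2 = t2;
        t1 = new Integer[o1.length * 2 + 1];
        t2 = new Integer[o2.length * 2 + 1];
        for (Integer k : o1) if (k != null) insert(k);
        for (Integer k : o2) if (k != null) insert(k);
        insert(pending);
    }
}
```

The trade-off is that insertion is only expected/amortized O(1): an eviction cycle forces a full rebuild, which is why the worst-case guarantee applies to lookup, not insertion.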

一袭白衣梦中忆 2025-01-10 16:29:11

Perhaps you were looking at the space complexity? That is O(n). The other complexities are as expected for hash table entries. The search complexity approaches O(1) as the number of buckets increases. In the worst case, if you have only one bucket in the hash table, the search complexity is O(n).

Edit in response to comment: I don't think it is correct to say O(1) is the average case. It really is (as the Wikipedia page says) O(1 + n/k), where k is the number of buckets. If k is large enough, then the result is effectively O(1). But suppose k is 10 and n is 100: each bucket will then have 10 entries on average, so the search time is definitely not O(1); it is a linear search through up to 10 entries.
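A tiny numeric sketch of that point (my own illustration; the class name LoadFactorDemo is made up): with k = 10 buckets and n = 100 keys, even a perfectly uniform hash leaves n/k = 10 entries per bucket, so every lookup scans a 10-entry chain:

```java
import java.util.Arrays;

// k = 10 buckets, n = 100 keys: average chain length is n/k = 10,
// so a successful search costs O(1 + n/k) -- a ~10-entry linear scan here.
public class LoadFactorDemo {
    public static void main(String[] args) {
        int k = 10, n = 100;
        int[] bucketSizes = new int[k];
        for (int key = 0; key < n; key++) {
            bucketSizes[Integer.hashCode(key) % k]++;  // perfectly uniform spread
        }
        System.out.println("average chain length n/k = " + (double) n / k); // 10.0
        System.out.println(Arrays.toString(bucketSizes)); // ten buckets of size 10
    }
}
```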

倒数 2025-01-10 16:29:11

It depends on how you implement the hashing: in the worst case it can go to O(n); in the best case it is O(1) (which you can generally achieve easily if your data structure is not that big).
