HashMap get/put complexity

We are used to saying that HashMap get/put operations are O(1). However, it depends on the hash implementation. The default object hash is actually the internal address in the JVM heap. Are we sure it is good enough to claim that get/put are O(1)?

Available memory is another issue. As I understand from the javadocs, the default HashMap load factor is 0.75. What if we do not have enough memory in the JVM and the load factor exceeds the limit?

So, it looks like O(1) is not guaranteed. Does it make sense or am I missing something?

Comments (8)

万劫不复 2024-10-16 21:34:41

It depends on many things. It's usually O(1), with a decent hash which itself is constant time... but you could have a hash which takes a long time to compute, and if there are multiple items in the hash map which return the same hash code, get will have to iterate over them calling equals on each of them to find a match.

In the worst case, a HashMap has an O(n) lookup due to walking through all entries in the same hash bucket (e.g. if they all have the same hash code). Fortunately, that worst case scenario doesn't come up very often in real life, in my experience. So no, O(1) certainly isn't guaranteed - but it's usually what you should assume when considering which algorithms and data structures to use.

In JDK 8, HashMap has been tweaked so that if keys can be compared for ordering, then any densely-populated bucket is implemented as a tree, so that even if there are lots of entries with the same hash code, the complexity is O(log n). That can cause issues if you have a key type where equality and ordering are different, of course.

And yes, if you don't have enough memory for the hash map, you'll be in trouble... but that's going to be true whatever data structure you use.
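
To make the degenerate case concrete, here is a minimal sketch (BadKey and CollisionDemo are hypothetical names, not from any answer) of a key whose hashCode() always collides, so every entry lands in the same bucket:

import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode always collides: every instance lands
// in the same bucket, so lookups degrade to a linear scan of that bucket.
final class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override public int hashCode() {
        return 42; // deliberately constant
    }
}

class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        // O(n) lookup: walks the single overloaded bucket, calling equals()
        // on each entry. If BadKey also implemented Comparable, JDK 8+ would
        // treeify the bucket and this would be O(log n) instead.
        System.out.println(map.get(new BadKey(9_999)));
    }
}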

浅蓝的眸勾画不出的柔情 2024-10-16 21:34:41

It has already been mentioned that hashmaps are O(n/m) on average, if n is the number of items and m is the size of the table. It has also been mentioned that in principle the whole thing could collapse into a singly linked list with O(n) query time. (This all assumes that calculating the hash is constant time.)

However, what isn't often mentioned is that, with probability at least 1 - 1/n (so for 1000 items that's a 99.9% chance), the largest bucket won't be filled to more than O(log n), hence matching the average complexity of binary search trees. (And the constant is good; a tighter bound is (log n)*(m/n) + O(1).)

All that's required for this theoretical bound is that you use a reasonably good hash function (see Wikipedia: Universal Hashing; it can be as simple as a*x>>m), and of course that the person giving you the values to hash doesn't know how you have chosen your random constants.
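
As an illustrative sketch only (the class and parameter names are mine), the multiply-shift scheme mentioned above fits in a few lines of Java: choose a random odd multiplier a once, and take the top M bits of a*x as the index into a table of 2^M buckets:

import java.util.concurrent.ThreadLocalRandom;

class MultiplyShiftHash {
    // Random odd 64-bit multiplier, chosen once and kept secret from the
    // party supplying the keys.
    private final long a = ThreadLocalRandom.current().nextLong() | 1L;
    private final int bits; // M: the table has 2^M buckets

    MultiplyShiftHash(int bits) { this.bits = bits; }

    int bucketIndex(long x) {
        // Keep the top 'bits' bits of the product: (a*x) >>> (64 - M)
        return (int) ((a * x) >>> (Long.SIZE - bits));
    }

    public static void main(String[] args) {
        MultiplyShiftHash h = new MultiplyShiftHash(10); // 1024 buckets
        System.out.println(h.bucketIndex(123_456_789L));
    }
}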

TL;DR: with very high probability, the worst-case get/put complexity of a hashmap is O(log n).

铃予 2024-10-16 21:34:41

I agree with:

  • the general amortized complexity of O(1)
  • a bad hashCode() implementation could result in many collisions, which means that in the worst case every object goes to the same bucket, thus O(N) if each bucket is backed by a List.
  • since Java 8, HashMap dynamically replaces the Nodes (linked list) used in each bucket with TreeNodes (a red-black tree, when a list gets bigger than 8 elements), resulting in a worst-case performance of O(log N).

But this is not the full truth if we want to be 100% precise. The implementation of hashCode() and the type of the key Object (immutable/cached or being a Collection) might also affect the real complexity in strict terms.

Let's assume the following three cases:

  1. HashMap<Integer, V>
  2. HashMap<String, V>
  3. HashMap<List<E>, V>

Do they have the same complexity? Well, the amortised complexity of the 1st one is, as expected, O(1). But, for the rest, we also need to compute hashCode() of the lookup element, which means we might have to traverse arrays and lists in our algorithm.

Let's assume that the size of all of the above arrays/lists is k.
Then HashMap<String, V> and HashMap<List<E>, V> will have O(k) amortised complexity and, similarly, O(k + log N) worst case in Java 8.

*Note that using a String key is a more complex case, because it is immutable and Java caches the result of hashCode() in a private variable hash, so it's only computed once.

/** Cache the hash code for the string */
    private int hash; // Default to 0

But the above also has its own worst case, because Java's String.hashCode() implementation checks whether hash == 0 before computing the hash code. And there are non-empty Strings that hash to zero, such as "f5a5a608", in which case the memoization doesn't help.
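
A quick sketch (KeyHashCostDemo is a hypothetical demo class) of both points: the zero-hash String cited above defeats the cache, and hashing a List key touches every element:

import java.util.ArrayList;
import java.util.List;

class KeyHashCostDemo {
    public static void main(String[] args) {
        // A non-empty String whose hash code is 0: the hash == 0 check
        // means it is recomputed on every call.
        System.out.println("f5a5a608".hashCode()); // prints 0

        // List.hashCode() combines every element's hash, so using a
        // k-element List as a key costs O(k) per lookup just for hashing.
        List<Integer> key = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) key.add(i);
        System.out.println(key.hashCode());
    }
}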

暖树树初阳… 2024-10-16 21:34:41

I'm not sure the default hashcode is the address - I read the OpenJDK source for hashcode generation a while ago, and I remember it being something a bit more complicated. Still not something that guarantees a good distribution, perhaps. However, that is to some extent moot, as few classes you'd use as keys in a hashmap use the default hashcode - they supply their own implementations, which ought to be good.

On top of that, what you may not know (again, this is based on reading the source - it's not guaranteed) is that HashMap stirs the hash before using it, to mix entropy from throughout the word into the bottom bits, which is where it's needed for all but the hugest hashmaps. That helps deal with hashes that specifically don't do that themselves, although I can't think of any common cases where you'd see that.
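
For reference, the stirring step in OpenJDK 8's java.util.HashMap is a single XOR of the high half of the hash into the low half; here it is wrapped in a demo class so it runs standalone:

class HashSpreadDemo {
    // The hash-spreading step from OpenJDK 8's java.util.HashMap: XOR the
    // high 16 bits into the low 16, because the bucket index only uses the
    // low-order bits (index = (table.length - 1) & hash).
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(hash("example")));
    }
}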

Finally, what happens when the table is overloaded is that it degenerates into a set of parallel linked lists - performance becomes O(n). Specifically, the number of links traversed will on average be half the load factor.

遗弃M 2024-10-16 21:34:41

HashMap performance depends on the hashCode implementation. In the ideal scenario, a good hash implementation provides a unique hash code for every object (no hash collisions), and then the best, worst, and average cases are all O(1).
Now consider a bad implementation of hashCode that always returns 1, or one that otherwise produces heavy collisions. In that case the time complexity would be O(n).

As for the second part of the question about memory: the memory constraint is taken care of by the JVM.

凤舞天涯 2024-10-16 21:34:41

In practice, it is O(1), but this is actually a terrible and mathematically nonsensical simplification. The O() notation says how the algorithm behaves when the size of the problem tends to infinity. HashMap get/put works like an O(1) algorithm for a limited size. The limit is fairly large from the computer memory and addressing point of view, but far from infinity.

When one says that hashmap get/put is O(1), it should really say that the time needed for the get/put is more or less constant and does not depend on the number of elements in the hashmap, so far as the hashmap can be represented on an actual computing system. If the problem goes beyond that size and we need larger hashmaps then, after a while, the number of bits describing one element will certainly also increase as we run out of describable distinct elements. For example, if we used a hashmap to store 32-bit numbers and later we grow the problem so that the hashmap holds more than 2^32 elements, then the individual elements will need more than 32 bits to describe.

The number of bits needed to describe an individual element is log(N), where N is the maximum number of elements, therefore get and put are really O(log N).
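
Restating that argument compactly in symbols, with b the number of bits needed to describe one key:

\[
  N \le 2^{b} \;\Longrightarrow\; b \ge \log_2 N \;\Longrightarrow\; T_{\mathrm{get/put}} = \Omega(b) = \Omega(\log N),
\]

since computing a hash or performing an equality check has to read each of the b bits at least once.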

If you compare it with a tree set, which is O(log n), then a hash set is O(log(max(n))), and we simply feel that this is O(1) because, in a given implementation, max(n) is fixed and does not change (the size of the objects we store, measured in bits), and the algorithm calculating the hash code is fast.

Finally, if finding an element in any data structure were O(1), we would create information out of thin air. Having a data structure of n elements, I can select one element in n different ways, so I can encode log(n) bits of information. If I could encode that in zero bits (that is what O(1) means), then I would have created an infinitely compressing ZIP algorithm.

像你 2024-10-16 21:34:41

Java HashMap time complexity
--------------------------------
get(key) & contains(key) & remove(key)          Best case   Worst case                          
HashMap before Java 8, using LinkedList buckets 1           O(n)
HashMap after Java 8, using LinkedList  buckets 1           O(n)
HashMap after Java 8, using Binary Tree buckets 1           O(log n)

 
put(key, value)                                 Best case   Worst case                          
HashMap before Java 8, using LinkedList buckets 1           1
HashMap after Java 8, using LinkedList  buckets 1           1
HashMap after Java 8, using Binary Tree buckets 1           O(log n)

Hints:

  • Before Java 8, HashMap used LinkedList buckets

  • Since Java 8, HashMap uses either LinkedList buckets or Binary Tree buckets, according to the bucket size (the exact thresholds appear in the snippet after this list).

    if(bucket size > TREEIFY_THRESHOLD[8]):

    treeifyBin: The bucket will be a Balanced Binary Red-Black Tree

    if(bucket size <= UNTREEIFY_THRESHOLD[6]):

    untreeify: The bucket will be LinkedList (plain mode)
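
For reference, the thresholds named above are constants in the OpenJDK 8 java.util.HashMap source; note that there is also a minimum table capacity below which HashMap resizes instead of treeifying:

class HashMapThresholds {
    // Bin-conversion constants as defined in OpenJDK 8's java.util.HashMap:
    static final int TREEIFY_THRESHOLD = 8;     // convert a bin to a red-black tree at this size
    static final int UNTREEIFY_THRESHOLD = 6;   // convert a shrinking tree bin back to a linked list
    static final int MIN_TREEIFY_CAPACITY = 64; // below this table capacity, resize instead of treeifying
}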

一个人练习一个人 2024-10-16 21:34:41

In simple words, if each bucket contains only a single node then the time complexity will be O(1). If a bucket contains more than one node, the time complexity will be O(linked-list size), which is better than O(n) as long as the entries are spread across more than one bucket.

Hence we can say that the average-case time complexity of the put(K,V) function depends on the load factor:

nodes(n) / buckets(N) = λ (lambda)

Example: 16/16 = 1

The time complexity will be O(1).
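
A small runnable sketch of the load factor in action (LoadFactorDemo is a hypothetical name; the capacity and load factor passed are just the JDK defaults made explicit):

import java.util.HashMap;
import java.util.Map;

class LoadFactorDemo {
    public static void main(String[] args) {
        // 16 buckets with load factor 0.75 gives a resize threshold of
        // 16 * 0.75 = 12 entries; the 13th put doubles the table to 32
        // buckets, keeping lambda = n/N (and hence chain lengths) small.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 1; i <= 13; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 13 entries, now in 32 buckets
    }
}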
