What is the space complexity of a hash table?

Posted 2024-11-17 17:22:15

What is the size of a hash table with 32-bit keys and 32-bit pointers to values stored separately?

Is it going to be 2^32 slots * (4 bytes (key) + 4 bytes (pointer to value)) = 4 * 10^9 * (4 + 4) bytes ≈ 32 GB?

I am trying to understand the space complexity of hash tables.
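
To make the arithmetic explicit, here is a quick sketch using only the numbers already in the question: 2^32 slots at 8 bytes per slot is 2^35 bytes, which is exactly 32 GiB (the "4 * 10^9" figure is the decimal approximation of 2^32).

public class SlotMath {
    public static void main(String[] args) {
        long slots = 1L << 32;             // 2^32 = 4,294,967,296 slots
        long bytesPerSlot = 4 + 4;         // 4-byte key + 4-byte pointer
        long total = slots * bytesPerSlot; // 34,359,738,368 bytes = 2^35
        System.out.println(total / (1L << 30) + " GiB"); // prints: 32 GiB
    }
}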

Comments (4)

怪我入戏太深 2024-11-24 17:22:15

I think you are asking the wrong question. The space complexity of a data structure indicates how much space it occupies in relation to the number of elements it holds. For example, a space complexity of O(1) would mean that the data structure always consumes constant space no matter how many elements you put in it. O(n) would mean that the space consumption grows linearly with the number of elements in it.

A hash table typically has a space complexity of O(n).

So to answer your question: it depends on the number of elements it currently stores, and in the real world also on the actual implementation.

A lower bound for the memory consumption of your hash table is: (number of values to store) * (size of a value). So if you want to store 1 million values in the hash table and each occupies 4 bytes, then it will consume at least 4 million bytes (roughly 4 MB). Usually real-world implementations use a bit more memory for infrastructure, but again: this highly depends on the actual implementation, and there is no way to find out for sure except to measure it.
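
A rough measurement sketch along those lines, assuming the JVM's heap counters are good enough for a ballpark figure: compare used heap before and after filling a java.util.HashMap with a million entries, and contrast the result with the theoretical 4 MB lower bound.

import java.util.HashMap;

public class HashMapFootprint {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // request a collection so the baseline is roughly stable
        long before = rt.totalMemory() - rt.freeMemory();

        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, i); // one million entries with 4-byte (boxed) keys and values
        }

        long after = rt.totalMemory() - rt.freeMemory();
        // The lower bound is 10^6 * 4 bytes ~ 4 MB; the measured figure is
        // typically several times larger because of Entry objects, boxing,
        // and the bucket array itself.
        System.out.printf("entries: %d, approx. bytes used: %d%n",
                map.size(), after - before);
    }
}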

北凤男飞 2024-11-24 17:22:15

Hash tables don't map hash function values one-to-one onto slots. The hash is computed modulo the size of a reference vector (the bucket array), which is much smaller than the hash function's range. Because that size is fixed by the table rather than by the 2^32 key space, the key space does not enter the space complexity computation.

Consequently, the space complexity of every reasonable hash table is O(n).

In general, this works out quite well. While the key space may be large, the number of values to store is usually quite easy to predict. In any case, the amount of memory that is functionally acceptable for data structure overhead is typically obvious.

This is why hash tables are so ubiquitous. They often provide the best data structure for a given task, combining strictly bounded memory overhead with better than O(log n) time complexity. I love binary trees, but they don't usually beat hash tables.
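
For illustration, a minimal sketch of that modulo reduction (the class and method names here are hypothetical): a 16-slot table serves the full 32-bit hash range, because the index is the hash reduced modulo the table length.

public class BucketIndexDemo {
    // Equivalent to a non-negative hash % tableLength when tableLength is
    // a power of two; java.util.HashMap uses the same bitmask trick.
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // far smaller than the 2^32 hash range
        for (String key : new String[] { "alpha", "beta", "gamma" }) {
            System.out.printf("%s -> bucket %d%n",
                    key, bucketIndex(key.hashCode(), tableLength));
        }
    }
}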

许你一世情深 2024-11-24 17:22:15

Let's pretend we have a naive hash table where the number of buckets is double the number of elements. That is O(2n) in the number of elements, which is O(n).

When the number of elements exceeds half the number of available buckets, you need to create a new array of buckets, double the size, and rehash all the elements to their new locations in the new bucket array. The JDK 6 HashMap source referenced below shows exactly this:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // Walk the collision chain in bucket i; if the key already exists,
    // replace its value and return the old one.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Prepend the new entry to its bucket's chain.
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // Once size crosses the threshold (capacity * loadFactor), double the table.
    if (size++ >= threshold)
        resize(2 * table.length);
}

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

void transfer(Entry[] newTable) {
    // Rehash every entry from the old array into its bucket in the new one.
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

References:

HashMap.put (OpenJDK 6-b14):
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.put%28java.lang.Object%2Cjava.lang.Object%29

Grepcode is down; you can look at the OpenJDK repository here as a better reference:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

谜兔 2024-11-24 17:22:15

There is still no perfect answer to this question, and I am not sure about the exact space occupied. As I understand the issue, the size is dynamic and varies with the size of the input.

That is, we start with some arbitrary hash table size, which is much smaller than the range of the hash function's values. Then we insert the input. As collisions start occurring, we dynamically double the hash table size.
This, I think, is the reason for the O(n) complexity. Kindly correct me if I am wrong.
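
A tiny simulation of that doubling (the initial capacity of 16 and load factor of 0.75 are assumptions borrowed from HashMap's defaults): capacity doubles whenever the threshold is crossed, so it always stays within a constant factor of n, which is why the space remains O(n).

public class DoublingDemo {
    public static void main(String[] args) {
        int capacity = 16;        // assumed initial table size
        double loadFactor = 0.75; // assumed resize threshold
        for (int n = 1; n <= 1_000_000; n++) {
            if (n > capacity * loadFactor) {
                // Capacity never exceeds 2 * (n / loadFactor), i.e. O(n).
                capacity *= 2;
                System.out.printf("n=%d -> capacity doubled to %d%n", n, capacity);
            }
        }
    }
}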
