OpenJDK's rehashing mechanism

Found this code on http://www.docjar.com/html/api/java/util/HashMap.java.html after searching for a HashMap implementation.

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Can someone shed some light on this? The comment tells us why this code is here, but I would like to understand how it improves a bad hash value and how it guarantees that the positions have a bounded number of collisions. What do these magic numbers mean?

Comments (1)

所谓喜欢 2024-12-19 04:42:41

In order for it to make any sense it has to be combined with an understanding of how HashMap allocates things into buckets. This is the trivial function by which a bucket index is chosen:

static int indexFor(int h, int length) {
    return h & (length-1);
}

So you can see that, with a default table size of 16, only the 4 least significant bits of the hash actually matter for allocating buckets! (16 - 1 = 15, which masks the hash with 1111b)

This could clearly be bad news if your hashCode function returned:

10101100110101010101111010111111
01111100010111011001111010111111
11000000010100000001111010111111
//etc etc etc

Such a hash function would not likely be "bad" in any way that is visible to its author. But if you combine it with the way the map allocates buckets, boom, MapFail(tm).
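
To make the failure concrete, here is a minimal, self-contained sketch (the class name PlainBucketDemo is made up for illustration) that pushes those three hash codes through the same indexFor logic:

public class PlainBucketDemo {

    // Same bucket selection as HashMap's indexFor above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // The three "bad" hash codes from above, parsed from their binary form.
        int[] hashes = {
            (int) Long.parseLong("10101100110101010101111010111111", 2),
            (int) Long.parseLong("01111100010111011001111010111111", 2),
            (int) Long.parseLong("11000000010100000001111010111111", 2),
        };
        for (int h : hashes) {
            // With the default table size of 16, only the low 4 bits survive
            // the mask, and all three values end in 1111 -> bucket 15.
            System.out.println(Integer.toBinaryString(h) + " -> bucket " + indexFor(h, 16));
        }
    }
}

Three distinct 32-bit values, one bucket: every one of them chains into slot 15.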

If you keep in mind that h is a 32 bit number, those are not magic numbers at all. It is systematically xoring the most significant bits of the number rightward into the least significant bits. The purpose is so that "differences" in the number that occur anywhere "across" it when viewed in binary become visible down in the least significant bits.

Collisions become bounded because the number of distinct values that share the same relevant low-order bits is now significantly limited: any differences that occur anywhere in the binary representation are compressed into the bits that matter for bucketing.
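
As a rough illustration of the effect (again a made-up demo class, reusing the hash and indexFor code quoted above), running the same three values through the supplemental hash first lets their high-bit differences reach the low 4 bits:

public class SupplementalHashDemo {

    // The supplemental hash quoted from HashMap in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {
            (int) Long.parseLong("10101100110101010101111010111111", 2),
            (int) Long.parseLong("01111100010111011001111010111111", 2),
            (int) Long.parseLong("11000000010100000001111010111111", 2),
        };
        for (int h : hashes) {
            // Raw bucket: 15 for every value. Spread bucket: the folded-in
            // high bits change the low bits, so the three values should now
            // land in different buckets (7, 0 and 15 for these inputs).
            System.out.println("raw bucket " + indexFor(h, 16)
                    + "  ->  spread bucket " + indexFor(hash(h), 16));
        }
    }
}

Keys whose hashCodes are completely identical will of course still collide; the spreading only helps when the hashes differ somewhere above the masked bits.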
