OpenJDK的重新哈希机制
在 http://www.docjar.com/html 上找到此代码/api/java/util/HashMap.java.html 搜索 HashMap 实现后。
264 static int hash(int h) {
265 // This function ensures that hashCodes that differ only by
266 // constant multiples at each bit position have a bounded
267 // number of collisions (approximately 8 at default load factor).
268 h ^= (h >>> 20) ^ (h >>> 12);
269 return h ^ (h >>> 7) ^ (h >>> 4);
270 }
有人可以解释一下吗?注释告诉我们为什么这段代码在这里,但我想了解这如何改善错误的哈希值以及它如何保证位置的数量有限碰撞。这些神奇数字意味着什么?
Found this code on http://www.docjar.com/html/api/java/util/HashMap.java.html after searching for a HashMap implementation.
264 static int hash(int h) {
265 // This function ensures that hashCodes that differ only by
266 // constant multiples at each bit position have a bounded
267 // number of collisions (approximately 8 at default load factor).
268 h ^= (h >>> 20) ^ (h >>> 12);
269 return h ^ (h >>> 7) ^ (h >>> 4);
270 }
Can someone shed some light on this? The comment tells us why this code is here but I would like to understand how this improves a bad hash value and how it guarantees that the positions have bounded number of collisions. What do these magic numbers mean?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
为了使其有意义,必须结合对 HashMap 如何将事物分配到存储桶的理解相结合。这是选择存储桶索引的简单函数:
因此您可以看到,在默认表大小为 16 的情况下,只有哈希的 4 个最低有效位实际上对于分配存储桶很重要! (16 - 1 = 15,用 1111b 掩盖散列)
如果您的 hashCode 函数返回,这显然是个坏消息:
这样的散列函数在其作者可见的任何方面都不太可能是“坏”的。但如果你把它与地图分配桶的方式结合起来,繁荣,MapFail(tm)。
如果您记住 h 是一个 32 位数字,那么这些根本不是魔法数字。它系统地将数字的最高有效位向右异或为最低有效位。目的是当以二进制形式查看时,在“跨”任何地方出现的数字“差异”在最低有效位中变得可见。
冲突变得有界,因为具有相同相关 LSB 的不同数字的数量现在明显受到限制,因为二进制表示中任何地方出现的任何差异都被压缩为对分桶重要的位。
In order for it to make any sense it has to be combined with an understanding of how HashMap allocates things in to buckets. This is the trivial function by which a bucket index is chosen:
So you can see, that with a default table size of 16, only the 4 least significant bits of the hash actually matter for allocating buckets! (16 - 1 = 15, which masks the hash by 1111b)
This could clearly be bad news if your hashCode function returned:
Such a hash function would not likely be "bad" in any way that is visible to its author. But if you combine it with the way the map allocates buckets, boom, MapFail(tm).
If you keep in mind that h is a 32 bit number, those are not magic numbers at all. It is systematically xoring the most significant bits of the number rightward into the least significant bits. The purpose is so that "differences" in the number that occur anywhere "across" it when viewed in binary become visible down in the least significant bits.
Collisions become bounded because the number of different numbers that have the same relevant LSBs is now significantly bounded because any differences that occur anywhere in the binary representation are compressed into the bits that matter for bucket-ing.