如何为整数或字符串哈希选择模?
通常,我们通过根据规则计算integer
或string
来进行哈希,然后返回hash(int-or-str) % m
为哈希表中的索引,但是我们如何选择模 m
呢?有什么约定可以遵循吗?
Typically, we do hashing by calculating the integer
or string
according to a rule, then return hash(int-or-str) % m
as the index in the hash table, but how do we choose the modulo m
? Is there any convention to follow?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
有两种可能的约定。一种是使用素数,它可以产生二次探测的良好性能。
另一种是使用 2 的幂,因为 n mod m 其中 m = 2^k 是运行速度快;它是与m-1 的按位与。当然,模数必须等于哈希表的大小,并且二的幂意味着当哈希表过度拥挤时,哈希表的大小必须加倍。这为您提供了摊销 O(1) 插入,其方式与动态数组类似做。
There are two possible conventions. One is to use a prime number, which yields good performance with quadratic probing.
The other is to use a power of two, since n mod m where m = 2^k is a fast operation; it's a bitwise AND with m-1. Of course, the modulus must be equal to the size of the hash table, and powers of two mean your hash table must double in size whenever it's overcrowded. This gives you amortized O(1) insertion in a similar way that a dynamic array does.
由于 [val modulo m] 用作表的索引,因此 m 是该表中的元素数量。你可以自由选择吗?然后使用足够大的素数。如果您需要调整表的大小,您可以选择使用更大的素数,或者(如果您选择将表加倍来调整大小)您最好确保您的哈希函数在较低位中有足够的熵。
Since [val modulo m] is used as an index into the table, m is the number of elements in that table. Are you free to choose that ? Then use a big enough prime number. If you need to resize the table, you can either chose to use a bigger prime number, or (if you choose doubling the table for resizing) you'd better make sure that your hash function has enough entropy in the lower bits.