好的哈希函数
我一直无法理解哈希函数的设计。我正在经历一个例子。正如您在函数注释中看到的,为什么要选择 31 作为要相乘的数字。你如何决定?这是随机的还是有什么意义?
unsigned int hash(hash_table_t *hashtable, char *str)
{
unsigned int hashval;
/* we start our hash out at 0 */
hashval = 0;
/* for each character, we multiply the old hash by 31 and add the current
* character. Remember that shifting a number left is equivalent to
* multiplying it by 2 raised to the number of places shifted. So we
* are in effect multiplying hashval by 32 and then subtracting hashval.
* Why do we do this? Because shifting and subtraction are much more
* efficient operations than multiplication.
*/
for(; *str != '\0'; str++) hashval = *str + (hashval << 5) - hashval;
/* we then return the hash value mod the hashtable size so that it will
* fit into the necessary range
*/
return hashval % hashtable->size;
}
I have been failing to understand the design of hash functions. I was going through an example. As you can see in the function comments, why should you choose 31 as the number to be multiplied. How do you decide? Is that something random or does it mean something?
unsigned int hash(hash_table_t *hashtable, char *str)
{
unsigned int hashval;
/* we start our hash out at 0 */
hashval = 0;
/* for each character, we multiply the old hash by 31 and add the current
* character. Remember that shifting a number left is equivalent to
* multiplying it by 2 raised to the number of places shifted. So we
* are in effect multiplying hashval by 32 and then subtracting hashval.
* Why do we do this? Because shifting and subtraction are much more
* efficient operations than multiplication.
*/
for(; *str != '\0'; str++) hashval = *str + (hashval << 5) - hashval;
/* we then return the hash value mod the hashtable size so that it will
* fit into the necessary range
*/
return hashval % hashtable->size;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
所涉及的哈希值称为 Bernstein 哈希值、Torek 哈希值或简称为“times 33”哈希值。由于其简单、速度快以及对英语字符串数据的良好分布,它非常受欢迎。
您的评论指出它实际上是乘以 31,这对您来说似乎是任意的,实际上有点任意。 Apache Portable Runtime 在其哈希算法源中有一条注释< /a> 指出许多可能的常量都可以很好地工作(33 是最常见的)。它们都是奇数且接近 2 的幂,这意味着它们可以很好地转化为移位和加法或减法。
其他一些有助于理解哈希的资源:
The hash in question is known as the Bernstein Hash, Torek Hash, or simply the "times 33" hash. It is pretty popular due to its simplicity, speed, and decent distribution with English string data.
Your comments note it is actually multiplying by 31, which seemed arbitrary to you and actually is a bit arbitrary. The Apache Portable Runtime has a comment in their hash algorithm source which notes that many possible constants work well (33 being the most common). They all are odd and close to a power of 2, meaning they translate well into shifts and addition or subtraction.
Some other resources to help understand hashing:
这是一个关于哈希函数的讲座,浏览量为 65k。在 YouTube 上:http://www.youtube.com/watch?v=KW0UvOW0XIo
并不完全是您所要求的,但是您的问题表明您在哈希方面的知识有限。最好阅读教程或查看演示文稿。
Here is a lecture on hash functions with 65k views. On youtube: http://www.youtube.com/watch?v=KW0UvOW0XIo
This is not exactly what you are asking for, however your questions reveals that your knowledge in hashing is limited. Better to read a tutorial or check a presentation.