如何使用 c++ 将字符串哈希为 int?
我必须编写自己的哈希函数。如果我只想创建一个简单的哈希函数,将字符串中的每个字母映射到一个数值(即 a=1,b=2,c=3,...),有没有办法可以执行此哈希一个字符串而不必首先将其转换为 C 字符串来查看每个单独的字符?有没有更有效的散列字符串的方法?
I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
从个人经验来看,我知道这很有效并且可以产生良好的发行版。 (抄袭自http://www.cse.yorku.ca/~oz/hash.html ):
djb2
这个算法 (k=33) 是由 dan bernstein 多年前在 comp.lang.c 中首次报道的。该算法的另一个版本(现在受到 Bernstein 的青睐)使用异或: hash(i) = hash(i - 1) * 33 ^ str[i];数字 33 的魔力(为什么它比许多其他常量(无论是素数还是非素数)效果更好)从未得到充分解释。
From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):
djb2
this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
关于第一个问题,当然,例如:
关于第二个问题,有很多更好的方法来散列字符串。例如,请参阅此处获取一些 C 示例(可以轻松翻译为 C++上面代码片段的行)。
Re the first question, sure, e.g, something like:
regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).
您可以使用
[]
运算符检查 std::string 中的每个单独的字符。但是,您可以查看 Boost::Functional/Hash 获取更好的散列方案的指导。 此处还提供了 c 中的哈希函数列表。You can examine each individual char from a std::string using the
[]
operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.这是我在 Stroustrup 的书中找到的一个 C (++) 哈希函数:
如果您将它用于哈希表(Stroustrup 就是这样做的),那么您可以返回哈希模素数的绝对值。所以改为
最后一行。
Here's a C (++) hash function that I found in Stroustrup's book:
If you're using it for a hash table (which Stroustrup does) then you can instead return the abs of the hash modulo a prime number. So instead
for the last line.
C++11 附带了一个标准的字符串哈希函数。
https://en.cppreference.com/w/cpp/string/basic_string/哈希
C++11 ships with a standard hashing function for strings.
https://en.cppreference.com/w/cpp/string/basic_string/hash
只是发布对 Arnestig 的 djb2 算法的改进,使其对 constexpr 友好。
我必须删除参数的无符号限定符,以便它可以使用文字字符串。
Just posting an improvement to Arnestig's djb2 algorithm to be constexpr-friendly.
I had to remove the unsigned qualifier of the argument so it can work with literal strings.
将字符异或在一起,一次四个。
xor the characters together, four at a time.
小字符串的另一种方法:
Another way for small strings:
您可以使用成员函数 operator[]或字符串类或迭代器的 at 来访问字符串的各个字符对象而不将其转换为 C 风格的 char 数组。
要将字符串对象哈希为整数,您必须访问字符串对象的每个单独的字符,您可以执行以下操作:
You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.
To hash a string object to an integer you'll have to access each individual char of the string object which you can do as: