如何使用 c++ 将字符串哈希为 int?

发布于 2024-08-26 12:02:53 字数 134 浏览 6 评论 0原文

我必须编写自己的哈希函数。如果我只想创建一个简单的哈希函数,将字符串中的每个字母映射到一个数值(即 a=1,b=2,c=3,...),有没有办法可以执行此哈希一个字符串而不必首先将其转换为 C 字符串来查看每个单独的字符?有没有更有效的散列字符串的方法?

I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

一袭水袖舞倾城 2024-09-02 12:02:53

从个人经验来看,我知道这很有效并且可以产生良好的发行版。 (抄袭自http://www.cse.yorku.ca/~oz/hash.html ):

djb2

这个算法 (k=33) 是由 dan bernstein 多年前在 comp.lang.c 中首次报道的。该算法的另一个版本(现在受到 Bernstein 的青睐)使用异或: hash(i) = hash(i - 1) * 33 ^ str[i];数字 33 的魔力(为什么它比许多其他常量(无论是素数还是非素数)效果更好)从未得到充分解释。

unsigned long hash(unsigned char *str) {
    unsigned long hash = 5381;
    int c;

    while (c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}

From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):

djb2

this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.

unsigned long hash(unsigned char *str) {
    unsigned long hash = 5381;
    int c;

    while (c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}
别靠近我心 2024-09-02 12:02:53

关于第一个问题,当然,例如:

int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
  hash = hash << 1 | (*it - offset);
}

关于第二个问题,有很多更好的方法来散列字符串。例如,请参阅此处获取一些 C 示例(可以轻松翻译为 C++上面代码片段的行)。

Re the first question, sure, e.g, something like:

int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
  hash = hash << 1 | (*it - offset);
}

regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).

爱她像谁 2024-09-02 12:02:53

您可以使用 [] 运算符检查 std::string 中的每个单独的字符。但是,您可以查看 Boost::Functional/Hash 获取更好的散列方案的指导。 此处还提供了 c 中的哈希函数列表。

You can examine each individual char from a std::string using the [] operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.

狂之美人 2024-09-02 12:02:53

这是我在 Stroustrup 的书中找到的一个 C (++) 哈希函数:

int hash(const char *str)
{
    int h = 0;
    while (*str)
       h = h << 1 ^ *str++;
    return h;
}

如果您将它用于哈希表(Stroustrup 就是这样做的),那么您可以返回哈希模素数的绝对值。所以改为

    return (h > 0 ? h : -h) % N_BUCKETS;

最后一行。

Here's a C (++) hash function that I found in Stroustrup's book:

int hash(const char *str)
{
    int h = 0;
    while (*str)
       h = h << 1 ^ *str++;
    return h;
}

If you're using it for a hash table (which Stroustrup does) then you can instead return the abs of the hash modulo a prime number. So instead

    return (h > 0 ? h : -h) % N_BUCKETS;

for the last line.

机场等船 2024-09-02 12:02:53

C++11 附带了一个标准的字符串哈希函数。

https://en.cppreference.com/w/cpp/string/basic_string/哈希

#include <string>
#include<functional> // hash
int main(){
    std::string s = "Hello";
    std::size_t hash = std::hash<std::string>{}(s);
}

C++11 ships with a standard hashing function for strings.

https://en.cppreference.com/w/cpp/string/basic_string/hash

#include <string>
#include<functional> // hash
int main(){
    std::string s = "Hello";
    std::size_t hash = std::hash<std::string>{}(s);
}
蓦然回首 2024-09-02 12:02:53

只是发布对 Arnestig 的 djb2 算法的改进,使其对 constexpr 友好。
我必须删除参数的无符号限定符,以便它可以使用文字字符串。

constexpr unsigned long hash(const char *str) {
    unsigned long hash = 5381;

    while (int c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}

Just posting an improvement to Arnestig's djb2 algorithm to be constexpr-friendly.
I had to remove the unsigned qualifier of the argument so it can work with literal strings.

constexpr unsigned long hash(const char *str) {
    unsigned long hash = 5381;

    while (int c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}
软糖 2024-09-02 12:02:53

将字符异或在一起,一次四个。

xor the characters together, four at a time.

素年丶 2024-09-02 12:02:53
#include <iostream>
#include <string>
#include <algorithm>

using namespace std;

// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
    hash() : acc(5381) { }
    template<typename Ch>
    void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
    operator Int() const { return acc; }
    Int acc;
};

int main(int argc, char* argv[])
{
    string s("Hellp, world");
    cout << hex << showbase
        << for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
    return 0;
}
#include <iostream>
#include <string>
#include <algorithm>

using namespace std;

// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
    hash() : acc(5381) { }
    template<typename Ch>
    void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
    operator Int() const { return acc; }
    Int acc;
};

int main(int argc, char* argv[])
{
    string s("Hellp, world");
    cout << hex << showbase
        << for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
    return 0;
}
他是夢罘是命 2024-09-02 12:02:53

小字符串的另一种方法:

int hash(const char* str) {
    int hash = 0;
    int c = 0;

    while (c < std::strlen(str)) {
        hash += (int)str[c] << (int)str[c+1];
        c++;
    }
    return hash;
}

Another way for small strings:

int hash(const char* str) {
    int hash = 0;
    int c = 0;

    while (c < std::strlen(str)) {
        hash += (int)str[c] << (int)str[c+1];
        c++;
    }
    return hash;
}
顾挽 2024-09-02 12:02:53

您可以使用成员函数 operator[]或字符串类或迭代器的 at 来访问字符串的各个字符对象而不将其转换为 C 风格的 char 数组。

要将字符串对象哈希为整数,您必须访问字符串对象的每个单独的字符,您可以执行以下操作:

for (i=0; i < str.length(); i++) {
    // use str[i] or str.at(i) to access ith element.
}

You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.

To hash a string object to an integer you'll have to access each individual char of the string object which you can do as:

for (i=0; i < str.length(); i++) {
    // use str[i] or str.at(i) to access ith element.
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文