Building a hash table / hash function

Posted on 2024-09-03 23:42:06

I would like to construct a hash table that looks up keys in sequences (strings) of bytes ranging from 1 to 15 bytes.

I would like to store an integer value, so I imagine an array for hashing would suffice. I'm having difficulty conceptualizing how to construct a hash function such that given the key would give an index into the array.

Any assistance would be much appreciated.

The maximum number of entries in the hash is: 4081*15 + 4081*14 + ... 4081 = 4081((15*(16))/2) = 489720.

So for example:

int table[489720];

int lookup(unsigned char *key)
{
    int index = hash(key);
    return table[index];
}

What are some good choices for a hash function, or how would I go about constructing one?

Thanks.

送君千里 2024-09-10 23:42:06

To hash C strings, I've always used this function (take the result % your hash table's size):

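unsigned int hashstring(const char* s) {
  /* unsigned: overflow wraps around instead of being undefined, and the
     result of % can never be negative */
  unsigned int key = 0;
  while (*s) {
    key = key*37 + (unsigned char)*s++;
  }
  return key;
}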

I don't remember where I got it from initially, but in many years it hasn't let me down.
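
One way to turn that hash into an array index is sketched below. It is only an illustration: TABLE_SIZE, hashstring_u and index_for are names made up here, and the hash is computed in unsigned arithmetic so the remainder is always a valid index.

```c
#include <stddef.h>

/* Hypothetical table size, chosen only for this sketch. */
#define TABLE_SIZE 1048576u

/* Same multiply-by-37 scheme as above, in unsigned arithmetic so that
   overflow wraps around instead of being undefined behaviour. */
unsigned int hashstring_u(const char *s) {
  unsigned int key = 0;
  while (*s) {
    key = key * 37u + (unsigned char)*s++;
  }
  return key;
}

/* Map a NUL-terminated key to a slot in [0, TABLE_SIZE). */
size_t index_for(const char *s) {
  return hashstring_u(s) % TABLE_SIZE;
}
```

Two different keys can still land in the same slot, so the table needs some way of handling collisions.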


信仰 2024-09-10 23:42:06

Your key space is large (approx 2^(8*15)), so if you want a perfect hash, you will need to know in advance which 489720 actual keys will show up. Even then, it is practically impossible to find a perfect hash for those keys, even if you allow a much larger table (a.k.a. a very low load factor). The only way I know to find a perfect hash is by trial and error, and a random hash is likely to fail unless your table has close to 489720^2 entries.
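
The 489720^2 figure can be motivated with the birthday bound. As a rough sketch, assuming the hash behaves like a uniformly random function (an idealisation, not something measured here), with n keys and m table slots:

```latex
\[
  E[\text{colliding pairs}] = \binom{n}{2}\frac{1}{m} \approx \frac{n^2}{2m},
  \qquad
  P(\text{no collision}) \approx e^{-n^2/(2m)}
\]
\[
  n = 489720,\quad m = 2^{20}:\qquad \frac{n^2}{2m} \approx 1.1 \times 10^{5}
\]
```

With a table of 2^20 slots one should therefore expect on the order of a hundred thousand colliding pairs, and the chance of no collision only stops being negligible once m is on the order of n^2.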

I highly recommend using a regular (non-perfect) hash and dealing with collisions appropriately, e.g. with chaining:

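#include <string.h>

/* Any string hash will do; assumed to be defined elsewhere. */
unsigned int hash(const unsigned char *key);

struct entry {
  unsigned char *key;
  int value;
  struct entry *next;  /* next entry in the same bucket */
} *table[1<<20];

int lookup(unsigned char *key) {
  unsigned int index = hash(key) % (1u<<20);
  for (struct entry *e = table[index]; e != NULL; e = e->next) {
    /* strcmp takes char *, hence the casts; keys are assumed NUL-terminated */
    if (!strcmp((const char *)key, (const char *)e->key)) return e->value;
  }
  /* not found: -1 is an assumed sentinel, pick one that is never a stored value */
  return -1;
}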
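
The lookup above only reads the chains; something also has to build them. Below is a minimal sketch of insertion with chaining, not a tuned implementation. The names chain_hash, chain_insert, chain_find and NBUCKETS are invented for the example, keys are assumed to be NUL-terminated, and the multiply-by-37 hash is only a placeholder.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS (1u << 20)

struct entry {
  char *key;            /* owned copy of the key */
  int value;
  struct entry *next;   /* next entry in the same bucket */
};

static struct entry *table[NBUCKETS];

/* Placeholder hash for the sketch: multiply-by-37 in unsigned arithmetic. */
unsigned int chain_hash(const char *key) {
  unsigned int h = 0;
  while (*key) h = h * 37u + (unsigned char)*key++;
  return h;
}

/* Insert or update; returns 0 on success, -1 if allocation fails. */
int chain_insert(const char *key, int value) {
  unsigned int index = chain_hash(key) % NBUCKETS;
  for (struct entry *e = table[index]; e != NULL; e = e->next) {
    if (strcmp(key, e->key) == 0) { e->value = value; return 0; }
  }
  struct entry *e = malloc(sizeof *e);
  if (e == NULL) return -1;
  e->key = malloc(strlen(key) + 1);
  if (e->key == NULL) { free(e); return -1; }
  strcpy(e->key, key);
  e->value = value;
  e->next = table[index];   /* push onto the front of the chain */
  table[index] = e;
  return 0;
}

/* Returns 1 and stores the value in *out if found, 0 otherwise. */
int chain_find(const char *key, int *out) {
  unsigned int index = chain_hash(key) % NBUCKETS;
  for (struct entry *e = table[index]; e != NULL; e = e->next) {
    if (strcmp(key, e->key) == 0) { *out = e->value; return 1; }
  }
  return 0;
}
```

Reporting "found" separately from the value avoids reserving a sentinel integer for missing keys.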

I also recommend you don't implement this yourself: use a standard library container such as a C++ hash map (for example std::unordered_map).


臻嫒无言 2024-09-10 23:42:06

If you want a perfect hash, you can start by reading the Wikipedia article on perfect hashing. If you get stuck, you can ask for help here.


亂 2024-09-10 23:42:06

If the average number of strings resident in the table is low (say, under 10,000 entries), an associative array would be a reasonable approach, even one using a linear search, if it's on a modern CPU architecture.
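
A linear-search associative array can be as small as the sketch below. It is only an illustration: struct pair and linear_lookup are names invented here, and keys are assumed to be NUL-terminated strings of at most 15 bytes.

```c
#include <stddef.h>
#include <string.h>

/* One key/value pair; keys are at most 15 bytes plus a terminating NUL. */
struct pair {
  char key[16];
  int value;
};

/* Scan the array front to back; returns a pointer to the value, or NULL. */
const int *linear_lookup(const struct pair *pairs, size_t count, const char *key) {
  for (size_t i = 0; i < count; i++) {
    if (strcmp(pairs[i].key, key) == 0) {
      return &pairs[i].value;
    }
  }
  return NULL;
}
```

Storing the keys inline keeps the whole array contiguous in memory, which is what makes a plain scan tolerable for small tables.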

Otherwise, constructing a "perfect hash" requires inspecting each character of the string and computing a unique value based on the possible range. For example, if only the 26 characters A..Z are allowed in the key, this would work:

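unsigned long long
hash (const char *key)
{
   /* Reads the key as a base-26 number. The value is unique only among
      keys of the same length ("A" and "AA" both give 0), and only while
      it still fits in the integer type. */
   unsigned long long h = 0;
   while (key && *key)
       h = h * 26 + (unsigned long long)(*key++ - 'A');
   return h;
}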
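
To make keys of different lengths distinct as well, one option is to use digits 1..26 in base 27, so that no letter maps to zero. The sketch below assumes ASCII and keys made only of 'A'..'Z'; encode_az is a name invented for the example, and the 13-letter cap is there because 27^13 still fits in 64 bits while 27^14 does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps a key made only of 'A'..'Z' to a unique 64-bit number.
   Each letter becomes a digit 1..26 in base 27, so "A" and "AA" differ.
   Returns 0 for input that is NULL, empty, longer than 13 letters,
   or that contains any other character. */
uint64_t encode_az(const char *key) {
  uint64_t h = 0;
  int len = 0;
  if (key == NULL) return 0;
  for (; *key; key++) {
    if (*key < 'A' || *key > 'Z' || ++len > 13) return 0;
    h = h * 27u + (uint64_t)(*key - 'A' + 1);
  }
  return h;
}
```

The result is unique per key but far too large to index an array directly, so it would still have to be reduced modulo the table size (which reintroduces collisions) or used as the key of another structure.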
