Hashing String keys for a hash table (double hashing)

Published 2024-12-14 09:16:31


I am trying to use Double Hashing to hash a String key into a hash table. I did something like:

protected int getIndex(String key) {
  int itr = 0,
      size = this.values.length,
      index1,
      index2,
      index = 0;

  do {
    // do double hashing to get index for curr [itr] (iteration)
    index1 = Math.abs(key.hashCode()) % size;
    // trying very hard to eliminate clashes, but still fails ... TA and AT get index 2 when size = 5
    index2 = size - ((key + key + "#!@").hashCode() % size);
    index = (index1 + (itr * index2)) % size;

    // if itr > set threshold, exit
    itr++;
    if (itr > 200) {
      index = -1;
      break;
    }

    // once index found, exit loop
  } while (index > 0 && this.keys[index] != null && !this.keys[index].equals(key));

  return index;
}

The main part is the first 3 lines after the `do`. Can I say that if I use double hashing, it eliminates the possibility of collisions? `size` is the total number of possible unique keys for my hash table.
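For reference, the "TA"/"AT" clash mentioned in the code comment is reproducible from the first hash alone. Java's `String.hashCode()` for a two-character string is `31*c0 + c1`, so two anagrams differ by `30*(c0 - c1)`, which is always divisible by 5; with `size = 5` every pair of two-character anagrams collides on `index1`. A standalone check (my own snippet, not part of the question's class):

```java
// Demonstrates that "AT" and "TA" fall in the same bucket when size = 5:
// 31*'A' + 'T' = 2099 and 31*'T' + 'A' = 2669 are congruent mod 5.
public class ClashDemo {
    public static void main(String[] args) {
        int size = 5;
        System.out.println("AT".hashCode() % size); // prints 4
        System.out.println("TA".hashCode() % size); // prints 4
    }
}
```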


Comments (1)

柠檬 2024-12-21 09:16:31


So I see two things going on here:

  1. Using two different hashes and combining them in an attempt to get a more distributed hash
  2. If a hash fails, trying a new spot a little farther along

At first blush, both of these look like good ways to reduce hash collisions. However, on closer inspection, both run into real algorithmic trouble.

Combining two hashes
Hashing algorithms are designed to be fairly well distributed across the integer spectrum. Just like how adding two random numbers together doesn't give you anything more random, adding two hashes together doesn't get you something more distributed. In fact, adding two identical distributions together will ALWAYS give you something less evenly distributed. Thus any kind of double hashing strategy that uses the same underlying algorithm is worse than a single hashing strategy.
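For contrast, textbook double hashing draws the step from a genuinely different algorithm than the slot hash, and maps it into [1, size−1] so the probe step is never zero (with a prime table size, the sequence then visits every slot). A rough sketch of that shape, where the names `h1`/`h2` and the choice of FNV-1a for the step hash are my own, not from the question's code:

```java
// Sketch of textbook double hashing with two unrelated hash functions.
public class DoubleHash {
    // Slot hash: Java's built-in polynomial String hash, mapped into [0, size).
    static int h1(String key, int size) {
        return Math.floorMod(key.hashCode(), size);
    }

    // Step hash: FNV-1a, a different algorithm than String.hashCode(),
    // mapped into [1, size - 1] so the probe step is never zero.
    static int h2(String key, int size) {
        int h = 0x811c9dc5;                 // FNV offset basis
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x01000193;                // FNV prime
        }
        return 1 + Math.floorMod(h, size - 1);
    }

    // The i-th probe position for a key in a table of the given size.
    static int probe(String key, int i, int size) {
        return Math.floorMod(h1(key, size) + i * h2(key, size), size);
    }
}
```

`Math.floorMod` is used instead of `%` so a negative `hashCode()` can never produce a negative index.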

Trying a new spot
It's tempting to try an algorithm that tries a new hash if the first one collides. However, this causes problems with the retrieval part of the algorithm. When you put something in the hash, it can bump to another spot; then when you go to retrieve the value, it isn't where the first hash says it should be. Worse yet, whether or not you even find it depends on whether the first element is still there. If it's been removed, it's impossible to tell whether the item you're looking for is further along or simply isn't there. Ultimately a .contains test would have to go through all 200 iterations before it could be sure that the hash it's looking for isn't there.
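One way to see the fix: lookup must retrace exactly the same probe sequence as insertion, and deletion must leave a tombstone marker so a later search doesn't stop early at a vacated slot. A minimal open-addressing sketch (my own illustration, using a fixed probe step for brevity; the class and field names are invented):

```java
// Sketch: open addressing with tombstones so removal doesn't break lookups.
public class ProbeTable {
    // Sentinel compared by identity (==), so it can never collide
    // with a real key that happens to equal "<deleted>".
    private static final String TOMBSTONE = new String("<deleted>");
    private final String[] keys;

    public ProbeTable(int size) { keys = new String[size]; }

    private int start(String key) { return Math.floorMod(key.hashCode(), keys.length); }

    public boolean insert(String key) {
        for (int i = 0; i < keys.length; i++) {
            int j = (start(key) + i) % keys.length;   // same sequence as contains/remove
            if (keys[j] == null || keys[j] == TOMBSTONE || keys[j].equals(key)) {
                keys[j] = key;
                return true;
            }
        }
        return false; // table full
    }

    public boolean contains(String key) {
        for (int i = 0; i < keys.length; i++) {
            int j = (start(key) + i) % keys.length;
            if (keys[j] == null) return false;        // only a truly empty slot ends the search
            if (keys[j] != TOMBSTONE && keys[j].equals(key)) return true;
        }
        return false;
    }

    public void remove(String key) {
        for (int i = 0; i < keys.length; i++) {
            int j = (start(key) + i) % keys.length;
            if (keys[j] == null) return;
            if (keys[j] != TOMBSTONE && keys[j].equals(key)) {
                keys[j] = TOMBSTONE;                  // mark, don't null out
                return;
            }
        }
    }
}
```

Because removal writes a tombstone instead of `null`, a key that was bumped past the removed slot is still reachable by `contains`.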

The best solution is to use the out-of-the-box hash provided by Java. If you're getting a lot of collisions, the best thing to do is use a lower load factor in the hash. This increases the number of buckets and makes collisions less likely.
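Concretely, `java.util.HashMap` exposes the load factor in its two-argument constructor; 0.75f is the default, and a smaller value makes the map resize sooner, trading memory for sparser buckets (the 0.5f below is just an illustrative choice):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Resizes at 50% occupancy instead of the default 75%,
        // so buckets stay sparser and collisions are less likely.
        Map<String, Integer> counts = new HashMap<>(16, 0.5f);
        counts.put("AT", 1);
        counts.put("TA", 2);
        System.out.println(counts.get("AT")); // prints 1
    }
}
```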
