Hashing strings for use in a hash table (double hashing)

Posted on 2024-12-14 09:16:31


I am trying to use double hashing to hash a String key into a hash table. I did something like this:

protected int getIndex(String key) {
  int itr = 0,
      size = this.values.length,
      index1,
      index2,
      index = 0;

  do {
    // do double hashing to get index for curr [itr] (iteration)
    // Math.floorMod never returns a negative value (Math.abs(Integer.MIN_VALUE) is negative)
    index1 = Math.floorMod(key.hashCode(), size);
    index2 = size - Math.floorMod((key + key + "#!@").hashCode(), size); // trying very hard to eliminate clash, but still fails ... TA and AT get index 2 when size = 5
    index = (index1 + (itr * index2)) % size;

    // if itr > set threshold, exit
    itr++;
    if (itr > 200) {
      index = -1;
      break;
    }

    // once index found, exit loop (slot 0 is a valid index, hence >= 0)
  } while (index >= 0 && this.keys[index] != null && !this.keys[index].equals(key));

  return index;
}

The main part is the first three lines after the do. Can I say that using double hashing should eliminate the probability of a collision? size is the total number of possible unique keys for my hash table.
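For comparison, here is a minimal sketch of a textbook double-hashing probe sequence, slot(i) = (h1 + i * h2) mod size. It assumes a prime table size greater than 1 and uses a hypothetical step hash (1 + (hashCode / size) mod (size - 1)) chosen only so that the step is never 0; the class name DoubleHashProbe is also illustrative. With a prime size, every step is coprime with size, so the first size probes visit each slot exactly once.

```java
import java.util.Arrays;

class DoubleHashProbe {
    // Returns the first `size` slots probed for `key` with double hashing:
    //   slot(i) = (h1 + i * h2) mod size
    // Assumes size is prime and > 1, so every step h2 in [1, size - 1] is coprime
    // with size and the sequence is a permutation of 0..size-1.
    static int[] probeSequence(String key, int size) {
        int h = key.hashCode();
        int h1 = Math.floorMod(h, size);                // start slot, never negative
        int h2 = 1 + Math.floorMod(h / size, size - 1); // hypothetical step hash, never 0
        int[] seq = new int[size];
        for (int i = 0; i < size; i++) {
            seq[i] = (int) ((h1 + (long) i * h2) % size); // long avoids overflow of i * h2
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence("TA", 5))); // [4, 1, 3, 0, 2]
        System.out.println(Arrays.toString(probeSequence("AT", 5))); // [4, 3, 2, 1, 0]
    }
}
```

With size = 5, "TA" and "AT" both start at slot 4, so the first probe still collides; double hashing only gives the two keys different paths after that. With more possible keys than slots, some collisions are unavoidable.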


Answer by 柠檬, 2024-12-21 09:16:31

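So I see two things happening here:

1. Using two different hashes and combining them to try to get a better-distributed hash.
2. If the hash collides, trying a new location further along.

At first glance, both look like good ways to reduce hash collisions. On closer inspection, however, both run into real algorithmic problems.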

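Combining two hashes
Hash algorithms are designed to spread values fairly evenly across the integer range. Just as adding two random numbers together does not give you more randomness, adding two hash values together does not give you a better-distributed result. In fact, adding two identical distributions together always produces something less evenly distributed. So any double-hashing strategy that uses the same underlying algorithm is worse than a single-hash strategy.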
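A small illustration of the point about adding identical distributions: counting every sum a + b where a and b each range uniformly over [0, n). This looks only at sums over the plain integers, before any reduction modulo a table size; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SumOfUniform {
    // Counts how often each sum a + b occurs when a and b each take every value in [0, n).
    // Each input is uniform, but the counts of the sum rise and then fall (triangular shape).
    static int[] sumCounts(int n) {
        int[] counts = new int[2 * n - 1];
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                counts[a + b]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sumCounts(4))); // [1, 2, 3, 4, 3, 2, 1]
    }
}
```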

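Trying a new location
It is tempting to use an algorithm that tries a new hash value when the first one collides. However, this causes problems in the retrieval part of the algorithm. When you put something into the hash, it gets bumped to another location. Then, when you go to retrieve that value, it is not at the slot its first hash points to. Worse, whether you find it depends on whether the first element is still there. If that element has been deleted, there is no way to tell whether the item you are looking for is further along or simply not there. In the end, a .contains test has to run through all 200 iterations to be sure the hash it is looking for is absent.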
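To make the retrieval problem concrete, here is a hypothetical open-addressing table (NaiveOpenTable, int keys, size 7) using double hashing with illustrative hash functions h1 = key mod size and h2 = 1 + key mod (size - 1). Keys 3 and 10 both start at slot 3, so 10 is bumped to slot 1. Deleting 3 by simply clearing its slot makes a later lookup for 10 stop at the now-empty slot 3 and report 10 as missing.

```java
// Hypothetical open-addressing table of int keys using double hashing.
// Only insert, contains and a deliberately naive remove are implemented.
class NaiveOpenTable {
    private final Integer[] slots;

    NaiveOpenTable(int size) {                      // size assumed prime and > 1
        slots = new Integer[size];
    }

    private int slot(int key, int i) {
        int size = slots.length;
        int h1 = Math.floorMod(key, size);          // start slot
        int h2 = 1 + Math.floorMod(key, size - 1);  // probe step, never 0
        return (int) ((h1 + (long) i * h2) % size);
    }

    void insert(int key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = slot(key, i);
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    boolean contains(int key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = slot(key, i);
            if (slots[idx] == null) return false;   // an empty slot ends the search
            if (slots[idx] == key) return true;
        }
        return false;
    }

    // Naive removal: clears the slot, silently cutting any probe chain that passed through it.
    void naiveRemove(int key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = slot(key, i);
            if (slots[idx] == null) return;
            if (slots[idx] == key) {
                slots[idx] = null;
                return;
            }
        }
    }
}
```

Usage: after insert(3) and insert(10), contains(10) is true; after naiveRemove(3), contains(10) returns false even though 10 is still stored in slot 1.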

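The best solution is to use the hashing Java provides out of the box. If you are getting a lot of collisions, use a lower load factor for the hash table. This increases the number of buckets and reduces the chance of collisions.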
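A minimal sketch of that suggestion, using the standard java.util.HashMap constructor that takes an initial capacity and a load factor. The values 16 and 0.5f are arbitrary illustrative choices (HashMap's default load factor is 0.75f), and the class name LoadFactorExample is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LoadFactorExample {
    // HashMap resolves collisions internally. A lower load factor makes it resize
    // earlier, so it keeps more buckets per stored entry.
    static Map<String, Integer> newSparseMap() {
        return new HashMap<>(16, 0.5f);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newSparseMap();
        map.put("TA", 1);
        map.put("AT", 2);
        System.out.println(map.get("TA") + " " + map.get("AT")); // prints "1 2"
    }
}
```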

