Will this hash function collide unusually frequently?

Posted on 2024-11-14 13:32:34

I had the following code to generate a hash of an object:

public int GetHashCode(MyType obj)
{
   return (obj.Prop1.GetHashCode() + obj.Prop2.GetHashCode() + obj.Prop3.GetHashCode()).GetHashCode();
}

That is, I add all the properties' hash codes and then take the hash of the result.

In review, a coworker suggested that this will collide too frequently. I'm not sure that this is true because:

  1. Given that hash codes are chosen with equal frequency among positive and negative numbers and they wrap around, I don't think there's any additional information we gain about the likelihood of these numbers' sum as opposed to the numbers themselves
  2. To the extent that their sum is non-random, hash codes are designed to make numbers that are "close together" become "far apart", so feeding a non-uniformly-distributed value into the function shouldn't be an issue

Who is correct?

It is in C#, in case the answer is language-specific.

Comments (3)

冰魂雪魄 2024-11-21 13:32:34

Yes.

Suppose Prop1, Prop2, etc. are of type int. In practice, values usually fall in the lower range of integers, so your sum approach will collide more often than necessary.

The hash code of 7 is 7, which makes perfect sense when hashing an int by itself. But with your code the tuples <7, 3>, <3, 7> and <8, 2> will all have the same hash. Simple XOR instead of addition has the same kind of problem: it is symmetric, so <7, 3> and <3, 7> still collide.
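
This is easy to verify, as a minimal sketch. The `SumHash` helper below is a hypothetical stand-in for the method in the question, taking the property values as plain ints. It assumes the behaviour of current .NET implementations, where `int.GetHashCode()` returns the integer itself, so the outer `.GetHashCode()` call on the sum adds no mixing.

```csharp
using System;

public static class SumHashDemo
{
    // Hypothetical stand-in for the question's method, using int properties.
    public static int SumHash(int prop1, int prop2, int prop3)
    {
        unchecked
        {
            // On an int, GetHashCode() returns the value itself in current
            // .NET implementations, so the trailing call changes nothing.
            return (prop1.GetHashCode() + prop2.GetHashCode() + prop3.GetHashCode()).GetHashCode();
        }
    }
}
```

With the third property fixed at 0, `SumHash(7, 3, 0)`, `SumHash(3, 7, 0)` and `SumHash(8, 2, 0)` all evaluate to 10.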

The common approach is to multiply each hash code by a different (prime) number, optionally combined with shifting, before adding:

public int GetHashCode(MyType obj)
{
  int hash = 0;
  unchecked
  {
     hash += 19 * obj.Prop1.GetHashCode();
     hash += 31 * obj.Prop2.GetHashCode();
     hash += 37 * obj.Prop3.GetHashCode();
  }
  return hash;
}

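The numbers 19, 31, 37 are not too critical. If you prefer, you can use XOR instead of + to combine the weighted terms; OR is best avoided, because it pushes bits toward 1 and skews the distribution.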
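
As a sanity check on the weighted version above, here is a minimal sketch that applies the same weights to plain int values. The `WeightedHash` name and the int parameters are assumptions for illustration, not part of the original code.

```csharp
using System;

public static class WeightedHashDemo
{
    // Same weights as the answer's code, applied to int values directly.
    public static int WeightedHash(int prop1, int prop2, int prop3)
    {
        unchecked
        {
            int hash = 0;
            hash += 19 * prop1.GetHashCode();
            hash += 31 * prop2.GetHashCode();
            hash += 37 * prop3.GetHashCode();
            return hash;
        }
    }
}
```

The tuples that collide under plain addition now differ: `WeightedHash(7, 3, 0)` is 226, `WeightedHash(3, 7, 0)` is 274 and `WeightedHash(8, 2, 0)` is 214. Collisions are still possible for other inputs, only less systematic. On runtimes that provide it (.NET Core 2.1 and later), `System.HashCode.Combine(obj.Prop1, obj.Prop2, obj.Prop3)` is a built-in alternative.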

橘和柠 2024-11-21 13:32:34

XORing would be better:

public int GetHashCode(MyType obj)
{
   return obj.Prop1.GetHashCode() ^ 
          obj.Prop2.GetHashCode() ^ 
          obj.Prop3.GetHashCode();
}
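
One caveat is worth testing before adopting this, sketched here with plain int values (the `XorHash` helper is hypothetical): XOR is commutative, so swapping two property values gives the same result, and two equal values cancel each other out.

```csharp
using System;

public static class XorHashDemo
{
    // Hypothetical int-only version of the XOR combiner above.
    public static int XorHash(int prop1, int prop2, int prop3)
    {
        return prop1.GetHashCode() ^ prop2.GetHashCode() ^ prop3.GetHashCode();
    }
}
```

For example, `XorHash(7, 3, 0)` and `XorHash(3, 7, 0)` are both 4, and `XorHash(5, 5, 1)` and `XorHash(9, 9, 1)` are both 1, because the two equal values XOR to 0.
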

娇柔作态 2024-11-21 13:32:34

You can use a modified FNV hash code generator; a very similar question has been answered (by me) here
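
The linked answer is not reproduced on this page, so the following is only a sketch of what an FNV-style combiner could look like, not the answerer's code. It uses the standard 32-bit FNV offset basis and prime but, as a simplification, mixes in each property's whole hash code rather than individual bytes. The `FnvStyleHashDemo.Combine` name is an assumption.

```csharp
using System;

public static class FnvStyleHashDemo
{
    private const uint OffsetBasis = 2166136261; // 32-bit FNV offset basis
    private const uint Prime = 16777619;         // 32-bit FNV prime

    // Folds the given hash codes together, FNV-1a style: XOR first, then multiply.
    public static int Combine(params int[] fieldHashes)
    {
        unchecked
        {
            uint hash = OffsetBasis;
            foreach (int h in fieldHashes)
            {
                hash = (hash ^ (uint)h) * Prime;
            }
            return (int)hash;
        }
    }
}
```

A call such as `FnvStyleHashDemo.Combine(obj.Prop1.GetHashCode(), obj.Prop2.GetHashCode(), obj.Prop3.GetHashCode())` depends on the order of its arguments, so <7, 3> and <3, 7> no longer produce the same value.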
