Will this hash function collide unusually frequently?
I had the following code to generate a hash of an object:
public int GetHashCode(MyType obj)
{
    return (obj.Prop1.GetHashCode() + obj.Prop2.GetHashCode() + obj.Prop3.GetHashCode()).GetHashCode();
}
I.e. I add all the properties' hash codes and then take the hash of this.
In review, a coworker suggested that this will collide too frequently. I'm not sure that this is true because:
- Given that hash codes are chosen with equal frequency among positive and negative numbers and they wrap around, I don't think there's any additional information we gain about the likelihood of these numbers' sum as opposed to the numbers themselves
- To the extent that their sum is non-random, hash codes are designed to make numbers that are "close together" become "far apart", so feeding a non-uniformly-distributed value into the function shouldn't be an issue
Who is correct?
It is in C#, in case the answer is language-specific.
Comments (3)
Yes.
Just suppose Prop1, Prop2 etc. are of type int. Usually only the lower range of integers is used. Your sum approach will collide more often than necessary.
The hash code of 7 is 7, which makes perfect sense when hashing an int by itself. But with your code the tuples <7, 3>, <3, 7> and <8, 2> will all have the same hash. Simple XOR instead of addition has the same kind of problem.
The common approach is to add some (prime) numbers and shifting:
The numbers 19, 31, 37 are not too critical. And if you prefer you can use OR or XOR instead of +.
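As a rough illustration of the point above, here is a minimal sketch that compares the plain sum with a sum weighted by the primes 19, 31 and 37 mentioned in the answer. The class and method names (WeightedHashSketch, SumHash, WeightedHash) and the use of three int values in place of Prop1..Prop3 are assumptions made for this example, not code from the answer.

```csharp
using System;

// Hypothetical helper for illustration only; the int parameters stand in for
// Prop1, Prop2 and Prop3 from the question.
public static class WeightedHashSketch
{
    // The question's approach: plain sum of the component hashes.
    // The trailing .GetHashCode() in the question is a no-op for int,
    // because Int32.GetHashCode() returns the value itself.
    public static int SumHash(int a, int b, int c)
    {
        unchecked { return a.GetHashCode() + b.GetHashCode() + c.GetHashCode(); }
    }

    // Weighted variant: each component hash gets its own prime multiplier,
    // so swapping two values no longer produces the same total.
    public static int WeightedHash(int a, int b, int c)
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 0;
            hash += 19 * a.GetHashCode();
            hash += 31 * b.GetHashCode();
            hash += 37 * c.GetHashCode();
            return hash;
        }
    }
}
```

With the third value fixed at 0, SumHash gives 10 for (7, 3), (3, 7) and (8, 2), while WeightedHash gives 226, 274 and 214 respectively.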
XORing would be better:
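A minimal sketch of what XOR-combining the three property hashes might look like. The names XorHashSketch and XorHash, and the int parameters standing in for Prop1..Prop3, are hypothetical and chosen only for this example.

```csharp
// Hypothetical helper for illustration only.
public static class XorHashSketch
{
    // Combine the component hashes with bitwise XOR instead of addition.
    // XOR cannot overflow, so no unchecked block is needed.
    public static int XorHash(int a, int b, int c)
    {
        return a.GetHashCode() ^ b.GetHashCode() ^ c.GetHashCode();
    }
}
```

Note that XOR is still insensitive to order, as the first answer points out: XorHash(7, 3, 0) and XorHash(3, 7, 0) are both 4, whereas XorHash(8, 2, 0) is 10. Equal values also cancel each other, so XorHash(5, 5, 0) is 0.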
You can use a modified FNV hash code generator; a very similar question has been answered (by me) here
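The linked answer is not reproduced here, so the following is only a guess at what an FNV-style combiner could look like. It uses the standard 32-bit FNV constants (offset basis 2166136261, prime 16777619) but mixes in whole int hash codes rather than individual bytes, which is one possible reading of "modified". The names FnvStyleHashSketch and Combine are hypothetical.

```csharp
// Hypothetical helper for illustration only; not the code from the linked answer.
public static class FnvStyleHashSketch
{
    // Standard 32-bit FNV constants, reinterpreted as signed ints.
    private const int OffsetBasis = unchecked((int)2166136261);
    private const int Prime = 16777619;

    // Multiply-then-XOR each component hash into the running value.
    // Because every step depends on the previous result, the order of the
    // inputs affects the output.
    public static int Combine(int h1, int h2, int h3)
    {
        unchecked // wrap on overflow instead of throwing
        {
            int hash = OffsetBasis;
            hash = (hash * Prime) ^ h1;
            hash = (hash * Prime) ^ h2;
            hash = (hash * Prime) ^ h3;
            return hash;
        }
    }
}
```

It would be called as FnvStyleHashSketch.Combine(obj.Prop1.GetHashCode(), obj.Prop2.GetHashCode(), obj.Prop3.GetHashCode()). Unlike the sum or XOR, Combine(7, 3, 0) and Combine(3, 7, 0) produce different values.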