关于如何正确重写 object.GetHashCode() 的一般建议和指南

发布于 2024-08-03 09:44:18 字数 1339 浏览 7 评论 0原文

根据 MSDN,哈希函数必须具有以下内容特性:

  1. 如果两个对象比较相等,则每个对象的 GetHashCode 方法必须返回相同的值。但是,如果两个对象比较不相等,则两个对象的 GetHashCode 方法不必返回不同的值。

  2. 只要确定对象的 Equals 方法的返回值的对象状态没有发生修改,对象的 GetHashCode 方法就必须始终返回相同的哈希码。请注意,这仅适用于应用程序的当前执行,并且如果应用程序再次运行,则可以返回不同的哈希码。

  3. 为了获得最佳性能,哈希函数必须为所有输入生成随机分布。


我不断发现自己处于以下场景:我创建了一个类,实现了 IEquatable并重写了 object.Equals(object)MSDN 指出:

覆盖 Equals 的类型也必须覆盖 GetHashCode ;否则,Hashtable 可能无法正常工作。

然后它通常对我来说会有点停止。因为,如何正确重写object.GetHashCode()?永远不知道从哪里开始,而且似乎有很多陷阱。

在 StackOverflow 上,有很多与 GetHashCode 重写相关的问题,但其中大多数似乎都是针对非常特殊的情况和特定问题。因此,我想在这里得到一份好的汇编。包含一般建议和指南的概述。该做什么、不该做什么、常见的陷阱、从哪里开始等等。

我希望它特别针对 C#,但我认为它也适用于其他 .NET 语言(? )。


我认为最好的方法可能是为每个主题创建一个答案,首先给出一个快速而简短的答案(如果可能的话,接近一句话),然后可能是一些更多信息,最后以相关问题、讨论、博客文章结束等等,如果有的话。然后,我可以创建一篇文章作为接受的答案(将其放在顶部),只需一个“目录”。尽量保持简短。不要只链接到其他问题和博客文章。尝试获取它们的本质,然后链接到源(特别是因为源可能会消失。另外,请尝试编辑和改进答案,而不是创建大量非常相似的答案。

我不是非常好的技术作家,但我至少会尝试格式化答案,使它们看起来相似,创建目录等。我还将尝试在此处搜索一些相关问题,以回答其中的部分问题,也许会拉但由于我在这个话题上不太稳定,所以我会尽量避开大部分内容:p

According to MSDN, a hash function must have the following properties:

  1. If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  2. The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  3. For the best performance, a hash function must generate a random distribution for all input.


I keep finding myself in the following scenario: I have created a class, implemented IEquatable<T> and overridden object.Equals(object). MSDN states that:

Types that override Equals must also override GetHashCode ; otherwise, Hashtable might not work correctly.

And then it usually stops up a bit for me. Because, how do you properly override object.GetHashCode()? Never really know where to start, and it seems to be a lot of pitfalls.

Here at StackOverflow, there are quite a few questions related to GetHashCode overriding, but most of them seems to be on quite particular cases and specific issues. So, therefore I would like to get a good compilation here. An overview with general advice and guidelines. What to do, what not to do, common pitfalls, where to start, etc.

I would like it to be especially directed at C#, but I would think it will work kind of the same way for other .NET languages as well(?).


I think maybe the best way is to create one answer per topic with a quick and short answer first (close to one-liner if at all possible), then maybe some more information and end with related questions, discussions, blog posts, etc., if there are any. I can then create one post as the accepted answer (to get it on top) with just a "table of contents". Try to keep it short and concise. And don't just link to other questions and blog posts. Try to take the essence of them and then rather link to source (especially since the source could disappear. Also, please try to edit and improve answers instead of created lots of very similar ones.

I am not a very good technical writer, but I will at least try to format answers so they look alike, create the table of contents, etc. I will also try to search up some of the related questions here at SO that answers parts of these and maybe pull out the essence of the ones I can manage. But since I am not very stable on this topic, I will try to stay away for the most part :p

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(11

从此见与不见 2024-08-10 09:44:18

目录


我希望涵盖但尚未涵盖的内容然而:

  • 如何创建整数(如何将对象“转换”为 int 对我来说并不是很明显)。
  • 哈希码基于哪些字段。
    • 如果它应该只存在于不可变字段上,那么如果只有可变字段怎么办?
  • 如何生成良好的随机分布。 (MSDN 属性 #3)
    • 对此,似乎要选择一个很好的神奇素数(已经使用过 17、23 和 397),但是如何选择它,它到底有什么用?
  • 如何确保哈希码在整个对象生命周期中保持不变。 (MSDN 属性 #2)
    • 特别是当相等性基于可变字段时。 (MSDN 属性 #1)
  • 如何处理复杂类型的字段(不在 中)内置 C# 类型)。
    • 复杂对象和结构、数组、集合、列表、字典、泛型类型等。
    • 例如,即使列表或字典可能是只读的,但这并不意味着它的内容是只读的。
  • 如何处理继承的类。
    • 您是否应该以某种方式将 base.GetHashCode() 合并到您的哈希码中?
  • 从技术上来说,你能偷懒并返回 0 吗?会严重违反 MSDN 指南 #3,但至少会确保 #1 和 #2 始终正确:P
  • 常见陷阱和陷阱。

Table of contents


Things that I would like to be covered, but haven't been yet:

  • How to create the integer (How to "convert" an object into an int wasn't very obvious to me anyways).
  • What fields to base the hash code upon.
    • If it should only be on immutable fields, what if there are only mutable ones?
  • How to generate a good random distribution. (MSDN Property #3)
    • Part to this, seems to choose a good magic prime number (have seen 17, 23 and 397 been used), but how do you choose it, and what is it for exactly?
  • How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2)
    • Especially when the equality is based upon mutable fields. (MSDN Property #1)
  • How to deal with fields that are complex types (not among the built-in C# types).
    • Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
    • For example, even though the list or dictionary might be readonly, that doesn't mean the contents of it are.
  • How to deal with inherited classes.
    • Should you somehow incorporate base.GetHashCode() into your hash code?
  • Could you technically just be lazy and return 0? Would heavily break MSDN guideline number #3, but would at least make sure #1 and #2 were always true :P
  • Common pitfalls and gotchas.
花之痕靓丽 2024-08-10 09:44:18

GetHashCode 实现中经常看到的那些神奇数字是什么?

它们是质数。质数用于创建哈希码,因为质数可以最大化哈希码空间的使用。

具体来说,从小素数 3 开始,只考虑结果:

  • 3 * 1 = 3 = 3(mod 8) = 0011
  • 3 * 2 = 6 = 6(mod 8) = 1010
  • 3 * 3 = 9 = 1( mod 8) = 0001
  • 3 * 4 = 12 = 4(mod 8) = 1000
  • 3 * 5 = 15 = 7(mod 8) = 1111代码>
  • 3 * 6 = 18 = 2(mod 8) = 0010
  • 3 * 7 = 21 = 5(mod 8) = 1001
  • 3 * 8 = 24 = 0( mod 8) = 0000
  • 3 * 9 = 27 = 3(mod 8) = 0011

我们重新开始。但是您会注意到,在开始重复之前,素数的连续倍数在我们的半字节中生成了所有可能的位排列。我们可以使用任何素数和任意位数获得相同的效果,这使得素数最适合生成近随机哈希码。我们通常看到较大素数而不是小素数(如上例中的 3)的原因是,对于散列码中的位数较多的情况,使用小素数获得的结果甚至不是伪随机的 - 它们只是一个增加序列直到遇到溢出。为了获得最佳随机性,应使用导致相当小的系数溢出的质数,除非您可以保证系数不会很小。

相关链接:

What are those magic numbers often seen in GetHashCode implementations?

They are prime numbers. Prime numbers are used for creating hash codes because prime number maximize the usage of the hash code space.

Specifically, start with the small prime number 3, and consider only the low-order nybbles of the results:

  • 3 * 1 = 3 = 3(mod 8) = 0011
  • 3 * 2 = 6 = 6(mod 8) = 1010
  • 3 * 3 = 9 = 1(mod 8) = 0001
  • 3 * 4 = 12 = 4(mod 8) = 1000
  • 3 * 5 = 15 = 7(mod 8) = 1111
  • 3 * 6 = 18 = 2(mod 8) = 0010
  • 3 * 7 = 21 = 5(mod 8) = 1001
  • 3 * 8 = 24 = 0(mod 8) = 0000
  • 3 * 9 = 27 = 3(mod 8) = 0011

And we start over. But you'll notice that successive multiples of our prime generated every possible permutation of bits in our nybble before starting to repeat. We can get the same effect with any prime number and any number of bits, which makes prime numbers optimal for generating near-random hash codes. The reason we usually see larger primes instead of small primes like 3 in the example above is that, for greater numbers of bits in our hash code, the results obtained from using a small prime are not even pseudo-random - they're simply an increasing sequence until an overflow is encountered. For optimal randomness, a prime number that results in overflow for fairly small coefficients should be used, unless you can guarantee that your coefficients will not be small.

Related links:

无语# 2024-08-10 09:44:18

为什么我必须重写 object.GetHashCode()

重写此方法很重要,因为以下属性必须始终保持 true:

如果两个对象比较相等,则每个对象的 GetHashCode 方法必须返回相同的值。

原因如 JaredPar关于实现平等的博客文章,是这样吗?

许多类使用哈希码来对对象进行分类。特别是哈希表和字典倾向于根据哈希码将对象放置在存储桶中。当检查一个对象是否已经在哈希表中时,它会首先在存储桶中查找它。如果两个对象相等但哈希码不同,它们可能会被放入不同的桶中,字典将无法查找该对象。

相关链接:

Why do I have to override object.GetHashCode()?

Overriding this method is important because the following property must always remain true:

If two objects compare as equal, the GetHashCode method for each object must return the same value.

The reason, as stated by JaredPar in a blog post on implementing equality, is that

Many classes use the hash code to classify an object. In particular hash tables and dictionaries tend to place objects in buckets based on their hash code. When checking if an object is already in the hash table it will first look for it in a bucket. If two objects are equal but have different hash codes they may be put into different buckets and the dictionary would fail to lookup the object.

Related links:

一袭白衣梦中忆 2024-08-10 09:44:18

查看 Eric Lippert 撰写的 GetHashCode 指南和规则

Check out Guidelines and rules for GetHashCode by Eric Lippert

笛声青案梦长安 2024-08-10 09:44:18

只要您对该类型的对象有有意义的相等性度量(即您覆盖 Equals),就应该覆盖它。如果您知道该对象不会因任何原因被散列,您可以保留它,但您不太可能提前知道这一点。

哈希值应仅基于用于定义相等性的对象的属性,因为被视为相等的两个对象应具有相同的哈希码。一般来说,您通常会这样做:


public override int GetHashCode()
{
    int mc = //magic constant, usually some prime
    return mc * prop1.GetHashCode() * prop2.GetHashCode * ... * propN.GetHashCode();
}

我通常假设将值相乘会产生相当均匀的分布,假设每个属性的哈希码函数执行相同的操作,尽管这很可能是错误的。使用此方法,如果对象相等性定义属性发生变化,则哈希码也可能发生变化,鉴于问题中的定义#2,这是可以接受的。它还以统一的方式处理所有类型。

您可以为所有实例返回相同的值,尽管这将使任何使用散列的算法(例如字典)非常慢 - 本质上所有实例都将被散列到同一个存储桶,然后查找将变为 O(n) 而不是预期的 O(n) O(1)。这当然否定了使用此类结构进行查找的任何好处。

You should override it whenever you have a meaningful measure of equality for objects of that type (i.e. you override Equals). If you knew the object wasn't going to be hashed for any reason you could leave it, but it's unlikely you could know this in advance.

The hash should be based only on the properties of the object that are used to define equality since two objects that are considered equal should have the same hash code. In general you would usually do something like:


public override int GetHashCode()
{
    int mc = //magic constant, usually some prime
    return mc * prop1.GetHashCode() * prop2.GetHashCode * ... * propN.GetHashCode();
}

I usually assume multiplying the values together will produce a fairly uniform distribution, assuming each property's hashcode function does the same, although this may well be wrong. Using this method, if the objects equality-defining properties change, then the hash code is also likely to change, which is acceptable given definition #2 in your question. It also deals with all types in a uniform way.

You could return the same value for all instances, although this will make any algorithms that use hashing (such as dictionarys) very slow - essentially all instances will be hashed to the same bucket and lookup will then become O(n) instead of the expected O(1). This of course negates any benefits of using such structures for lookup.

此刻的回忆 2024-08-10 09:44:18

A) 如果您想使用值相等而不是默认的引用相等,则必须重写 Equals 和 GetHashCode。对于后者,如果两个对象引用都引用同一个对象实例,则它们比较相等。对于前者,如果它们的值相同,即使它们引用不同的对象,它们也比较相等。例如,您可能希望对 Date、Money 和 Point 对象采用值相等。

B) 为了实现值相等,您必须重写 Equals 和 GetHashCode。两者都应该取决于封装该值的对象的字段。例如,日期.年、日期.月和日期.日;或 Money.Currency 和 Money.Amount;或 X 点、Y 点和 Z 点。您还应该考虑重写运算符 ==、运算符 !=、运算符 < 和运算符 >。

C) 哈希码不必在对象的整个生命周期中保持不变。然而,当它作为哈希的键参与时,它必须保持不可变。来自 MSDN doco for Dictionary:“只要一个对象用作 Dictionary<(Of <(TKey, TValue>)>) 中的键,它就不能以任何影响其哈希值的方式进行更改。”如果必须更改键的值,请从字典中删除该条目,更改键值,然后替换该条目。

D)在我看来,如果你的价值对象本身是不可变的,你的生活就会变得简单。

A) You must override both Equals and GetHashCode if you want to employ value equality instead of the default reference equality. With the later, two object references compare as equal if they both refer to the same object instance. With the former they compare as equal if their value is the same even if they refer to different objects. For example, you probably want to employ value equality for Date, Money, and Point objects.

B) In order to implement value equality you must override Equals and GetHashCode. Both should depend on the fields of the object that encapsulate the value. For example, Date.Year, Date.Month and Date.Day; or Money.Currency and Money.Amount; or Point.X, Point.Y and Point.Z. You should also consider overriding operator ==, operator !=, operator <, and operator >.

C) The hashcode doesn't have to stay constant all through the object lifetime. However it must remain immutable while it participates as the key in a hash. From MSDN doco for Dictionary: "As long as an object is used as a key in the Dictionary<(Of <(TKey, TValue>)>), it must not change in any way that affects its hash value." If you must change the value of a key remove the entry from the dictionary, change the key value, and replace the entry.

D) IMO, you will simplify your life if your value objects are themselves immutable.

や三分注定 2024-08-10 09:44:18

我什么时候重写object.GetHashCode()

正如MSDN所述:

覆盖 Equals 的类型也必须覆盖 GetHashCode ;否则,Hashtable 可能无法正常工作。

相关链接:

When do I override object.GetHashCode()?

As MSDN states:

Types that override Equals must also override GetHashCode ; otherwise, Hashtable might not work correctly.

Related links:

筑梦 2024-08-10 09:44:18

哈希码基于哪些字段?如果它只应该在不可变字段上,那么如果只有可变字段怎么办?

它不需要仅基于不可变字段。我将其基于确定 equals 方法结果的字段。

What fields to base the hash code upon? If it should only be on immutable fields, what if there are only mutable ones?

It doesn't need to be based only on immutable fields. I would base it on the fields that determine the outcome of the equals method.

心头的小情儿 2024-08-10 09:44:18

如何确保哈希码在整个对象生命周期中保持不变。 (MSDN 属性 #2)尤其是当相等性基于可变字段时。 (MSDN 属性#1)

您似乎误解了属性#2。哈希码不需要在对象的生命周期内保持不变。只要确定 equals 方法结果的值不改变,它就需要保持不变。因此从逻辑上讲,哈希码仅基于这些值。那么应该就没有问题了。

How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2) Especially when the equality is based upon mutable fields. (MSDN Property #1)

You seem to misunderstand Property #2. The hashcode doesn't need to stay the same thoughout the objects lifetime. It just needs to stay the same as long as the values that determine the outcome of the equals method are not changed. So logically, you base the hashcode on those values only. Then there shouldn't be a problem.

public override int GetHashCode()
{
    return IntProp1 ^ IntProp2 ^ StrProp3.GetHashCode() ^ StrProp4.GetHashCode ^ CustomClassProp.GetHashCode;
}

在 customClass 的 GetHasCode 方法中执行相同的操作。就像魅力一样。

public override int GetHashCode()
{
    return IntProp1 ^ IntProp2 ^ StrProp3.GetHashCode() ^ StrProp4.GetHashCode ^ CustomClassProp.GetHashCode;
}

Do the same in the customClass's GetHasCode method. Works like a charm.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文