Why is it important to override GetHashCode when the Equals method is overridden?
Given the following class,
    public class Foo
    {
        public int FooId { get; set; }
        public string FooName { get; set; }

        public override bool Equals(object obj)
        {
            Foo fooItem = obj as Foo;
            if (fooItem == null)
            {
                return false;
            }
            return fooItem.FooId == this.FooId;
        }

        public override int GetHashCode()
        {
            // Which is preferred?
            return base.GetHashCode();
            //return this.FooId.GetHashCode();
        }
    }
I have overridden the Equals method because Foo represents a row in the Foo table. Which is the preferred way to override GetHashCode?
Why is it important to override GetHashCode?
As of .NET 4.7, the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple NuGet package.
In terms of performance, this method will outperform most composite hash code implementations. ValueTuple is a struct, so there won't be any garbage, and the underlying algorithm is as fast as it gets.
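The snippet this answer refers to is not reproduced in the text above. A minimal sketch of the tuple-based approach, applied to a class shaped like the question's Foo, might look like the following. The class name TupleHashedFoo is made up for illustration, and the sketch assumes both FooId and FooName take part in equality (the question's own Equals compares only FooId).

```csharp
using System;

// Hypothetical class for illustration; not the answer's original code.
public class TupleHashedFoo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        // The 'is' pattern yields false for null and for unrelated types.
        return obj is TupleHashedFoo other
            && FooId == other.FooId
            && FooName == other.FooName;
    }

    // Put the same members that Equals compares into a ValueTuple
    // and let the tuple combine their hash codes.
    public override int GetHashCode() => (FooId, FooName).GetHashCode();
}
```

Because the tuple is built from mutable properties here, the usual caveat applies: don't change those properties while the object is stored in a hash-based collection.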
How about:
Please don't forget to check the obj parameter against null when overriding Equals(), and also compare the type.
The reason for this is that Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
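As a sketch of the null and type checks described in this answer (CheckedFoo is a made-up name, and the exact-type comparison via GetType() is one possible policy, not the only one):

```csharp
using System;

// Hypothetical class for illustration.
public class CheckedFoo
{
    public int FooId { get; set; }

    public override bool Equals(object obj)
    {
        // Equals(null) must return false.
        if (obj is null) return false;

        // The same instance is trivially equal to itself.
        if (ReferenceEquals(this, obj)) return true;

        // Require the exact same runtime type, so that an instance of a
        // derived class never compares equal to a base-class instance.
        if (obj.GetType() != GetType()) return false;

        return ((CheckedFoo)obj).FooId == FooId;
    }

    public override int GetHashCode() => FooId.GetHashCode();
}
```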
Just to add to the answers above:
If you don't override Equals, the default behavior is to compare object references. The same applies to the hash code: the default implementation is based on the object's identity (often loosely described as its memory address) rather than on its contents.
Because you did override Equals, the correct behavior is to compare whatever you implemented in Equals and not the references, so you should do the same for the hash code.
Clients of your class will expect the hash code to follow the same logic as the Equals method. For example, LINQ methods that use an IEqualityComparer first compare the hash codes, and only if those are equal do they call Equals(), which might be more expensive to run. If you don't implement GetHashCode, equal objects will probably have different hash codes (because they are distinct instances) and will wrongly be treated as not equal (Equals() won't even be called).
There is also the problem that you might not be able to find your object if you used it in a dictionary: it was inserted under one hash code, and when you look for it the default hash code will probably be different, so again Equals() won't even be called, as Marc Gravell explains in his answer. On top of that, you violate the concept of a dictionary or hash set, which should not allow identical keys.
By overriding Equals you already declared that those objects are essentially the same, so you don't want both of them as different keys in a data structure that is supposed to have unique keys. But because they have different hash codes, the "same" key will be inserted as a different one.
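To make the LINQ point concrete, here is a small sketch. The names EqualsOnlyKey, ConsistentKey and DistinctDemo are invented for this example. Distinct() uses hash codes first, so with the Equals-only class the two "equal" items will almost certainly both survive, while the consistent class collapses them to one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#pragma warning disable CS0659 // overrides Equals but not GetHashCode (on purpose)
public class EqualsOnlyKey
{
    public int Id { get; set; }
    public override bool Equals(object obj) => obj is EqualsOnlyKey other && other.Id == Id;
}
#pragma warning restore CS0659

public class ConsistentKey
{
    public int Id { get; set; }
    public override bool Equals(object obj) => obj is ConsistentKey other && other.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();
}

public static class DistinctDemo
{
    // Hash codes are identity-based here, so Equals is normally never
    // consulted and both items are kept (a chance collision is possible).
    public static int CountDistinctEqualsOnly()
    {
        var items = new[] { new EqualsOnlyKey { Id = 1 }, new EqualsOnlyKey { Id = 1 } };
        return items.Distinct().Count();
    }

    // Equal hash codes lead Distinct() to call Equals, which reports a duplicate.
    public static int CountDistinctConsistent()
    {
        var items = new[] { new ConsistentKey { Id = 1 }, new ConsistentKey { Id = 1 } };
        return items.Distinct().Count();
    }
}
```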
It is because the framework requires that two objects that are equal must have the same hash code. If you override Equals to do a special comparison of two objects, and the method considers the two objects equal, then the hash codes of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle.)
We have two problems to cope with.
You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also, often an object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.
If someone puts your object in a collection that calls GetHashCode(), and you have overridden Equals() without also making GetHashCode() behave correctly, that person may spend days tracking down the problem.
Therefore by default I do.
The hash code is used by hash-based collections like Dictionary, Hashtable, HashSet and so on. Its purpose is to pre-sort an object very quickly by putting it into a specific group (bucket). This pre-sorting helps tremendously when you need to retrieve the object from the collection, because the code has to search just one bucket instead of all the objects the collection contains. The better the distribution of hash codes (the closer to unique), the faster the retrieval. In the ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).
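The bucket idea can be sketched with a toy collection. This is only an illustration with made-up names (ToyHashSet, BucketFor); it is not how Dictionary or HashSet are actually implemented, and it does not handle null items or resizing.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of hashing into buckets.
public class ToyHashSet<T>
{
    private readonly List<T>[] _buckets;

    public ToyHashSet(int bucketCount = 16)
    {
        _buckets = new List<T>[bucketCount];
        for (int i = 0; i < bucketCount; i++) _buckets[i] = new List<T>();
    }

    // Map the hash code to a bucket index; masking the sign bit keeps
    // the index non-negative.
    private List<T> BucketFor(T item) =>
        _buckets[(item.GetHashCode() & 0x7FFFFFFF) % _buckets.Length];

    public bool Add(T item)
    {
        var bucket = BucketFor(item);
        if (bucket.Contains(item)) return false; // Equals runs only within this bucket
        bucket.Add(item);
        return true;
    }

    // Only one bucket is searched, never the whole collection.
    public bool Contains(T item) => BucketFor(item).Contains(item);
}
```

With 16 buckets, the integers 1 and 17 land in the same bucket (a collision), and Equals is what tells them apart inside it.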
It's not necessarily important; it depends on the size of your collections, on your performance requirements, and on whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:
(Of course I could use a #pragma to turn off the warning as well, but I prefer this way.)
When you are in the position that you do need the performance, then all of the issues mentioned by others here apply, of course. Most important, because otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary over the lifetime of an object (more accurately, during any period in which the hash code is needed, such as while the object is a key in a dictionary). For example, the following is wrong because Value is public and so can be changed from outside the class during the lifetime of the instance, so you must not use it as the basis for the hash code:
On the other hand, if Value can't be changed, it's OK to use:
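The snippets this answer points to are not preserved above. A sketch of the same contrast, using invented names (MutableKey, ImmutableKey, MutableKeyDemo), could look like this:

```csharp
using System;
using System.Collections.Generic;

// Risky: Value is publicly settable, yet the hash code depends on it.
public class MutableKey
{
    public int Value { get; set; }
    public override bool Equals(object obj) => obj is MutableKey other && other.Value == Value;
    public override int GetHashCode() => Value;
}

// Safe: Value is fixed at construction, so the hash code never changes.
public class ImmutableKey
{
    public ImmutableKey(int value) { Value = value; }
    public int Value { get; }
    public override bool Equals(object obj) => obj is ImmutableKey other && other.Value == Value;
    public override int GetHashCode() => Value;
}

public static class MutableKeyDemo
{
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Value = 2; // the hash code changes while the object sits in the set

        // The lookup uses the new hash code, but the entry was stored under
        // the old one, so the set no longer finds its own element.
        return set.Contains(key);
    }
}
```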
As of C# 9 (which ships with .NET 5), you may want to use records, since they provide value-based equality by default.
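A minimal sketch of the record approach, assuming a C# 9 or later compiler (FooRecord is a made-up name). Note that the generated equality compares every property, whereas the question's Equals compares FooId only:

```csharp
// The compiler generates Equals, GetHashCode, == and != from the
// declared properties.
public record FooRecord(int FooId, string FooName);
```

Two instances created with the same FooId and FooName compare equal and return the same hash code without any hand-written overrides.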
You should always guarantee that if two objects are equal, as defined by Equals(), they return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash-based container like HashSet or Dictionary. I would advise you to always follow this rule though. The reason is simply that it is way too easy for someone to change a collection from one type to another with the good intention of improving performance or conveying the code's semantics better.
For example, suppose we keep some objects in a List. Some time later someone realizes that a HashSet is a much better alternative, for example because of its better search characteristics. This is when we can get into trouble. List internally uses the default equality comparer for the type, which means Equals in your case, while HashSet also makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.
I've summarized this behavior, along with some other GetHashCode() pitfalls, in a blog post where you can find further examples and explanations.
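The List-to-HashSet scenario can be sketched as follows. Tag and CollectionSwapDemo are invented names, and Tag deliberately omits GetHashCode to reproduce the problem:

```csharp
using System;
using System.Collections.Generic;

#pragma warning disable CS0659 // overrides Equals but not GetHashCode (on purpose)
public class Tag
{
    public string Name { get; set; }
    public override bool Equals(object obj) => obj is Tag other && other.Name == Name;
}
#pragma warning restore CS0659

public static class CollectionSwapDemo
{
    // List<T>.Contains walks the items and calls Equals, so this is true.
    public static bool InList()
    {
        var list = new List<Tag> { new Tag { Name = "a" } };
        return list.Contains(new Tag { Name = "a" });
    }

    // HashSet<T>.Contains consults GetHashCode first; with identity-based
    // hash codes the equal item is almost always missed.
    public static bool InHashSet()
    {
        var set = new HashSet<Tag> { new Tag { Name = "a" } };
        return set.Contains(new Tag { Name = "a" });
    }
}
```

The same lookup silently changes its answer when the collection type changes, which is the kind of bug this answer warns about.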
It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.
EDITED:
That was incorrect. The original GetHashCode() method cannot assure the equality of two values, although objects that are equal return the same hash code.
In .NET, when you override the Equals() method, it's recommended to also override GetHashCode(). The reason is related to how .NET uses GetHashCode() in its built-in data structures.
When you store an object in a hash-based collection like Dictionary or HashSet, .NET uses the value returned by GetHashCode() to organize its data. Objects that are considered equal must return the same hash code, otherwise lookups in such a collection can fail; a well-distributed hash code additionally keeps those lookups fast.
If you override Equals(), you're changing the definition of what makes two objects equal. So, if you don't also override GetHashCode(), objects that you consider "equal" may return different hash codes. This can lead to inconsistent behavior when objects are used in a hash-based collection. They might not be found in the collection, even though you know they're there, because the collection is looking in the wrong hash bucket.
Let's see an example. Suppose you have a Person class and you have overridden Equals() to say that two Person objects are equal if their Name properties match, but you forgot to override GetHashCode(). Now, if you add a Person object with Name="John" to a HashSet and later check whether a Person object with Name="John" exists in the HashSet, it might return false, which is incorrect. This happens because the default GetHashCode() returns a hash code tied to the object's identity, not to the Name string you're using for the equality comparison.
To avoid this issue, any time you override Equals(), you should also override GetHashCode() to ensure it uses the same properties that Equals() does. This will help maintain consistency when using hash-based collections.
Overriding GetHashCode() requires producing a hash code that considers the same properties used in Equals() and is also evenly distributed, to reduce hash collisions.
Here is one example of how you might achieve this:
In this example, field1 and field2 are the fields that the Equals() method checks. The constants 17 and 23 are just arbitrarily chosen 'magic' numbers that often give good results.
You can also use HashCode.Combine(), which is available in .NET Core 2.1 and later (it is a framework API rather than a C# language feature):
Remember, the goal of GetHashCode() is not to avoid collisions entirely, but to distribute them evenly. Collisions are inevitable because the number of possible hash codes (2^32 for int) is smaller than the number of possible string values, for example. But a good hash function will help ensure a more even distribution of hash code values and reduce the probability of collision, resulting in better performance when using hash-based collections.
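The code for the 17/23 pattern and for HashCode.Combine() mentioned in the answer above is not preserved here. A sketch of both, under the assumption of a class with an int field1 and a string field2 (Pair and GetHashCodeViaCombine are made-up names), might be:

```csharp
using System;

// Hypothetical class for illustration.
public class Pair
{
    private readonly int field1;
    private readonly string field2;

    public Pair(int field1, string field2)
    {
        this.field1 = field1;
        this.field2 = field2;
    }

    public override bool Equals(object obj) =>
        obj is Pair other && field1 == other.field1 && field2 == other.field2;

    // Manual combination with the constants 17 and 23.
    public override int GetHashCode()
    {
        unchecked // let the multiplication wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 23 + field1.GetHashCode();
            hash = hash * 23 + (field2?.GetHashCode() ?? 0); // treat null as 0
            return hash;
        }
    }

    // Equivalent in intent, using the framework helper. Shown as a separate
    // method only so that both variants fit in one class.
    public int GetHashCodeViaCombine() => HashCode.Combine(field1, field2);
}
```

Either variant satisfies the rule that equal objects return equal hash codes, because both read exactly the fields that Equals compares.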
Using reflection, as below, seems to me a better option when public properties are considered, because you don't have to worry about properties being added or removed (although that is not a very common scenario). I also found this to perform better (timings compared using the Diagnostics Stopwatch).
Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash codes for two items do not match, they may never be considered equal (Equals will simply never be called).
The GetHashCode() method should reflect the Equals logic; the rules are:
if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.
In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash code to new Foo(5,3)):
In modern frameworks, the HashCode type has methods to help you create a hash code from multiple values; on older frameworks, you'd need to go without, so something like:
Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.
A demonstration of what happens when you get this wrong is here.
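As a sketch of the closing suggestion about == and != (FooWithOperators is a made-up name; the hash code follows the "return FooId;" idea from this answer):

```csharp
using System;

// Hypothetical class for illustration.
public sealed class FooWithOperators
{
    public FooWithOperators(int fooId) { FooId = fooId; }

    public int FooId { get; }

    public override bool Equals(object obj) =>
        obj is FooWithOperators other && other.FooId == FooId;

    public override int GetHashCode() => FooId;

    // 'is null' checks the reference directly and does not call the
    // overloaded == operator, which avoids infinite recursion.
    public static bool operator ==(FooWithOperators left, FooWithOperators right) =>
        left is null ? right is null : left.Equals(right);

    public static bool operator !=(FooWithOperators left, FooWithOperators right) =>
        !(left == right);
}
```

With the operators defined this way, a == b, a.Equals(b) and the hash codes all agree with one another.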
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields that are used to calculate the hash code must be immutable.
I finally found a solution to this problem when I was working with NHibernate.
My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object, which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor that randomly generates an ID.
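A minimal sketch of that ID-based approach, assuming a GUID identifier (the class name Entity and its shape are illustrative, not taken from the answer or from NHibernate):

```csharp
using System;

// Hypothetical entity whose identity is fixed at construction.
public class Entity
{
    // The parameterless constructor generates a fresh random ID.
    public Entity() : this(Guid.NewGuid()) { }

    public Entity(Guid id) { Id = id; }

    // Getter-only: the ID, and therefore the hash code, cannot change later.
    public Guid Id { get; }

    public override bool Equals(object obj) => obj is Entity other && other.Id == Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```

Other properties can stay mutable without affecting equality or the hash code, since neither one reads them.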
By overriding Equals you're basically stating that you know better how to compare two instances of a given type.
Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:
As you can see, it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.
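The original snippet is not preserved here. The following is only an approximation of the style such generators commonly emit (an unchecked block, multiplication by 397 and XOR), written for a class shaped like the question's Foo. It is not verbatim ReSharper output, and GeneratedStyleFoo is a made-up name.

```csharp
// Hypothetical class for illustration.
public class GeneratedStyleFoo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj) =>
        obj is GeneratedStyleFoo other && FooId == other.FooId && FooName == other.FooName;

    public override int GetHashCode()
    {
        unchecked // overflow wraps around instead of throwing
        {
            int hashCode = FooId;
            // Fold in each further member: multiply, then XOR the member's hash.
            hashCode = (hashCode * 397) ^ (FooName != null ? FooName.GetHashCode() : 0);
            return hashCode;
        }
    }
}
```

Since the properties here are settable, the earlier advice still holds: avoid changing them while the object is used as a key.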