GetHashCode Guidelines in C#

Posted 2024-07-12 07:59:39

I read in the Essential C# 3.0 and .NET 3.5 book that:

GetHashCode()’s returns over the life of a particular object should be constant (the same value), even if the object’s data changes. In many cases, you should cache the method return to enforce this.

Is this a valid guideline?

I have tried a couple of built-in types in .NET, and they didn't behave like this.


Answers (9)

情绪失控 2024-07-19 07:59:39

It's been a long time, but I still think it is necessary to give a correct answer to this question, including explanations of the whys and hows. The best answer so far is the one that cites MSDN exhaustively - don't try to make your own rules; the MS guys knew what they were doing.

But first things first:
The guideline as cited in the question is wrong.

Now the whys - there are two of them

First why:
If the hash code is computed in such a way that it does not change during the lifetime of an object, even if the object itself changes, then it breaks the Equals contract.

Remember:
"If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values."

The second sentence is often misinterpreted as "The only rule is that, at object creation time, the hash codes of equal objects must be equal". I don't really know why, but that is the essence of most answers here as well.

Think of two objects containing a name, where the name is used in the equals method: Same name -> same thing.
Create Instance A: Name = Joe
Create Instance B: Name = Peter

The hash codes of A and B will most likely not be the same.
What happens now when the Name of instance B is changed to Joe?

According to the guideline from the question, the hashcode of B would not change. The result of this would be:
A.Equals(B) ==> true
But at the same time:
A.GetHashCode() == B.GetHashCode() ==> false.

But this is exactly the behaviour that the Equals/GetHashCode contract explicitly forbids.
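To make the Joe/Peter scenario concrete, here is a minimal C# sketch of the guideline from the question: the hash code is cached on the first call, while Equals compares the current Name. The class name CachedHashPerson is hypothetical.

```csharp
using System;

// Hypothetical class following the book's guideline: the hash code is
// computed once and cached, while Equals compares the *current* Name.
public class CachedHashPerson
{
    private int? _cachedHash; // filled on the first GetHashCode call

    public string Name { get; set; }

    public CachedHashPerson(string name) => Name = name;

    public override bool Equals(object obj) =>
        obj is CachedHashPerson other && Name == other.Name;

    public override int GetHashCode()
    {
        // Cache the first result and never recompute it, even if Name changes.
        if (_cachedHash == null)
            _cachedHash = Name?.GetHashCode() ?? 0;
        return _cachedHash.Value;
    }
}
```

Usage: create A as "Joe" and B as "Peter", call B.GetHashCode(), then set B.Name = "Joe". A.Equals(B) now returns true, while B.GetHashCode() still returns the hash it computed for "Peter", so (barring an unlikely string-hash collision) two equal objects report different hash codes.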

Second why:
It is, of course, true that changes to the hash code can break hashed lists and other objects that use the hash code, but the reverse is true as well. In the worst case, never changing the hash code produces hashed lists in which many different objects share the same hash code and therefore end up in the same hash bin. This happens, for example, when objects are initialized with a default value and the hash code is fixed at that point.
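A sketch of that degenerate case, assuming a hypothetical class that freezes its hash code in the constructor while Name still holds its default empty string. Every instance then reports the same hash code, whatever names are assigned later.

```csharp
using System;

// Hypothetical class that fixes its hash code in the constructor,
// while Name still holds its default value (the empty string).
public class FrozenAtCreation
{
    private readonly int _hash;

    public string Name { get; set; } = "";

    public FrozenAtCreation()
    {
        // Property initializers have already run, so this is always "".GetHashCode().
        _hash = Name.GetHashCode();
    }

    public override bool Equals(object obj) =>
        obj is FrozenAtCreation other && Name == other.Name;

    public override int GetHashCode() => _hash;
}
```

Putting many such objects into a HashSet<FrozenAtCreation> still technically satisfies the contract (all hash codes are equal, so equal objects trivially share one), but every item lands in the same bucket, and each lookup degrades into a linear series of Equals calls.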


Now coming to the hows
Well, at first glance there seems to be a contradiction: either way, code will break.
But neither problem actually comes from the hash code changing or staying the same.

The source of the problems is well described in the MSDN:

From MSDN's hashtable entry:

Key objects must be immutable as long as they are used as keys in the Hashtable.

This does mean:

Any object that computes a hash value should change that hash value when the object changes, but it must not (absolutely must not) allow any changes to itself while it is used inside a Hashtable (or any other hash-based object, of course).

First how
The easiest way, of course, is to design immutable objects solely for use in hash tables, created as copies of the normal, mutable objects when needed.
Inside the immutable objects, it's obviously fine to cache the hash code, since it can never change.
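A minimal sketch of this first approach, assuming hypothetical Contact (mutable) and ContactKey (immutable) classes. The key is a read-only snapshot, so caching its hash in the constructor is safe. System.HashCode.Combine assumes .NET Core 2.1 or later; on older frameworks any combining formula over the same fields would serve.

```csharp
using System;

// Hypothetical mutable domain object.
public class Contact
{
    public string Name { get; set; }
    public string City { get; set; }

    // Create an immutable snapshot to use as a hash key.
    public ContactKey ToKey() => new ContactKey(Name, City);
}

// Immutable copy used only as a key. Caching the hash is safe here
// because no field can change after construction.
public sealed class ContactKey : IEquatable<ContactKey>
{
    private readonly int _hash;

    public string Name { get; }
    public string City { get; }

    public ContactKey(string name, string city)
    {
        Name = name;
        City = city;
        _hash = HashCode.Combine(name, city); // computed once, never stale
    }

    public bool Equals(ContactKey other) =>
        other != null && Name == other.Name && City == other.City;

    public override bool Equals(object obj) => Equals(obj as ContactKey);

    public override int GetHashCode() => _hash;
}
```

Because a dictionary would hold the ContactKey snapshot rather than the Contact itself, renaming the original Contact afterwards leaves the stored key, and therefore the lookup, intact.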

Second how
Alternatively, give the object a "you are hashed now" flag, make sure all object data is private, check the flag in every function that can change the object's data, and throw an exception if the change is not allowed (i.e. the flag is set).
Now, when you put the object into any hashed structure, make sure to set the flag, and also unset it when it is no longer needed.
For ease of use, I'd advise setting the flag automatically inside the GetHashCode method, so it can't be forgotten. An explicit call to a ResetHashFlag method then makes sure the programmer has to think about whether changing the object's data is allowed at that point.
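This second approach could be sketched as follows. FlaggedPerson, ResetHashFlag and the exception message are illustrative names, not an existing API. GetHashCode sets the flag automatically, and the Name setter refuses changes while the flag is set.

```csharp
using System;

// Hypothetical class implementing the "you are hashed now" flag.
public class FlaggedPerson
{
    private string _name;
    private bool _isHashed; // set by GetHashCode, cleared only by ResetHashFlag

    public FlaggedPerson(string name) => _name = name;

    public string Name
    {
        get => _name;
        set
        {
            ThrowIfHashed();
            _name = value;
        }
    }

    public override bool Equals(object obj) =>
        obj is FlaggedPerson other && _name == other._name;

    public override int GetHashCode()
    {
        _isHashed = true; // set automatically so it cannot be forgotten
        return _name?.GetHashCode() ?? 0;
    }

    // Call explicitly once the object has left every hash-based collection.
    public void ResetHashFlag() => _isHashed = false;

    private void ThrowIfHashed()
    {
        if (_isHashed)
            throw new InvalidOperationException(
                "Object is in use as a hash key; call ResetHashFlag first.");
    }
}
```

One consequence of setting the flag inside GetHashCode is that any call, not only one made by a collection, locks the object until ResetHashFlag is called. That is the trade-off accepted here in exchange for never forgetting to set the flag.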

OK, one more thing should be said: there are cases where an object can have mutable data and still keep its hash code unchanged when that data changes, without violating the Equals/GetHashCode contract.

This does, however, require that the Equals method is not based on the mutable data either.
So, if I write an object whose GetHashCode method calculates a value only once and stores it inside the object to return on later calls, then I must (again: absolutely must) write an Equals method that uses the stored values for comparison, so that A.Equals(B) can never change from false to true either. Otherwise, the contract is broken. The usual result is an Equals method that doesn't make much sense: it is not the original reference equality, nor is it value equality. Sometimes this may be the intended behaviour (e.g. customer records), but usually it is not.

So, just make the GetHashCode result change when the object's data changes, and if the object is intended (or merely able) to be used inside hash-based lists or objects, then either make it immutable or add a read-only flag that stays set for the lifetime of any hashed list containing the object.

(By the way: none of this is specific to C# or .NET. It is in the nature of all hash table implementations, or more generally of any indexed list, that the identifying data of an object must never change while the object is in the list. Unexpected and unpredictable behaviour will occur if this rule is broken. Somewhere there may be list implementations that monitor all the elements inside the list and reindex it automatically, but their performance will surely be gruesome at best.)

追我者格杀勿论 2024-07-19 07:59:39

The answer is mostly, it is a valid guideline, but perhaps not a valid rule. It also doesn't tell the whole story.

The point being made is that for mutable types, you cannot base the hash code on the mutable data because two equal objects must return the same hash code and the hash code has to be valid for the lifetime of the object. If the hash code changes, you end up with an object that gets lost in a hashed collection because it no longer lives in the correct hash bin.

For example, object A returns hash of 1. So, it goes in bin 1 of the hash table. Then you change object A such that it returns a hash of 2. When a hash table goes looking for it, it looks in bin 2 and can't find it - the object is orphaned in bin 1. This is why the hash code must not change for the lifetime of the object, and just one reason why writing GetHashCode implementations is a pain in the butt.
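The bin example can be reproduced with a deliberately tiny separate-chaining table. ToyHashTable and MutableKey are hypothetical illustration classes, not how Dictionary or HashSet are actually implemented. The key's hash code is simply its Value, so hash 1 maps to bin 1 and hash 2 to bin 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key whose hash code is simply its mutable Value.
public class MutableKey
{
    public int Value { get; set; }
    public MutableKey(int value) => Value = value;
    public override bool Equals(object obj) => obj is MutableKey k && k.Value == Value;
    public override int GetHashCode() => Value;
}

// A deliberately tiny separate-chaining hash table, just enough to show
// which bin an item is stored in and which bin a lookup searches.
public class ToyHashTable
{
    private readonly List<MutableKey>[] _bins;

    public ToyHashTable(int binCount)
    {
        _bins = new List<MutableKey>[binCount];
        for (int i = 0; i < binCount; i++) _bins[i] = new List<MutableKey>();
    }

    // Map a hash code to a bin index (clear the sign bit first).
    public int BinOf(MutableKey key) => (key.GetHashCode() & 0x7FFFFFFF) % _bins.Length;

    public void Add(MutableKey key) => _bins[BinOf(key)].Add(key);

    // A lookup only searches the bin that the *current* hash code points to.
    public bool Contains(MutableKey key) => _bins[BinOf(key)].Contains(key);

    // Inspect a specific bin directly, bypassing the hash code.
    public bool BinHolds(int bin, MutableKey key) => _bins[bin].Contains(key);
}
```

After a key stored with Value 1 is changed to Value 2, Contains searches bin 2 and misses, while BinHolds(1, key) shows the object still sitting, orphaned, in bin 1.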

Update
Eric Lippert has posted a blog that gives excellent information on GetHashCode.

Additional Update
I've made a couple of changes above:

  1. I made a distinction between guideline and rule.
  2. I struck through "for the lifetime of the object".

A guideline is just a guide, not a rule. In reality, GetHashCode only has to follow these guidelines when things expect the object to follow the guidelines, such as when it is being stored in a hash table. If you never intend to use your objects in hash tables (or anything else that relies on the rules of GetHashCode), your implementation doesn't need to follow the guidelines.

When you see "for the lifetime of the object", you should read "for the time the object needs to co-operate with hash tables" or similar. Like most things, GetHashCode is about knowing when to break the rules.

九八野马 2024-07-19 07:59:39

From MSDN

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

This means that if the value(s) of the object change, the hash code should change. For example, a "Person" class with the "Name" property set to "Tom" should have one hash code, and a different code if you change the name to "Jerry". Otherwise, Tom == Jerry, which is probably not what you would have intended.
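A minimal sketch of the Person example, assuming a hypothetical Person class whose Equals and GetHashCode are both based on Name. Renaming Tom to Jerry makes him equal to a separate Jerry instance, and his hash code follows the name, as the contract requires.

```csharp
using System;

// Hypothetical Person whose equality and hash code are both defined by Name,
// so renaming the person changes the hash code accordingly.
public class Person : IEquatable<Person>
{
    public string Name { get; set; }

    public Person(string name) => Name = name;

    public bool Equals(Person other) => other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Derived from exactly the state that Equals compares.
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}
```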


Edit:

Also from MSDN:

Derived classes that override GetHashCode must also override Equals to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.

From MSDN's hashtable entry:

Key objects must be immutable as long as they are used as keys in the Hashtable.

The way I read this is that mutable objects should return different hashcodes as their values change, unless they are designed for use in a hashtable.

In the example of System.Drawing.Point, the object is mutable, and does return a different hashcode when the X or Y value changes. This would make it a poor candidate to be used as-is in a hashtable.
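A quick way to observe the Point behaviour, assuming .NET Core or later, where System.Drawing.Point ships in the base framework. PointHashDemo is a hypothetical helper; it checks that after mutation the hash matches that of a freshly constructed Point with the same coordinates.

```csharp
using System;
using System.Drawing;

public static class PointHashDemo
{
    // Returns the hash before mutation, after mutation, and of a fresh Point
    // that already has the mutated coordinates.
    public static (int before, int after, int fresh) Run()
    {
        var p = new Point(1, 2);
        int before = p.GetHashCode();
        p.X = 5;                            // Point is a mutable struct
        int after = p.GetHashCode();
        int fresh = new Point(5, 2).GetHashCode();
        return (before, after, fresh);
    }
}
```

Because Point is a struct, a HashSet<Point> stores its own copy, so mutating a local variable cannot corrupt the set; the problem described above arises mainly when a Point is stored as a field of a reference-type key.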

冷心人i 2024-07-19 07:59:39

I think that the documentation regarding GetHashCode is a bit confusing.

On one hand, MSDN states that the hash code of an object should never change and should be constant.
On the other hand, MSDN also states that GetHashCode should return the same value for two objects if those two objects are considered equal.

MSDN:

A hash function must have the following properties:

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
  • For the best performance, a hash function must generate a random distribution for all input.

So this means that all your objects should be immutable, or that the GetHashCode method should be based on properties of your object that are immutable.
Suppose for instance that you have this class (naive implementation):

public class SomeThing
{
    public string Name { get; set; }

    // Naive: hashes the same mutable state that Equals compares.
    public override int GetHashCode()
    {
        return Name?.GetHashCode() ?? 0;
    }

    public override bool Equals(object obj)
    {
        var other = obj as SomeThing;
        if (other == null) return false;
        return this.Name == other.Name;
    }
}

This implementation already violates the rules that can be found in MSDN.
Suppose you have 2 instances of this class; the Name property of instance1 is set to 'Pol', and the Name property of instance2 is set to 'Piet'.
The two instances return different hash codes, and they're also not equal.
Now, suppose that I change the Name of instance2 to 'Pol', then, according to my Equals method, both instances should be equal, and according to one of the rules of MSDN, they should return the same hashcode.
However, this cannot be done, since the hashcode of instance2 will change, and MSDN states that this is not allowed.

Then, if you have an entity, you could implement the hash code so that it uses the 'primary identifier' of that entity, ideally a surrogate key or some other immutable property.
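A sketch of the entity approach, assuming a hypothetical CustomerEntity with an immutable surrogate Id assigned at construction. Equals and GetHashCode use only Id, so Name and Email can change while the object sits in a hash-based collection.

```csharp
using System;

// Hypothetical entity: identity is an immutable surrogate key, so the
// hash code stays stable even while descriptive fields change.
public class CustomerEntity
{
    public long Id { get; }           // assigned once, never changes
    public string Name { get; set; }  // mutable, not part of identity
    public string Email { get; set; } // mutable, not part of identity

    public CustomerEntity(long id) => Id = id;

    public override bool Equals(object obj) =>
        obj is CustomerEntity other && Id == other.Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```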
If you have a value object, you can implement the hash code so that it uses the 'properties' of that value object. Those properties make up the 'definition' of the value object. This is, of course, the nature of a value object: you're not interested in its identity, but rather in its value.
And therefore, value objects should be immutable (just as they are in the .NET Framework, where string, DateTime, etc. are all immutable).
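For value objects, C# 9 records (C# 9 / .NET 5 or later assumed) provide immutability and value-based Equals/GetHashCode with very little code. Address and WithCity are hypothetical names; a "change" produces a new instance through a with expression instead of mutating the original.

```csharp
using System;

// Hypothetical value object: a positional record gets compiler-generated,
// value-based Equals and GetHashCode, and its properties are init-only.
public sealed record Address(string Street, string City)
{
    // "Changing" a value object yields a new instance instead of mutating this one.
    public Address WithCity(string city) => this with { City = city };
}
```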

Another thing that comes in mind:
During which 'session' (I don't really know what to call this) should GetHashCode return a constant value?
Suppose you open your application, load an instance of an object (an entity) from the DB, and get its hash code. It will return a certain number.
Close the application and load the same entity again. Is the hash code required to have the same value this time as when you loaded the entity the first time?
IMHO, no.

白色秋天 2024-07-19 07:59:39

This is good advice. Here's what Brian Pepin has to say on the matter:

This has tripped me up more than once: Make sure GetHashCode always returns the same value across the lifetime of an instance. Remember that hash codes are used to identify "buckets" in most hashtable implementations. If an object's "bucket" changes, a hashtable may not be able to find your object. These can be very hard bugs to find, so get it right the first time.

镜花水月 2024-07-19 07:59:39

Not directly answering your question, but - if you use Resharper, do not forget it has a feature that generates a reasonable GetHashCode implementation (as well as the Equals method) for you. You can of course specify which members of the class will be taken into account when computing the hashcode.
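For comparison, a hand-written equivalent might look like the sketch below. It is not ReSharper's literal output (generated code varies by version and settings); Employee is a hypothetical class, and System.HashCode.Combine assumes .NET Core 2.1 or later. The point is that GetHashCode and Equals use exactly the same members.

```csharp
using System;

// Hypothetical class whose Equals and GetHashCode cover the same members.
public class Employee : IEquatable<Employee>
{
    public string FirstName { get; }
    public string LastName { get; }
    public int Age { get; }

    public Employee(string firstName, string lastName, int age)
    {
        FirstName = firstName;
        LastName = lastName;
        Age = age;
    }

    public bool Equals(Employee other) =>
        other != null &&
        FirstName == other.FirstName &&
        LastName == other.LastName &&
        Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Employee);

    // System.HashCode mixes the same members that Equals compares.
    public override int GetHashCode() => HashCode.Combine(FirstName, LastName, Age);
}
```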

无风消散 2024-07-19 07:59:39

Check out this blog post from Marc Brooks:

VTOs, RTOs and GetHashCode() -- oh, my!

And then check out the follow-up post (I can't link it as I'm new, but there's a link in the initial article), which discusses the topic further and covers some minor weaknesses in the initial implementation.

This was everything I needed to know about creating a GetHashCode() implementation; he even provides a download of his method along with some other utilities. In short: gold.

护你周全 2024-07-19 07:59:39

The hash code never changes, but it's also important to understand where the hash code comes from.

If your object uses value semantics, i.e. the object's identity is defined by its values (like String, Color, all structs), then the hash code is defined by those values. If your object's identity is independent of all of its values, then the hash code is derived from the subset of its values that identifies it. For example, your StackOverflow entry is stored in a database somewhere. If you change your name or email, your customer entry stays the same, although some values have changed (ultimately, you're usually identified by some long customer id #).

So in short:

Value type semantics - Hashcode is defined by values
Reference type semantics - Hashcode is defined by some id

I suggest you read Domain Driven Design by Eric Evans, where he goes into entities vs value types (which is more or less what I attempted to do above) if this still doesn't make sense.

姜生凉生 2024-07-19 07:59:39

Check out Guidelines and rules for GetHashCode by Eric Lippert
