Overriding GetHashCode for mutable objects?

Posted on 2024-07-20 18:30:36

I've read about 10 different questions on when and how to override GetHashCode, but there's still something I don't quite get. Most implementations of GetHashCode are based on the hash codes of the object's fields, but it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable? Also, what if I do want dictionary lookups etc. to be based on reference equality, not my overridden Equals?

I'm primarily overriding Equals to make it easier to unit test my serialization code. I assume that serializing and deserializing (to XML in my case) kills reference equality, so I want to make sure the result is at least correct by value equality. Is it bad practice to override Equals in this case? Basically, in most of the executing code I want reference equality; I always use == and I'm not overriding that. Should I just create a new method, ValueEquals or something, instead of overriding Equals? I used to assume that the framework always uses == and not Equals to compare things, so I thought it was safe to override Equals, since its purpose seemed to be to provide a second definition of equality that's different from the == operator. From reading several other questions, though, it seems that's not the case.

EDIT:

It seems my intentions were unclear. What I mean is that 99% of the time I want plain old reference equality, default behavior, no surprises. For very rare cases I want value equality, and I want to explicitly request it by using .Equals instead of ==.

When I do this, the compiler recommends I override GetHashCode as well, and that's how this question came up. There seem to be contradictory goals for GetHashCode when applied to mutable objects:

  1. If a.Equals(b) then a.GetHashCode() should == b.GetHashCode().
  2. The value of a.GetHashCode() should never change for the lifetime of a.

These seem naturally contradictory for a mutable object: if the state of the object changes, we expect the result of .Equals() to change, which means that GetHashCode should change to match, yet GetHashCode should not change.

Why does there seem to be this contradiction? Are these recommendations not meant to apply to mutable objects? Probably assumed, but it might be worth mentioning that I'm referring to classes, not structs.
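
The tension can be seen in a minimal sketch. `MutablePoint` and `ContradictionDemo` are hypothetical names, not from any real code base; Equals and GetHashCode are both derived from mutable properties, so goal 1 holds while goal 2 breaks as soon as a property changes.

```csharp
using System;

// Hypothetical example type: equality and hash are both derived from mutable state.
public sealed class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj) =>
        obj is MutablePoint other && X == other.X && Y == other.Y;

    // Consistent with Equals (goal 1), but the result changes whenever
    // X or Y changes, which breaks goal 2.
    public override int GetHashCode() => unchecked((X * 397) ^ Y);
}

public static class ContradictionDemo
{
    // Returns true when mutating the object changed its hash code.
    public static bool HashChangesAfterMutation()
    {
        var p = new MutablePoint { X = 1, Y = 2 };
        int before = p.GetHashCode();
        p.X = 5;
        int after = p.GetHashCode();
        return before != after;
    }
}
```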

Resolution:

I'm marking JaredPar as accepted, but mainly for the interaction in the comments. To sum up, what I've learned is that the only way to achieve all goals and avoid quirky behavior in edge cases is to base Equals and GetHashCode only on immutable fields, or to implement IEquatable. This seems to diminish the usefulness of overriding Equals for reference types, because from what I've seen most reference types have no immutable fields unless they're stored in a relational database and identified by their primary keys.

Comments (5)

ゞ记忆︶ㄣ 2024-07-27 18:30:36

How does that work if the fields that it's based on are mutable?

It doesn't, in the sense that the hash code will change as the object changes. That is a problem for all of the reasons listed in the articles you read. Unfortunately, this is the type of problem that typically only shows up in corner cases, so developers tend to get away with the bad behavior.

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

As long as you implement an interface like IEquatable<T> this shouldn't be a problem. Most dictionary implementations will choose an equality comparer in a way that will use IEquatable<T> over Object.ReferenceEquals. Even without IEquatable<T>, most will default to calling Object.Equals() which will then go into your implementation.

Basically in most of the executing code I want reference equality and I always use == and I'm not overriding that.

If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality.
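
If you go that route, a minimal sketch could look like the following. `Money` is a hypothetical type invented for illustration; the point is only that `==`, `!=`, `Equals` and `GetHashCode` all agree with each other.

```csharp
using System;

// Hypothetical type used only for illustration.
public sealed class Money : IEquatable<Money>
{
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public decimal Amount { get; }
    public string Currency { get; }

    // ReferenceEquals is used for the null check so the overloaded == is not re-entered.
    public bool Equals(Money other) =>
        !ReferenceEquals(other, null) && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    public override int GetHashCode() =>
        unchecked((Amount.GetHashCode() * 397) ^ (Currency?.GetHashCode() ?? 0));

    // == and != delegate to Equals so every comparison path gives the same answer.
    public static bool operator ==(Money left, Money right) =>
        ReferenceEquals(left, null) ? ReferenceEquals(right, null) : left.Equals(right);

    public static bool operator !=(Money left, Money right) => !(left == right);
}
```

Callers who need identity can still write `object.ReferenceEquals(a, b)`.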

I used to assume that the framework always uses == and not Equals to compare things

What the BCL uses has changed a bit over time. Now most cases which use equality will take an IEqualityComparer<T> instance and use it for equality. In the cases where one is not specified, they will use EqualityComparer<T>.Default to find one. In the worst case this will default to calling Object.Equals.
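
A minimal sketch of that default path, assuming a hypothetical `Tag` type: no comparer is passed to the dictionary, so it falls back to `EqualityComparer<Tag>.Default`, which uses the type's `IEquatable<Tag>` implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: implements IEquatable<T>, which EqualityComparer<T>.Default picks up.
public sealed class Tag : IEquatable<Tag>
{
    public Tag(string name) { Name = name; }

    public string Name { get; }

    public bool Equals(Tag other) => other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Tag);

    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}

public static class DefaultComparerDemo
{
    public static bool LookupFindsEqualKey()
    {
        // No comparer argument: the dictionary uses EqualityComparer<Tag>.Default.
        var counts = new Dictionary<Tag, int> { [new Tag("a")] = 1 };

        // A different instance with the same Name is found through value equality.
        return counts.ContainsKey(new Tag("a"));
    }
}
```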

夏末的微笑 2024-07-27 18:30:36

If you have a mutable object, there isn't much point in overriding the GetHashCode method, as you can't really use it. It's used for example by the Dictionary and HashSet collections to place each item in a bucket. If you change the object while it's used as a key in the collection, the hash code no longer matches the bucket that the object is in, so the collection doesn't work properly and you may never find the object again.
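
A minimal sketch of that failure, assuming a hypothetical `Account` type whose hash code is its mutable `Id`:

```csharp
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a mutable property.
public sealed class Account
{
    public int Id { get; set; }

    public override bool Equals(object obj) => obj is Account other && other.Id == Id;

    public override int GetHashCode() => Id;
}

public static class LostKeyDemo
{
    // Adds an object to a HashSet, mutates it, then asks the set for the same instance.
    public static bool FoundAfterMutation()
    {
        var account = new Account { Id = 1 };
        var set = new HashSet<Account> { account };

        account.Id = 2; // the hash code changes while the object sits in the set

        return set.Contains(account);
    }
}
```

`FoundAfterMutation()` returns false even though the set still holds that very instance: the lookup hashes to 2, while the entry was stored under hash 1.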

If you want the lookup not to use the GetHashCode or Equals method of the class, you can always provide your own IEqualityComparer implementation to use instead when you create the Dictionary.
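
As a sketch of that approach, the hypothetical `ReferenceComparer<T>` below compares keys purely by identity, so the `Equals` override on the (also hypothetical) `Label` type is ignored by the dictionary.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares keys by object identity, ignoring any Equals/GetHashCode override.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode returns the identity-based hash and bypasses overrides.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical type that overrides Equals for value equality.
public sealed class Label
{
    public string Text { get; set; }

    public override bool Equals(object obj) => obj is Label other && other.Text == Text;

    public override int GetHashCode() => Text == null ? 0 : Text.GetHashCode();
}

public static class ReferenceLookupDemo
{
    // Returns { found by same instance, found by equal-valued other instance }.
    public static bool[] Run()
    {
        var first = new Label { Text = "same" };
        var second = new Label { Text = "same" }; // equal by value, different instance

        var map = new Dictionary<Label, int>(new ReferenceComparer<Label>()) { [first] = 1 };

        first.Text = "changed"; // mutation does not disturb an identity-based lookup

        return new[] { map.ContainsKey(first), map.ContainsKey(second) };
    }
}
```

On .NET 5 and later, the built-in `ReferenceEqualityComparer.Instance` serves the same purpose.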

The Equals method is intended for value equality, so it's not wrong to implement it that way.

最单纯的乌龟 2024-07-27 18:30:36

Wow, that's actually several questions in one :-). So one after the other:

it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable?

This common advice is meant for the case where you want to use your object as a key in a hash table, dictionary, etc. Hash tables usually require the hash not to change, because they use it to decide how to store and retrieve the key. If the hash changes, the hash table will probably no longer find your object.

To cite the docs of Java's Map interface:

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.

In general it's a bad idea to use any kind of mutable object as a key in a hash table: It's not even clear what should happen if a key changes after it's been added to the hash table. Should the hash table return the stored object via the old key, or via the new key, or via both?

So the real advice is: Only use immutable objects as keys, and make sure their hashcode never changes either (which is usually automatic if the object is immutable).

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

Well, you need a dictionary that works like that. By default, the standard library dictionaries use GetHashCode and Equals; in .NET the way to change that is to pass your own IEqualityComparer<T> to the Dictionary constructor.

I'm primarily overriding Equals for the ease of unit testing my serialization code which I assume serializing and deserializing (to XML in my case) kills the reference equality so I want to make sure at least it's correct by value equality. Is this bad practice to override Equals in this case?

No, I'd find that perfectly acceptable. However, you should not use such objects as keys in a dictionary/hashtable, as they're mutable. See above.

痞味浪人 2024-07-27 18:30:36

I don't know about C#, being a relative noob to it, but in Java, if you override equals() you also need to override hashCode() to maintain the contract between them (and vice versa)... And Java has the same catch-22, basically forcing you to use immutable fields... But this is an issue only for classes which are used as a hash key, and Java has alternate implementations for all hash-based collections... which may not be as fast, but they do effectively allow you to use a mutable object as a key... it's just (usually) frowned upon as "poor design".

And I feel the urge to point out that this fundamental problem is timeless... It's been around since Adam was a lad.

I've worked on Fortran code which is older than I am (I'm 36) and which breaks when a username is changed (like when a girl gets married, or divorced ;-)) ... Such is engineering. The adopted solution was: the GetHashCode "method" remembers the previously calculated hashCode, recalculates the hashCode (i.e. a virtual isDirty marker), and if the key fields have changed it returns null. This causes the cache to delete the "dirty" user (by calling another GetPreviousHashCode) and then the cache returns null, causing the user to be re-read from the database. An interesting and worthwhile hack, even if I do say so myself ;-)

I'll trade off mutability (only desirable in corner cases) for O(1) access (desirable in all cases). Welcome to engineering, the land of the informed compromise.

Cheers. Keith.

岛徒 2024-07-27 18:30:36

The underlying topic here is how to best uniquely identify objects. You mention serialization/deserialization which is important because referential integrity is lost in that process.

The short answer is that objects should be uniquely identified by the smallest set of immutable fields that can be used to do so. These are the fields you should use when overriding GetHashCode and Equals.
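
A minimal sketch of that idea, assuming a hypothetical `Customer` entity whose identity is a `Guid` fixed at construction, while everything else stays mutable and out of the equality logic:

```csharp
using System;

// Hypothetical entity: Id is fixed at construction, the rest may change freely.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(Guid id) { Id = id; }

    public Guid Id { get; }            // immutable identity
    public string Name { get; set; }   // mutable, deliberately excluded from equality

    public bool Equals(Customer other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Customer);

    // Stable for the object's lifetime because Id never changes.
    public override int GetHashCode() => Id.GetHashCode();
}
```

Renaming a customer leaves its hash code untouched, so it stays findable in a Dictionary or HashSet.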

For testing, it's perfectly reasonable to define whatever assertions you need. Usually these are not defined on the type itself but rather as utility methods in the test suite. Maybe a TestSuite.AssertEquals(MyClass, MyClass)?
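
Such a helper might look like the sketch below. The `Host` and `Port` members of `MyClass` are invented for illustration, and the helper throws a plain exception rather than using any particular test framework's assertion API.

```csharp
using System;

// Hypothetical class under test; it keeps the default reference-based Equals.
public sealed class MyClass
{
    public string Host { get; set; }
    public int Port { get; set; }
}

// Hypothetical test-suite helper: value comparison lives in the tests, not on the type.
public static class TestSuite
{
    public static void AssertEquals(MyClass expected, MyClass actual)
    {
        if (expected == null) throw new ArgumentNullException(nameof(expected));
        if (actual == null) throw new ArgumentNullException(nameof(actual));

        // Compare field by field and report the first observed difference as a failure.
        if (expected.Host != actual.Host || expected.Port != actual.Port)
        {
            throw new InvalidOperationException(
                $"Expected {expected.Host}:{expected.Port} but got {actual.Host}:{actual.Port}");
        }
    }
}
```

Because `MyClass` does not override Equals or GetHashCode, production code keeps plain reference equality and the compiler warning never comes up.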

Note that GetHashCode and Equals should work together. GetHashCode should return the same value for two objects if they are equal. The converse does not hold: two objects may be unequal and still return the same hash code, so Equals must not rely on the hash code alone. There are plenty of web pages that tackle this topic head-on, just google away.
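
To illustrate that the rule only runs in one direction, here is a sketch with a hypothetical `Word` type whose hash code is deliberately constant. Every instance collides, yet the set still tells unequal items apart because Equals has the final say.

```csharp
using System.Collections.Generic;

// Hypothetical type with a deliberately weak hash: every instance collides.
public sealed class Word
{
    public Word(string text) { Text = text; }

    public string Text { get; }

    public override bool Equals(object obj) => obj is Word other && other.Text == Text;

    // Legal (equal objects still share a hash), but it degrades lookups to linear scans.
    public override int GetHashCode() => 42;
}

public static class CollisionDemo
{
    // Adds "a", "b" and a second "a"; only the duplicate "a" is rejected.
    public static int DistinctCount()
    {
        var set = new HashSet<Word> { new Word("a"), new Word("b"), new Word("a") };
        return set.Count;
    }
}
```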
