使用 GetHashCode 测试 Equals 覆盖中的相等性
是否可以调用 GetHashCode 作为从 Equals 覆盖内部测试相等性的方法?
例如,这个代码可以接受吗?
public class Class1
{
public string A
{
get;
set;
}
public string B
{
get;
set;
}
public override bool Equals(object obj)
{
Class1 other = obj as Class1;
return other != null && other.GetHashCode() == this.GetHashCode();
}
public override int GetHashCode()
{
int result = 0;
result = (result ^ 397) ^ (A == null ? 0 : A.GetHashCode());
result = (result ^ 397) ^ (B == null ? 0 : B.GetHashCode());
return result;
}
}
Is it ok to call GetHashCode as a method to test equality from inside the Equals override?
For example, is this code acceptable?
public class Class1
{
public string A
{
get;
set;
}
public string B
{
get;
set;
}
public override bool Equals(object obj)
{
Class1 other = obj as Class1;
return other != null && other.GetHashCode() == this.GetHashCode();
}
public override int GetHashCode()
{
int result = 0;
result = (result ^ 397) ^ (A == null ? 0 : A.GetHashCode());
result = (result ^ 397) ^ (B == null ? 0 : B.GetHashCode());
return result;
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
其他人都是对的;你的平等操作被破坏了。为了说明这一点:
我想您希望该程序的输出为“假”,但根据您对相等的定义,在 CLR 的某些实现上它是“真”。
请记住,可能的哈希码大约只有四十亿种。可能的六字母字符串有超过四十亿种,因此其中至少有两个具有相同的哈希码。我已经给你们看了两个;还有无限多。
一般来说,您可以预期,如果有 n 个可能的哈希码,那么一旦您有 n 个元素的平方根,发生冲突的几率就会急剧上升。这就是所谓的“生日悖论”。有关为什么不应依赖哈希码来实现平等的文章,请单击 这里。
The others are right; your equality operation is broken. To illustrate:
I imagine you want the output of that program to be "false" but with your definition of equality it is "true" on some implementations of the CLR.
Remember, there are only about four billion possible hash codes. There are way more than four billion possible six letter strings, and therefore at least two of them have the same hash code. I've shown you two; there are infinitely many more.
In general you can expect that if there are n possible hash codes then the odds of getting a collision rise dramatically once you have about the square root of n elements in play. This is the so-called "birthday paradox". For my article on why you shouldn't rely upon hash codes for equality, click here.
不,这是不行的,因为它不
平等<=>哈希码相等
。只是
平等=>哈希码相等
。或者在另一个方向:
hashcode不等式=>不平等。
引用 http://msdn.microsoft.com/en-us/库/system.object.gethashcode.aspx:
No, it is not ok, because it's not
equality <=> hashcode equality
.It's just
equality => hashcode equality
.or in the other direction:
hashcode inequality => inequality
.Quoting http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx:
我想说,除非您希望
Equals
基本上意味着您的类型“具有相同的哈希代码”,否则 否,因为两个字符串可能不同,但共享相同的哈希码。概率可能很小,但也不是零。I would say, unless you want for
Equals
to basically mean "has the same hash code as" for your type, then no, because two strings may be different but share the same hash code. The probability may be small, but it isn't zero.不,这不是测试平等性的可接受方法。两个不相等的值很可能具有相同的哈希码。这将导致您的
Equals
实现在应该返回false
时返回true
No this is not an acceptable way to test for equality. It is very possible for 2 non-equal values to have the same hash code. This would cause your implementation of
Equals
to returntrue
when it should returnfalse
您可以调用
GetHashCode
来确定项目是否不相等,但如果两个对象返回相同的哈希码,并不意味着它们相等 > 平等。两个项目可以具有相同的哈希码,但不能相等。如果比较两个项目的成本很高,那么您可以比较哈希码。如果它们不平等,那么你可以保释。否则(哈希码相等),您必须进行完整比较。
例如:
You can call
GetHashCode
to determine if the items are not equal, but if two objects return the same hash code, that doesn't mean they are equal. Two items can have the same hash code but not be equal.If it's expensive to compare two items, then you can compare the hash codes. If they are unequal, then you can bail. Otherwise (the hash codes are equal), you have to do the full comparison.
For example:
你不能说仅仅因为哈希码相等,那么对象就一定相等。
您唯一一次在
Equals
内部调用GetHashCode
的情况是计算对象的哈希值(例如,因为您缓存了它)比检查对象的哈希值要便宜得多。平等。在这种情况下,您可以说if (this.GetHashCode() != other.GetHashCode()) return false;
这样您就可以快速验证对象是否不相等。那么你什么时候会这样做呢?我编写了一些代码,定期截取屏幕截图,并尝试找出屏幕更改以来已经过去了多长时间。由于我的屏幕截图大小为 8MB,并且在屏幕截图间隔内变化的像素相对较少,因此搜索它们的列表以查找哪些是相同的是相当昂贵的。哈希值很小,每个屏幕截图只需计算一次,因此很容易消除已知的不相等的哈希值。事实上,在我的应用程序中,我认为拥有相同的哈希值就足够接近相等,以至于我什至懒得去实现
Equals
重载,导致 C# 编译器警告我我正在重载 < code>GetHashCode 而不重载Equals
。You cannot say that just because the hash codes are equal then the objects must be equal.
The only time you would call
GetHashCode
inside ofEquals
was if it was much cheaper to compute a hash value for an object (say, because you cache it) than to check for equality. In that case you could sayif (this.GetHashCode() != other.GetHashCode()) return false;
so that you could quickly verify that the objects were not equal.So when would you ever do this? I wrote some code that takes screenshots at periodic intervals and tries to find how long it's been since the screen changed. Since my screenshots are 8MB and have relatively few pixels that change within the screenshot interval it's fairly expensive to search a list of them to find which ones are the same. A hash value is small and only has to be computed once per screenshot, making it easy to eliminate known non-equal ones. In fact, in my application I decided that having identical hashes was close enough to being equal that I didn't even bother to implement the
Equals
overload, causing the C# compiler to warn me that I was overloadingGetHashCode
without overloadingEquals
.在一种情况下,使用哈希码作为相等比较的快捷方式是有意义的。
考虑一下您正在构建哈希表或哈希集的情况。事实上,我们只考虑哈希集(哈希表通过还保存一个值来扩展它,但这并不相关)。
人们可以采取各种不同的方法,但在所有这些方法中,您都有少量可以放置哈希值的槽,我们采用开放或封闭的方法(这只是为了好玩,有些人使用相反的术语对于其他人);如果我们在两个不同对象的同一个槽上发生碰撞,我们可以将它们存储在同一个槽中(但有一个链接列表或类似的对象实际存储的位置),或者通过重新探测来选择不同的槽(有各种的策略)。
现在,无论采用哪种方法,我们都将从哈希表所需的 O(1) 复杂度转向 O(n) 复杂度。这种风险与可用槽的数量成反比,因此在达到一定大小后,我们调整哈希表的大小(即使一切都很理想,如果存储的项目数量大于存储项目的数量,我们最终也必须这样做)插槽)。
在调整大小时重新插入项目显然取决于哈希码。因此,虽然在对象中记忆
GetHashCode()
几乎没有意义(只是在大多数对象上调用得不够频繁),但在哈希表中记忆它确实有意义本身(或者也许是为了记住生成的结果,例如,如果您使用 Wang/Jenkins 哈希重新进行哈希处理,以减少不良GetHashCode()
实现造成的损害)。现在,当我们插入时,我们的逻辑将类似于:
因此,在这种情况下,我们必须在比较相等性之前获取哈希码。我们还预先计算了现有对象的哈希码,以允许调整大小。这两个事实的结合意味着将第 4 项的比较实现为有意义的:
显然,这样做的优点取决于
_cmp.Equals
的复杂性。如果我们的密钥类型是int
那么这完全是浪费。如果我们的键类型为 string 并且我们使用不区分大小写的 Unicode 规范化相等比较(因此它甚至不能缩短长度),那么节省的成本很值得。一般来说,记住哈希码没有意义,因为它们的使用频率不够高,不足以带来性能优势,但将它们存储在哈希集或哈希表本身中是有意义的。
There is one case where using hashcodes as a short-cut on equality comparisons makes sense.
Consider the case where you are building a hashtable or hashset. In fact, let's just consider hashsets (hashtables extend that by also holding a value, but that isn't relevant).
There are various different approaches one can take, but in all of them you have a small number of slots the hashed values can be placed in, and we take either the open or closed approach (which just for fun, some people use the opposite jargon for to others); if we collide on the same slot for two different objects we can either store them in the same slot (but having a linked list or such for where the objects are actually stored) or by re-probing to pick a different slot (there are various strategies for this).
Now, with either approach, we're moving away from the O(1) complexity we want with a hashtable, and towards an O(n) complexity. The risk of this is inversely proportional to the number of slots available, so after a certain size we resize the hashtable (even if everything was ideal, we'd eventually have to do this if the number of items stored were greater than the number of slots).
Re-inserting the items on a resize will obviously depend on the hash codes. Because of this, while it rarely makes sense to memoise
GetHashCode()
in an object (it just isn't called often enough on most objects), it certainly does make sense to memoise it within the hash table itself (or perhaps, to memoise a produced result, such as if you re-hashed with a Wang/Jenkins hash to reduce the damage caused by badGetHashCode()
implementations).Now, when we come to insert our logic is going to be something like:
So, in this case we have to get the hash code before we compare for equality. We also have the hash code for existing objects already pre-computed to allow for resize. The combination of these two facts means that it makes sense to implement our comparison for item 4 as:
Obviously, the advantage of this depends on the complexity of
_cmp.Equals
. If our key type wasint
then this would be a total waste. If our key type where string and we were using case-insensitive Unicode-normalised equality comparisons (so it can't even shortcut on length) then the saving could well be worth it.Generally memoising hash codes doesn't make sense because they aren't used often enough to be a performance win, but storing them in the hashset or hashtable itself can make sense.
这是错误的实现,正如其他人已经说明的原因。
您应该使用
GetHashCode
来短路相等性检查,例如:在
Equals
方法中仅当您确定随后的Equals实现比GetHashCode
昂贵得多时,这并不是绝大多数在您所展示的这一实现中(这是 99% 的情况)它不仅损坏,而且速度也慢得多。原因是什么? 计算属性的哈希值几乎肯定会比比较它们慢,因此您甚至没有获得性能方面的提升。实现正确的 GetHashCode 的优点是,您的类可以成为哈希表的键类型,其中哈希值仅计算一次(并且该值用于比较)。在您的情况下,如果
GetHashCode
在集合中,它将被多次调用。尽管 GetHashCode 本身应该很快,但它并不比等价的 Equals 快。要进行基准测试,请在此处运行
Equals
(正确的实现,取出当前基于哈希的实现)和GetHashCode
It's wrong implementation, as others have stated why.
You should short circuit the equality check using
GetHashCode
like:in the
Equals
method only if you're certain the ensuing Equals implementation is much more expensive thanGetHashCode
which is not vast majority of cases.In this one implementation you have shown (which is 99% of the cases) its not only broken, its also much slower. And the reason? Computing the hash of your properties would almost certainly be slower than comparing them, so you're not even gaining in performance terms. The advantage of implementing a proper
GetHashCode
is when your class can be the key type for hash tables where hash is computed only once (and that value is used for comparison). In your caseGetHashCode
will be called multiple times if it's in a collection. Even thoughGetHashCode
itself should be fast, it's not mostly faster than equivalentEquals
.To benchmark, run your
Equals
(a proper implementation, taking out the current hash based implementation) andGetHashCode
here