使用 GetHashCode 测试 Equals 覆盖中的相等性

发布于 2024-10-04 04:22:46 字数 589 浏览 8 评论 0原文

是否可以调用 GetHashCode 作为从 Equals 覆盖内部测试相等性的方法?

例如,这个代码可以接受吗?

public class Class1
{
  public string A
  {
    get;
    set;
  }

  public string B
  {
    get;
    set;
  }

  public override bool Equals(object obj)
  {
    Class1 other = obj as Class1;
    return other != null && other.GetHashCode() == this.GetHashCode();
  }

  public override int GetHashCode()
  {
    int result = 0;
    result = (result ^ 397) ^ (A == null ? 0 : A.GetHashCode());
    result = (result ^ 397) ^ (B == null ? 0 : B.GetHashCode());
    return result;
  }
}

Is it ok to call GetHashCode as a method to test equality from inside the Equals override?

For example, is this code acceptable?

public class Class1
{
  public string A
  {
    get;
    set;
  }

  public string B
  {
    get;
    set;
  }

  public override bool Equals(object obj)
  {
    Class1 other = obj as Class1;
    return other != null && other.GetHashCode() == this.GetHashCode();
  }

  public override int GetHashCode()
  {
    int result = 0;
    result = (result ^ 397) ^ (A == null ? 0 : A.GetHashCode());
    result = (result ^ 397) ^ (B == null ? 0 : B.GetHashCode());
    return result;
  }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

夜声 2024-10-11 04:22:46

其他人都是对的;你的平等操作被破坏了。为了说明这一点:

public static void Main()
{
    var c1 = new Class1() { A = "apahaa", B = null };
    var c2 = new Class1() { A = "abacaz", B = null };
    Console.WriteLine(c1.Equals(c2));
}

我想您希望该程序的输出为“假”,但根据您对相等的定义,在 CLR 的某些实现上它是“真”。

请记住,可能的哈希码大约只有四十亿种。可能的六字母字符串有超过四十亿种,因此其中至少有两个具有相同的哈希码。我已经给你们看了两个;还有无限多。

一般来说,您可以预期,如果有 n 个可能的哈希码,那么一旦您有 n 个元素的平方根,发生冲突的几率就会急剧上升。这就是所谓的“生日悖论”。有关为什么不应依赖哈希码来实现平等的文章,请单击 这里

The others are right; your equality operation is broken. To illustrate:

public static void Main()
{
    var c1 = new Class1() { A = "apahaa", B = null };
    var c2 = new Class1() { A = "abacaz", B = null };
    Console.WriteLine(c1.Equals(c2));
}

I imagine you want the output of that program to be "false" but with your definition of equality it is "true" on some implementations of the CLR.

Remember, there are only about four billion possible hash codes. There are way more than four billion possible six letter strings, and therefore at least two of them have the same hash code. I've shown you two; there are infinitely many more.

In general you can expect that if there are n possible hash codes then the odds of getting a collision rise dramatically once you have about the square root of n elements in play. This is the so-called "birthday paradox". For my article on why you shouldn't rely upon hash codes for equality, click here.

独孤求败 2024-10-11 04:22:46

不,这是不行的,因为它

平等<=>哈希码相等

只是

平等=>哈希码相等

或者在另一个方向:

hashcode不等式=>不平等。

引用 http://msdn.microsoft.com/en-us/库/system.object.gethashcode.aspx

<块引用>

如果两个对象比较相等,则每个对象的 GetHashCode 方法必须返回相同的值。但是,如果两个对象比较不相等,则两个对象的 GetHashCode 方法不必返回不同的值。

No, it is not ok, because it's not

equality <=> hashcode equality.

It's just

equality => hashcode equality.

or in the other direction:

hashcode inequality => inequality.

Quoting http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

复古式 2024-10-11 04:22:46

我想说,除非您希望 Equals 基本上意味着您的类型“具有相同的哈希代码”,否则 ,因为两个字符串可能不同,但共享相同的哈希码。概率可能很小,但也不是零。

I would say, unless you want for Equals to basically mean "has the same hash code as" for your type, then no, because two strings may be different but share the same hash code. The probability may be small, but it isn't zero.

愿得七秒忆 2024-10-11 04:22:46

不,这不是测试平等性的可接受方法。两个不相等的值很可能具有相同的哈希码。这将导致您的 Equals 实现在应该返回 false 时返回 true

No this is not an acceptable way to test for equality. It is very possible for 2 non-equal values to have the same hash code. This would cause your implementation of Equals to return true when it should return false

兔姬 2024-10-11 04:22:46

您可以调用 GetHashCode 来确定项目是否相等,但如果两个对象返回相同的哈希码,并不意味着它们相等 > 平等。两个项目可以具有相同的哈希码,但不能相等。

如果比较两个项目的成本很高,那么您可以比较哈希码。如果它们不平等,那么你可以保释。否则(哈希码相等),您必须进行完整比较。

例如:

public override bool Equals(object obj)
  {
    Class1 other = obj as Class1;
    if (other == null || other.GetHashCode() != this.GetHashCode())
        return false;
    // the hash codes are the same so you have to do a full object compare.
  }

You can call GetHashCode to determine if the items are not equal, but if two objects return the same hash code, that doesn't mean they are equal. Two items can have the same hash code but not be equal.

If it's expensive to compare two items, then you can compare the hash codes. If they are unequal, then you can bail. Otherwise (the hash codes are equal), you have to do the full comparison.

For example:

public override bool Equals(object obj)
  {
    Class1 other = obj as Class1;
    if (other == null || other.GetHashCode() != this.GetHashCode())
        return false;
    // the hash codes are the same so you have to do a full object compare.
  }
幸福还没到 2024-10-11 04:22:46

不能说仅仅因为哈希码相等,那么对象就一定相等。

您唯一一次在 Equals 内部调用 GetHashCode 的情况是计算对象的哈希值(例如,因为您缓存了它)比检查对象的哈希值要便宜得多。平等。在这种情况下,您可以说 if (this.GetHashCode() != other.GetHashCode()) return false; 这样您就可以快速验证对象是否不相等。

那么你什么时候会这样做呢?我编写了一些代码,定期截取屏幕截图,并尝试找出屏幕更改以来已经过去了多长时间。由于我的屏幕截图大小为 8MB,并且在屏幕截图间隔内变化的像素相对较少,因此搜索它们的列表以查找哪些是相同的是相当昂贵的。哈希值很小,每个屏幕截图只需计算一次,因此很容易消除已知的不相等的哈希值。事实上,在我的应用程序中,我认为拥有相同的哈希值就足够接近相等,以至于我什至懒得去实现 Equals 重载,导致 C# 编译器警告我我正在重载 < code>GetHashCode 而不重载 Equals

You cannot say that just because the hash codes are equal then the objects must be equal.

The only time you would call GetHashCode inside of Equals was if it was much cheaper to compute a hash value for an object (say, because you cache it) than to check for equality. In that case you could say if (this.GetHashCode() != other.GetHashCode()) return false; so that you could quickly verify that the objects were not equal.

So when would you ever do this? I wrote some code that takes screenshots at periodic intervals and tries to find how long it's been since the screen changed. Since my screenshots are 8MB and have relatively few pixels that change within the screenshot interval it's fairly expensive to search a list of them to find which ones are the same. A hash value is small and only has to be computed once per screenshot, making it easy to eliminate known non-equal ones. In fact, in my application I decided that having identical hashes was close enough to being equal that I didn't even bother to implement the Equals overload, causing the C# compiler to warn me that I was overloading GetHashCode without overloading Equals.

数理化全能战士 2024-10-11 04:22:46

在一种情况下,使用哈希码作为相等比较的快捷方式是有意义的。

考虑一下您正在构建哈希表或哈希集的情况。事实上,我们只考虑哈希集(哈希表通过还保存一个值来扩展它,但这并不相关)。

人们可以采取各种不同的方法,但在所有这些方法中,您都有少量可以放置哈希值的槽,我们采用开放或封闭的方法(这只是为了好玩,有些人使用相反的术语对于其他人);如果我们在两个不同对象的同一个槽上发生碰撞,我们可以将它们存储在同一个槽中(但有一个链接列表或类似的对象实际存储的位置),或者通过重新探测来选择不同的槽(有各种的策略)。

现在,无论采用哪种方法,我们都将从哈希表所需的 O(1) 复杂度转向 O(n) 复杂度。这种风险与可用槽的数量成反比,因此在达到一定大小后,我们调整哈希表的大小(即使一切都很理想,如果存储的项目数量大于存储项目的数量,我们最终也必须这样做)插槽)。

在调整大小时重新插入项目显然取决于哈希码。因此,虽然在对象中记忆 GetHashCode() 几乎没有意义(只是在大多数对象上调用得不够频繁),但在哈希表中记忆它确实有意义本身(或者也许是为了记住生成的结果,例如,如果您使用 Wang/Jenkins 哈希重新进行哈希处理,以减少不良 GetHashCode() 实现造成的损害)。

现在,当我们插入时,我们的逻辑将类似于:

  1. 获取对象的哈希码。
  2. 获取对象的槽。
  3. 如果槽为空,则将对象放入其中并返回。
  4. 如果槽包含相等的对象,那么我们就完成了哈希集,并且有位置来替换哈希表的值。这样做,然后返回。
  5. 根据碰撞策略尝试下一个槽,然后返回到第 3 项(如果我们循环得太频繁,可能会调整大小)。

因此,在这种情况下,我们必须在比较相等性之前获取哈希码。我们还预先计算了现有对象的哈希码,以允许调整大小。这两个事实的结合意味着将第 4 项的比较实现为有意义的:

private bool IsMatch(KeyType newItem, KeyType storedItem, int newHash, int oldHash)
{
  return ReferenceEquals(newItem, storedItem) // fast, false negatives, no false positives (only applicable to reference types)
    ||
    (
      newHash == oldHash // fast, false positives, no fast negatives
      &&
      _cmp.Equals(newItem, storedItem) // slow for some types, but always correct result.
    );
}

显然,这样做的优点取决于 _cmp.Equals 的复杂性。如果我们的密钥类型是 int 那么这完全是浪费。如果我们的键类型为 string 并且我们使用不区分大小写的 Unicode 规范化相等比较(因此它甚至不能缩短长度),那么节省的成本很值得。

一般来说,记住哈希码没有意义,因为它们的使用频率不够高,不足以带来性能优势,但将它们存储在哈希集或哈希表本身中是有意义的。

There is one case where using hashcodes as a short-cut on equality comparisons makes sense.

Consider the case where you are building a hashtable or hashset. In fact, let's just consider hashsets (hashtables extend that by also holding a value, but that isn't relevant).

There are various different approaches one can take, but in all of them you have a small number of slots the hashed values can be placed in, and we take either the open or closed approach (which just for fun, some people use the opposite jargon for to others); if we collide on the same slot for two different objects we can either store them in the same slot (but having a linked list or such for where the objects are actually stored) or by re-probing to pick a different slot (there are various strategies for this).

Now, with either approach, we're moving away from the O(1) complexity we want with a hashtable, and towards an O(n) complexity. The risk of this is inversely proportional to the number of slots available, so after a certain size we resize the hashtable (even if everything was ideal, we'd eventually have to do this if the number of items stored were greater than the number of slots).

Re-inserting the items on a resize will obviously depend on the hash codes. Because of this, while it rarely makes sense to memoise GetHashCode() in an object (it just isn't called often enough on most objects), it certainly does make sense to memoise it within the hash table itself (or perhaps, to memoise a produced result, such as if you re-hashed with a Wang/Jenkins hash to reduce the damage caused by bad GetHashCode() implementations).

Now, when we come to insert our logic is going to be something like:

  1. Get hash code for object.
  2. Get slot for object.
  3. If slot is empty, place object in it and return.
  4. If slot contains equal object, we're done for a hashset and have the position to replace the value for a hashtable. Do this, and return.
  5. Try next slot according to collision strategy, and return to item 3 (perhaps resizing if we loop this too often).

So, in this case we have to get the hash code before we compare for equality. We also have the hash code for existing objects already pre-computed to allow for resize. The combination of these two facts means that it makes sense to implement our comparison for item 4 as:

private bool IsMatch(KeyType newItem, KeyType storedItem, int newHash, int oldHash)
{
  return ReferenceEquals(newItem, storedItem) // fast, false negatives, no false positives (only applicable to reference types)
    ||
    (
      newHash == oldHash // fast, false positives, no fast negatives
      &&
      _cmp.Equals(newItem, storedItem) // slow for some types, but always correct result.
    );
}

Obviously, the advantage of this depends on the complexity of _cmp.Equals. If our key type was int then this would be a total waste. If our key type where string and we were using case-insensitive Unicode-normalised equality comparisons (so it can't even shortcut on length) then the saving could well be worth it.

Generally memoising hash codes doesn't make sense because they aren't used often enough to be a performance win, but storing them in the hashset or hashtable itself can make sense.

赢得她心 2024-10-11 04:22:46
  1. 这是错误的实现,正如其他人已经说明的原因。

  2. 您应该使用 GetHashCode 来短路相等性检查,例如:

    if (other.GetHashCode() != this.GetHashCode()
        返回假;
    

    Equals方法中仅当您确定随后的Equals实现比GetHashCode昂贵得多时,这并不是绝大多数

  3. 在您所展示的这一实现中(这是 99% 的情况)它不仅损坏,而且速度也慢得多。原因是什么? 计算属性的哈希值几乎肯定会比比较它们慢,因此您甚至没有获得性能方面的提升。实现正确的 GetHashCode 的优点是,您的类可以成为哈希表的键类型,其中哈希值仅计算一次(并且该值用于比较)。在您的情况下,如果 GetHashCode 在集合中,它将被多次调用。尽管 GetHashCode 本身应该很快,但它并不比等价的 Equals 快。

    要进行基准测试,请在此处运行 Equals(正确的实现,取出当前基于哈希的实现)和 GetHashCode

    var watch = Stopwatch.StartNew();
    for (int i = 0; i < 100000; i++) 
    {
        行动(); //这里调用Equals和GetHashCode来测试性能。
    }
    手表.Stop();
    Console.WriteLine(watch.Elapsed.TotalMilliseconds);
    
  1. It's wrong implementation, as others have stated why.

  2. You should short circuit the equality check using GetHashCode like:

    if (other.GetHashCode() != this.GetHashCode()
        return false;
    

    in the Equals method only if you're certain the ensuing Equals implementation is much more expensive than GetHashCode which is not vast majority of cases.

  3. In this one implementation you have shown (which is 99% of the cases) its not only broken, its also much slower. And the reason? Computing the hash of your properties would almost certainly be slower than comparing them, so you're not even gaining in performance terms. The advantage of implementing a proper GetHashCode is when your class can be the key type for hash tables where hash is computed only once (and that value is used for comparison). In your case GetHashCode will be called multiple times if it's in a collection. Even though GetHashCode itself should be fast, it's not mostly faster than equivalent Equals.

    To benchmark, run your Equals (a proper implementation, taking out the current hash based implementation) and GetHashCode here

    var watch = Stopwatch.StartNew();
    for (int i = 0; i < 100000; i++) 
    {
        action(); //Equals and GetHashCode called here to test for performance.
    }
    watch.Stop();
    Console.WriteLine(watch.Elapsed.TotalMilliseconds);
    
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文