C# hashcode overriding

Why is it important to override GetHashCode when the Equals method is overridden?

Posted on 2024-07-11 03:32:15

Given the following class

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null) 
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Which is preferred?

        return base.GetHashCode();

        //return this.FooId.GetHashCode();
    }
}

I have overridden the Equals method because Foo represents a row of the Foos table. Which is the preferred way to override GetHashCode?

Why is it important to override GetHashCode?

七分※倦醒 2024-07-18 03:32:16

As of .NET Framework 4.7, the preferred way of overriding GetHashCode() is shown below. If you are targeting older .NET versions, include the System.ValueTuple NuGet package.

// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();

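In terms of performance, this approach should outperform most hand-written composite hash code implementations: ValueTuple is a struct, so it creates no garbage, and the underlying combining algorithm is fast. Include only the fields that Equals compares, though: with the Foo class from the question, whose Equals compares only FooId, also hashing FooName would give equal objects different hash codes.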
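
A minimal sketch of the tuple approach under one assumption that differs from the question: the hypothetical FooRow class below defines Equals over both FooId and FooName, so hashing the same two fields keeps equal objects on equal hash codes.

```csharp
using System;

// Hypothetical variant of the question's Foo: equality covers BOTH columns,
// so the tuple hash below uses exactly the fields that Equals compares.
public class FooRow : IEquatable<FooRow>
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public bool Equals(FooRow other) =>
        other != null && FooId == other.FooId && FooName == other.FooName;

    public override bool Equals(object obj) => Equals(obj as FooRow);

    // ValueTuple combines the field hashes; a null FooName is handled for us.
    public override int GetHashCode() => (FooId, FooName).GetHashCode();
}
```

Because both properties have public setters, an instance should not be modified while it is a key in a hash-based collection.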

孤独陪着我 2024-07-18 03:32:16

How about:

public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}

Assuming performance is not an issue :)
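
A minimal sketch of the string-based approach, using a hypothetical Pair class with two string fields (the answer's prop1..prop3 are not defined, so they are replaced here). Equal pairs format to the same string and therefore share a hash code. Because "_" can also occur inside the values, two different pairs can format to the same string; that is merely a collision, and Equals still tells them apart.

```csharp
using System;

// Hypothetical two-field class used to illustrate string.Format-based hashing.
public class Pair
{
    public Pair(string first, string second)
    {
        First = first;
        Second = second;
    }

    public string First { get; }
    public string Second { get; }

    public override bool Equals(object obj) =>
        obj is Pair other && First == other.First && Second == other.Second;

    // Allocates a new string on every call, hence "assuming performance is not an issue".
    public override int GetHashCode() =>
        string.Format("{0}_{1}", First, Second).GetHashCode();
}
```

For example, new Pair("a_b", "c") and new Pair("a", "b_c") both format to "a_b_c", so they collide, yet a HashSet<Pair> still stores them as two separate items.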

つ可否回来 2024-07-18 03:32:16

Please don't forget to check the obj parameter against null when overriding Equals(),
and also compare the type.

public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;

    if (fooItem == null)
    {
       return false;
    }

    return fooItem.FooId == this.FooId;
}

The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
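
Note that obj as Foo, as in the code above, also accepts instances of classes derived from Foo. If "compare the type" is meant strictly, a sketch (assuming a hypothetical subclass SpecialFoo) compares the runtime types with GetType(), which also rejects null:

```csharp
using System;

public class Foo
{
    public int FooId { get; set; }

    public override bool Equals(object obj)
    {
        // null and any object of a different runtime type are never equal
        if (obj is null || obj.GetType() != GetType())
        {
            return false;
        }
        return ((Foo)obj).FooId == FooId;
    }

    // Consistent with Equals: only FooId participates.
    public override int GetHashCode() => FooId.GetHashCode();
}

// Hypothetical subclass, present only to show the effect of the type check.
public class SpecialFoo : Foo { }
```

With the as-based version, a Foo and a SpecialFoo with the same FooId compare equal; with the GetType() check they do not.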

请叫√我孤独 2024-07-18 03:32:16

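Just to add to the answers above:

If you don't override Equals, the default behavior is to compare the objects' references. The same goes for the hash code: the default implementation is based on object identity, not on the object's contents.
Because you did override Equals, the correct behavior is to compare whatever you implemented in Equals rather than references, so you should do the same for the hash code.

Clients of your class will expect the hash code to follow the same logic as the Equals method. For example, LINQ methods that use an IEqualityComparer compare hash codes first, and only when those are equal do they call Equals(), which may be more expensive to run. If we did not implement the hash code, equal objects would probably have different hash codes (because they are different instances) and would wrongly be treated as unequal (Equals() would never even be called).

Besides the problem that you may not be able to find your object when you use it as a dictionary key (it was inserted under one hash code, the default hash code of the object you look it up with will probably be different, and again Equals() will not even be called, as Marc Gravell explains in his answer), you also violate the concept of a dictionary or hash set, which should not allow identical keys.
When you override Equals, you are declaring that these objects are essentially the same, so you don't want both of them as different keys in a data structure that is supposed to have unique keys. But because they have different hash codes, the "same" key will be inserted as two different keys.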
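
A short sketch of the LINQ and hash-set behavior described above, using the question's Foo with GetHashCode based on FooId (the same field Equals compares); DistinctDemo is a hypothetical helper. Distinct() relies on the default equality comparer, which calls GetHashCode() and Equals(), and HashSet<T>.Add refuses a second "same" key:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;
        if (fooItem == null)
        {
            return false;
        }
        return fooItem.FooId == this.FooId;
    }

    // Consistent with Equals: only FooId decides equality, so only FooId is hashed.
    public override int GetHashCode() => FooId.GetHashCode();
}

public static class DistinctDemo
{
    // Rows 1/"a" and 1/"b" are equal by FooId, so only two distinct rows remain.
    public static int CountDistinct() =>
        new[]
        {
            new Foo { FooId = 1, FooName = "a" },
            new Foo { FooId = 1, FooName = "b" },
            new Foo { FooId = 2, FooName = "c" },
        }.Distinct().Count();
}
```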

翻身的咸鱼 2024-07-18 03:32:16

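Because the framework requires that two objects that are the same must have the same hash code. If you override Equals to perform a special comparison of two objects, and that method considers the two objects the same, then the hash codes of the two objects must also be the same. (Dictionaries and hash tables rely on this principle.)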

无所的.畏惧 2024-07-18 03:32:16

We have two problems to cope with.

1. You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also, an object will often NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.
2. If someone puts your object in a collection that calls GetHashCode() and you have overridden Equals() without also making GetHashCode() behave correctly, that person may spend days tracking down the problem.

Therefore, by default I do this:

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null)
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode() 
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}
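
A minimal sketch of what this buys you, assuming NotSupportedException in place of the answer's plain Exception (a choice made here, not in the answer): code that only uses Equals, such as List<T>.Contains, keeps working, while the first hash-based operation fails loudly instead of silently misbehaving.

```csharp
using System;

public class Foo
{
    public int FooId { get; set; }

    public override bool Equals(object obj) => obj is Foo other && other.FooId == FooId;

    // Fails fast as soon as anyone puts Foo into a hash-based collection.
    public override int GetHashCode() =>
        throw new NotSupportedException("Foo is not meant to be used as a hash key.");
}
```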

述情 2024-07-18 03:32:16

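Hash codes are used by hash-based collections such as Dictionary, Hashtable, HashSet, etc. The purpose of the hash code is to pre-sort objects very quickly by putting each object into a specific group (bucket). This pre-sorting helps a lot when you need to retrieve the object from the hash collection, because the code has to search for your object in only one bucket instead of in all the objects the collection contains. The better the hash codes are distributed (the more unique they are), the faster the retrieval. In the ideal situation, where every object has a unique hash code, finding it is an O(1) operation. In most cases it is close to O(1).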
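
To make the bucket idea concrete, here is a deliberately tiny, hypothetical hash set (ToyHashSet is not how Dictionary or HashSet are implemented internally): the hash code picks one bucket, and Equals is called only for the items in that bucket.

```csharp
using System.Collections.Generic;

// Hypothetical, deliberately tiny hash set: one list per bucket.
public class ToyHashSet<T>
{
    private readonly List<T>[] _buckets;

    public ToyHashSet(int bucketCount = 16)
    {
        _buckets = new List<T>[bucketCount];
    }

    // The hash code selects a bucket; masking the sign bit keeps the index non-negative.
    private int BucketFor(T item)
    {
        int hash = item == null ? 0 : item.GetHashCode();
        return (hash & 0x7FFFFFFF) % _buckets.Length;
    }

    // Equals is called only for the items that landed in the same bucket.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketFor(item)];
        if (bucket == null)
        {
            return false;
        }
        foreach (T candidate in bucket)
        {
            if (EqualityComparer<T>.Default.Equals(candidate, item))
            {
                return true;
            }
        }
        return false;
    }

    public bool Add(T item)
    {
        if (Contains(item))
        {
            return false;
        }
        int index = BucketFor(item);
        if (_buckets[index] == null)
        {
            _buckets[index] = new List<T>();
        }
        _buckets[index].Add(item);
        return true;
    }
}
```

Real implementations also resize the bucket array as the set grows and cache each item's hash code; both are omitted here.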

柳絮泡泡 2024-07-18 03:32:16

It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:

   public override int GetHashCode()
   {
      return base.GetHashCode();
   }

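(Of course I could use a #pragma to turn off the warning as well, but I prefer this way. Note that base.GetHashCode() is identity-based, so two distinct instances that Equals considers equal will usually get different hash codes; this shortcut is only safe as long as such instances are never used as keys in a hash-based collection.)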

When you are in a position where you do need the performance, then all of the issues mentioned by others here apply, of course. The most important one, because otherwise you will get wrong results when retrieving items from a hash set or dictionary, is that the hash code must not vary during the lifetime of an object (more accurately, during any time when the hash code is needed, such as while the object is a key in a dictionary). For example, the following is wrong because Value is public and so can be changed from outside the class during the lifetime of the instance, so you must not use it as the basis for the hash code:


   class A
   {
      public int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
      }
   }

On the other hand, if Value can't be changed it's ok to use:


   class A
   {
      public readonly int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //OK  Value is read-only and can't be changed during the instance's life time
      }
   }
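
The danger of the mutable version can be shown with a minimal sketch. MutableKey mirrors the first class A above, with an Equals override added here (not in the answer) so that it is a realistic key type. Once Value changes, HashSet<T> looks for the object under the new hash code and no longer finds the very instance it stores:

```csharp
// Mirrors the first class A above: the hash depends on a public, mutable field.
public class MutableKey
{
    public int Value;

    public override bool Equals(object obj) => obj is MutableKey other && other.Value == Value;

    // WRONG for a key: changes whenever Value changes.
    public override int GetHashCode() => Value.GetHashCode();
}
```

The set still holds the object (its Count stays 1), but Contains computes the new hash code, which no longer matches the one stored when the key was added.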

一身软味 2024-07-18 03:32:16

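Since C# 9 (.NET 5 and later; older targets such as .NET Core 3.1 need extra configuration), you may want to use records, because they use value-based equality by default.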
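
A minimal sketch, assuming C# 9 or later: a positional record (FooRecord is a hypothetical name) gets compiler-generated Equals, GetHashCode, == and != based on all of its properties. Note that this includes FooName, whereas the question's Foo compares only FooId; limiting equality to the ID would require customizing the record's Equals and GetHashCode.

```csharp
// Value-based equality, hashing and ==/!= are generated from FooId and FooName.
public record FooRecord(int FooId, string FooName);
```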

神爱温柔 2024-07-18 03:32:16

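You should always guarantee that if two objects are equal, as defined by Equals(), they return the same hash code. As some other comments state, in theory this is not mandatory if the object will never be used in a hash-based container such as HashSet or Dictionary. However, I would advise you to always follow this rule. The reason is simply that it is too easy for someone to change a collection from one type to another, with the good intention of actually improving performance or just conveying the code's semantics better.

For example, suppose we keep some objects in a List. Some time later, someone realizes that a HashSet is a much better choice because of its better search characteristics. That is when we can get into trouble. List internally uses the default equality comparer for the type, which in your case means Equals, while HashSet uses GetHashCode() first and calls Equals only for items with matching hash codes. If the two behave differently, so will your program. And keep in mind that such problems are not the easiest to troubleshoot.

I have summarized this behavior, along with some other GetHashCode() pitfalls, in a blog post, where you can find more examples and explanations.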
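
The List-versus-HashSet scenario can be reproduced with a minimal sketch. BrokenFoo is a hypothetical class that overrides Equals but deliberately not GetHashCode. List<T>.Contains finds an equal instance through Equals alone, whereas HashSet<T>.Contains will typically return false, because two separate instances almost certainly have different identity-based hash codes (this is not guaranteed, which is exactly what makes such bugs intermittent).

```csharp
// Hypothetical class that overrides Equals but (deliberately) not GetHashCode.
#pragma warning disable CS0659 // the missing GetHashCode override is the point of this example
public class BrokenFoo
{
    public int FooId { get; set; }

    public override bool Equals(object obj) => obj is BrokenFoo other && other.FooId == FooId;
}
#pragma warning restore CS0659
```

Switching new List<BrokenFoo> { x } to new HashSet<BrokenFoo> { x } changes the result of Contains(new BrokenFoo { FooId = x.FooId }) without any compiler error.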

旧梦荧光笔 2024-07-18 03:32:16

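As far as I understand, the original GetHashCode() returns the memory address of the object, so you must override it if you want to compare two different objects.

EDIT:
That was incorrect: the original GetHashCode() method cannot guarantee the equality of two values, although objects that are equal do return the same hash code.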

热风软妹 2024-07-18 03:32:16

In .NET, when you override the Equals() method, it's recommended to also override GetHashCode(). The reason is related to how .NET uses GetHashCode() in its built-in data structures.

When you store an object in a hash-based collection like Dictionary or HashSet, .NET uses the value returned by GetHashCode() to organize its data. Objects that are considered equal must return the same hash code; on top of that, well-distributed hash codes provide good performance when retrieving objects from such a collection.

If you override Equals(), you're changing the definition of what makes two objects equal. So, if you don't also override GetHashCode(), objects that you consider "equal" may return different hash codes. This can lead to inconsistent behavior when objects are used in a hash-based collection. They might not be found in the collection, even though you know they're there, because the collection is looking in the wrong hash bucket.

Let's look at an example. Suppose you have a Person class and you have overridden Equals() so that two Person objects are equal if their Name properties match, but you forgot to override GetHashCode(). Now, if you add a Person object with Name="John" to a HashSet, and later check whether a Person object with Name="John" exists in the HashSet, it might return false, which is incorrect. This happens because the default GetHashCode() is based on the object reference, not on the Name string that you use for equality comparison.

To avoid this issue, anytime you override Equals(), you should also override GetHashCode() to ensure it uses the same properties that Equals() does. This will help maintain consistency when using hash-based collections.
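
A minimal sketch of the Person example, assuming a read-only Name and ordinal (case-sensitive) comparison; both choices are made here, not in the answer. Equals and GetHashCode look at the same property, so a second Person named John is found in the HashSet:

```csharp
using System;

public class Person
{
    public Person(string name)
    {
        Name = name;
    }

    // Read-only, so the hash code cannot change while the object sits in a HashSet.
    public string Name { get; }

    public override bool Equals(object obj) =>
        obj is Person other && string.Equals(Name, other.Name, StringComparison.Ordinal);

    // Same property and same (ordinal) comparison as Equals; null hashes to 0.
    public override int GetHashCode() =>
        Name == null ? 0 : StringComparer.Ordinal.GetHashCode(Name);
}
```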

Overriding GetHashCode() means producing a hash code that takes into account the same properties used in Equals() and is evenly distributed, to minimize hash collisions.

Here is one example of how you might achieve this:

public override int GetHashCode()
{
   int hash = 17;

   // Suitable nullity checks etc, of course :)
   hash = (hash * 23) + field1.GetHashCode();
   hash = (hash * 23) + field2.GetHashCode();
   return hash;
}

In this example, field1 and field2 are the fields that the Equals() method checks. The constants 17 and 23 are just arbitrarily chosen 'magic' numbers that often give good results.

You can also use HashCode.Combine(), which is available in .NET Core 2.1 and later (and .NET Standard 2.1):

public override int GetHashCode()
{
    return HashCode.Combine(field1, field2);
}

Remember, the goal of GetHashCode() is not to avoid collisions entirely, but to keep them rare by spreading hash values evenly. Collisions are inevitable because the number of possible hash codes (2^32 for int) is smaller than, for example, the number of possible string values. But a good hash function makes the distribution of hash code values more even and reduces the probability of collisions, which results in better performance when using hash-based collections.
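
The first snippet leaves the null checks to the reader. A sketch with a hypothetical Contact class, whose string field may be null, fills them in; unchecked makes the intended integer wrap-around explicit even when overflow checking is enabled. With Field1 = null and Field2 = 5 the hash is (17 * 23 + 0) * 23 + 5 = 8998, since int.GetHashCode() returns the value itself.

```csharp
using System;

// Hypothetical class: Equals compares Field1 and Field2, so both are hashed.
public sealed class Contact
{
    public Contact(string field1, int field2)
    {
        Field1 = field1;
        Field2 = field2;
    }

    public string Field1 { get; }
    public int Field2 { get; }

    public override bool Equals(object obj) =>
        obj is Contact other && Field1 == other.Field1 && Field2 == other.Field2;

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            // A null reference contributes 0 instead of throwing NullReferenceException.
            hash = (hash * 23) + (Field1?.GetHashCode() ?? 0);
            hash = (hash * 23) + Field2.GetHashCode();
            return hash;
        }
    }
}
```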

我不在是我 2024-07-18 03:32:16

Using reflection, as below, seems to me a better option when considering public properties, because this way you don't have to worry about the addition or removal of properties (although that is not a very common scenario). I also found it to perform better (I compared timings using System.Diagnostics.Stopwatch).

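    // Requires: using System.Reflection;
    // Must override GetHashCode (capital G), otherwise collections never call it.
    public override int GetHashCode()
    {
        PropertyInfo[] theProperties = this.GetType().GetProperties();
        int hash = 31;
        foreach (PropertyInfo info in theProperties)
        {
            // Skip indexers: they cannot be read without index arguments.
            if (info.GetIndexParameters().Length > 0)
            {
                continue;
            }
            var value = info.GetValue(this, null);
            if (value != null)
            {
                unchecked
                {
                    hash = (29 * hash) ^ value.GetHashCode();
                }
            }
        }
        return hash;
    }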

空名 2024-07-18 03:32:15

Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).

The GetHashCode() method should reflect the Equals logic; the rules are:

if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.

In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash-code to new Foo(5,3)):

In modern frameworks, the HashCode type has methods to help you create a hashcode from multiple values; on older frameworks, you'd need to go without, so something like:

unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
    int hash = 13;
    hash = (hash * 7) + field1.GetHashCode();
    hash = (hash * 7) + field2.GetHashCode();
    ...
    return hash;
}

Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.

A demonstration of what happens when you get this wrong is here.
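
Putting the pieces of this answer together, a sketch with a hypothetical two-field Foo (constructor arguments field1 and field2 are assumed here for illustration) combines the fields with the 13/7 scheme and adds == and != operators that delegate to Equals. Because int.GetHashCode() returns the value itself, the arithmetic is easy to follow: new Foo(3, 5) hashes to (13 * 7 + 3) * 7 + 5 = 663, while new Foo(5, 3) hashes to (13 * 7 + 5) * 7 + 3 = 675, so the swapped pair does not collide.

```csharp
using System;

// Hypothetical two-field Foo used to illustrate the combining code and operators.
public sealed class Foo : IEquatable<Foo>
{
    public Foo(int field1, int field2)
    {
        Field1 = field1;
        Field2 = field2;
    }

    public int Field1 { get; }
    public int Field2 { get; }

    // "is null" never calls the overloaded == below, so there is no recursion.
    public bool Equals(Foo other) =>
        !(other is null) && Field1 == other.Field1 && Field2 == other.Field2;

    public override bool Equals(object obj) => Equals(obj as Foo);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 13;
            hash = (hash * 7) + Field1.GetHashCode();
            hash = (hash * 7) + Field2.GetHashCode();
            return hash;
        }
    }

    // Operators delegate to Equals; ReferenceEquals also covers null == null.
    public static bool operator ==(Foo left, Foo right) =>
        ReferenceEquals(left, right) || (!(left is null) && left.Equals(right));

    public static bool operator !=(Foo left, Foo right) => !(left == right);
}
```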

以歌曲疗慰 2024-07-18 03:32:15

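Actually, it is very hard to implement GetHashCode() correctly, because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore, the fields used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate.
My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object, which has a new ID and therefore a new hash code. This approach works best with GUIDs, because you can provide a parameterless constructor that generates an ID randomly.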
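
A minimal sketch of the ID-based approach, using a hypothetical Entity class with a Guid ID that is set only through a constructor; it ignores any NHibernate-specific mapping requirements (such as virtual members), which are outside the scope of this example.

```csharp
using System;

// Hypothetical entity: the ID is fixed at construction and is the only input
// to Equals and GetHashCode, so the hash code never changes.
public class Entity
{
    // Parameterless constructor generates a random ID, as suggested for GUIDs.
    public Entity() : this(Guid.NewGuid()) { }

    public Entity(Guid id)
    {
        Id = id;
    }

    // No setter: changing the ID means creating a new object.
    public Guid Id { get; }

    public override bool Equals(object obj) => obj is Entity other && other.Id == Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```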

蒗幽 2024-07-18 03:32:15

By overriding Equals you're basically stating that you know better how to compare two instances of a given type.

Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.
