C# 如何计算出对象的哈希码?

发布于 2024-07-04 18:38:11 字数 1912 浏览 4 评论 0原文

这个问题来自 元组

我开始思考元组应该具有的哈希码。 如果我们接受 KeyValuePair 类作为元组怎么办? 它不会覆盖 GetHashCode() 方法,因此可能不会知道它的“子级”的哈希码...因此,运行时将调用 Object.GetHashCode(),而它不知道真实的对象结构。

然后我们可以创建某个引用类型的两个实例,由于重载了 GetHashCode() 和 Equals(),它们实际上是 Equal。 并将它们用作元组中的“子项”来“欺骗”字典。

但这不起作用! 运行时以某种方式找出元组的结构并调用我们类的重载 GetHashCode!

它是如何工作的? Object.GetHashCode() 做了什么分析?

当我们使用一些复杂的按键时,在某些糟糕的情况下会影响性能吗? (可能是不可能的情况......但仍然)

将此代码视为示例:

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        static void Main(string[] args)
        {
            Dictionary<object, object> dict = new Dictionary<object, object>();

            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 1), 1), 1);

            //here we get the exception -- an item with the same key was already added
            //but how did it figure out the hash code?
            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 2), 1), 1); 

            return;
        }
    }
}

更新 我想我已经找到了对此的解释,如我的答案中所述。 其主要结果是:

  • 小心您的键及其哈希码:-)
  • 对于复杂的字典键,您必须正确重写 Equals() 和 GetHashCode()。

This question comes out of the discussion on tuples.

I started thinking about the hash code that a tuple should have.
What if we will accept KeyValuePair class as a tuple? It doesn't override the GetHashCode() method, so probably it won't be aware of the hash codes of it's "children"... So, run-time will call Object.GetHashCode(), which is not aware of the real object structure.

Then we can make two instances of some reference type, which are actually Equal, because of the overloaded GetHashCode() and Equals(). And use them as "children" in tuples to "cheat" the dictionary.

But it doesn't work! Run-time somehow figures out the structure of our tuple and calls the overloaded GetHashCode of our class!

How does it work? What's the analysis made by Object.GetHashCode()?

Can it affect the performance in some bad scenario, when we use some complicated keys? (probably, impossible scenario... but still)

Consider this code as an example:

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        static void Main(string[] args)
        {
            Dictionary<object, object> dict = new Dictionary<object, object>();

            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 1), 1), 1);

            //here we get the exception -- an item with the same key was already added
            //but how did it figure out the hash code?
            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 2), 1), 1); 

            return;
        }
    }
}

Update I think I've found an explanation for this as stated below in my answer. The main outcomes of it are:

  • Be careful with your keys and their hash codes :-)
  • For complicated dictionary keys you must override Equals() and GetHashCode() correctly.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

留蓝 2024-07-11 18:38:11

看来我现在已经有了线索了。

我以为KeyValuePair是一个引用类型,但其实不是,它是一个结构体。 因此它使用 ValueType.GetHashCode() 方法。 MSDN 上说:“派生类型的一个或多个字段用于计算返回值”。

如果您将真正的引用类型作为“元组提供者”,您将欺骗字典(或您自己......)。

using System.Collections.Generic;

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        class Pair<T, R>
        {
            public T First { get; set; }
            public R Second { get; set; }
        }

        static void Main(string[] args)
        {
            var dict = new Dictionary<Pair<int, MyClass>, object>();

            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 2) }, 1);

            //this is a pair of the same values as previous! but... no exception this time...
            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 3) }, 1);

            return;
        }
    }
}

It seems that I have a clue now.

I thought KeyValuePair is a reference type, but it is not, it is a struct. And so it uses ValueType.GetHashCode() method. MSDN for it says: "One or more fields of the derived type is used to calculate the return value".

If you will take a real reference type as a "tuple-provider" you'll cheat the dictionary (or yourself...).

using System.Collections.Generic;

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        class Pair<T, R>
        {
            public T First { get; set; }
            public R Second { get; set; }
        }

        static void Main(string[] args)
        {
            var dict = new Dictionary<Pair<int, MyClass>, object>();

            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 2) }, 1);

            //this is a pair of the same values as previous! but... no exception this time...
            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 3) }, 1);

            return;
        }
    }
}
泪痕残 2024-07-11 18:38:11

查看 Brad Abrams 的这篇 帖子 以及Brian Grunmeeyer 的评论,了解有关 object.GetHashCode 如何工作的更多信息。 另外,请查看 Ayande 博客上的第一条评论 帖子。 我不知道框架的当前版本是否仍然遵循这些规则,或者他们是否真的像布拉德暗示的那样改变了它。

Check out this post by Brad Abrams and also the comment by Brian Grunkemeyer for some more information on how object.GetHashCode works. Also, take a look at the first comment on Ayande's blog post. I don't know if the current releases of the Framework still follow these rules or if they have actually changed it like Brad implied.

宣告ˉ结束 2024-07-11 18:38:11

我不再有这本书的参考资料,我必须找到它来确认,但我认为默认的基本哈希只是将对象的所有成员哈希在一起。 由于 CLR 的工作方式,它可以访问它们,因此您无法像它们那样编写好东西。

这完全是我简单读过的一些东西的记忆,所以你可以随意理解。

编辑:这本书是 MS Press 的Inside C#。 封面上有锯片的那一个。 作者花了很多时间解释如何在 CLR 中实现事物,如何将语言翻译成 MSIL 等。 等等。 如果你能找到这本书,读起来也不错。

编辑:形成链接,只要它看起来像

Object.GetHashCode() 使用
System.Object 类中用于生成哈希值的内部字段。 每个
创建的对象被分配一个唯一的对象键,存储为整数,当它
被建造。 这些键从 1 开始,每次出现新对象时都会递增
创建任何类型。

嗯,我想如果我希望使用对象作为哈希键,我需要编写一些自己的哈希代码。

I don't have the book reference anymore, and I'll have to find it just to confirm, but I thought the default base hash just hashed together all of the members of your object. It got access to them because of the way the CLR worked, so it wasn't something that you could write as well as they had.

That is completely from memory of something I briefly read so take it for what you will.

Edit: The book was Inside C# from MS Press. The one with the Saw blade on the cover. The author spent a good deal of time explaining how things were implemented in the CLR, how the language translated down to MSIL, ect. ect. If you can find the book it's not a bad read.

Edit: Form the link provided it looks like

Object.GetHashCode() uses an
internal field in the System.Object class to generate the hash value. Each
object created is assigned a unique object key, stored as an integer,when it
is created. These keys start at 1 and increment every time a new object of
any type gets created.

Hmm I guess I need to write a few of my own hash codes, if I expect to use objects as hash keys.

绝影如岚 2024-07-11 18:38:11

所以它可能不会知道它的“孩子”的哈希码。

您的示例似乎证明不然:-) 对于 KeyValuePair 来说,键 MyClass 和值 1 的哈希码是相同的。 KeyValuePair 实现必须同时使用其 KeyValue 作为自己的哈希码。

向上,字典类需要唯一的键。 它使用每个键提供的哈希码来解决问题。 请记住,运行时不会调用 Object.GetHashCode(),而是调用您为其提供的实例提供的 GetHashCode() 实现。

考虑一个更复杂的情况:

public class HappyClass
{

    enum TheUnit
    {
        Points,
        Picas,
        Inches
    }

    class MyDistanceClass
    {
        int distance;
        TheUnit units;

        public MyDistanceClass(int theDistance, TheUnit unit)
        {
            distance = theDistance;

            units = unit;
        }
        public static int ConvertDistance(int oldDistance, TheUnit oldUnit, TheUnit newUnit)
        {
            // insert real unit conversion code here :-)
            return oldDistance * 100;
        }

        /// <summary>
        /// Figure out if we are equal distance, converting into the same units of measurement if we have to
        /// </summary>
        /// <param name="obj">the other guy</param>
        /// <returns>true if we are the same distance</returns>
        public override bool Equals(object obj)
        {
            MyDistanceClass other = obj as MyDistanceClass;
            if (other == null) return false;

            if (other.units != this.units)
            {
                int newDistance = MyDistanceClass.ConvertDistance(other.distance, other.units, this.units);
                return distance.Equals(newDistance);
            }
            else
            {
                return distance.Equals(other.distance);
            }


        }

        public override int GetHashCode()
        {
            // even if the distance is equal in spite of the different units, the objects are not
            return distance.GetHashCode() * units.GetHashCode();
        }
    }
    static void Main(string[] args)
    {

        // these are the same distance... 72 points = 1 inch
        MyDistanceClass distPoint = new MyDistanceClass(72, TheUnit.Points);
        MyDistanceClass distInch = new MyDistanceClass(1, TheUnit.Inch);

        Debug.Assert(distPoint.Equals(distInch), "these should be true!");
        Debug.Assert(distPoint.GetHashCode() != distInch.GetHashCode(), "But yet they are fundimentally different values");

        Dictionary<object, object> dict = new Dictionary<object, object>();

        dict.Add(new KeyValuePair<MyDistanceClass, object>(distPoint, 1), 1);

        //this should not barf
        dict.Add(new KeyValuePair<MyDistanceClass, object>(distInch, 1), 1);

        return;
    }

}

基本上......在我的示例中,您希望两个距离相同的对象为 Equals 返回“true”,但返回不同的哈希码。

so probably it won't be aware of the hash codes of it's "children".

Your example seems to prove otherwise :-) The hash code for the key MyClass and the value 1 is the same for both KeyValuePair's . The KeyValuePair implementation must be using both its Key and Value for its own hash code

Moving up, the dictionary class wants unique keys. It is using the hashcode provided by each key to figure things out. Remember that the runtime isn't calling Object.GetHashCode(), but it is calling the GetHashCode() implementation provided by the instance you give it.

Consider a more complex case:

public class HappyClass
{

    enum TheUnit
    {
        Points,
        Picas,
        Inches
    }

    class MyDistanceClass
    {
        int distance;
        TheUnit units;

        public MyDistanceClass(int theDistance, TheUnit unit)
        {
            distance = theDistance;

            units = unit;
        }
        public static int ConvertDistance(int oldDistance, TheUnit oldUnit, TheUnit newUnit)
        {
            // insert real unit conversion code here :-)
            return oldDistance * 100;
        }

        /// <summary>
        /// Figure out if we are equal distance, converting into the same units of measurement if we have to
        /// </summary>
        /// <param name="obj">the other guy</param>
        /// <returns>true if we are the same distance</returns>
        public override bool Equals(object obj)
        {
            MyDistanceClass other = obj as MyDistanceClass;
            if (other == null) return false;

            if (other.units != this.units)
            {
                int newDistance = MyDistanceClass.ConvertDistance(other.distance, other.units, this.units);
                return distance.Equals(newDistance);
            }
            else
            {
                return distance.Equals(other.distance);
            }


        }

        public override int GetHashCode()
        {
            // even if the distance is equal in spite of the different units, the objects are not
            return distance.GetHashCode() * units.GetHashCode();
        }
    }
    static void Main(string[] args)
    {

        // these are the same distance... 72 points = 1 inch
        MyDistanceClass distPoint = new MyDistanceClass(72, TheUnit.Points);
        MyDistanceClass distInch = new MyDistanceClass(1, TheUnit.Inch);

        Debug.Assert(distPoint.Equals(distInch), "these should be true!");
        Debug.Assert(distPoint.GetHashCode() != distInch.GetHashCode(), "But yet they are fundimentally different values");

        Dictionary<object, object> dict = new Dictionary<object, object>();

        dict.Add(new KeyValuePair<MyDistanceClass, object>(distPoint, 1), 1);

        //this should not barf
        dict.Add(new KeyValuePair<MyDistanceClass, object>(distInch, 1), 1);

        return;
    }

}

Basically... in the case of my example, you'd want two objects that are the same distance to return "true" for Equals, but yet return different hash codes.

岁吢 2024-07-11 18:38:11

以下是四元组(内部包含 4 个元组组件)的正确哈希和相等实现。 此代码确保在 HashSet 和字典中正确使用此特定元组。

有关该主题的更多信息(包括源代码)此处

注意使用unchecked关键字(以避免溢出)并在 obj 为 null 时抛出 NullReferenceException(根据基本方法的要求)

public override bool Equals(object obj)
{
    if (ReferenceEquals(null, obj))
        throw new NullReferenceException("obj is null");
    if (ReferenceEquals(this, obj)) return true;
    if (obj.GetType() != typeof (Quad<T1, T2, T3, T4>)) return false;
    return Equals((Quad<T1, T2, T3, T4>) obj);
}

public bool Equals(Quad<T1, T2, T3, T4> obj)
{
    if (ReferenceEquals(null, obj)) return false;
    if (ReferenceEquals(this, obj)) return true;
    return Equals(obj.Item1, Item1)
        && Equals(obj.Item2, Item2)
            && Equals(obj.Item3, Item3)
                && Equals(obj.Item4, Item4);
}

public override int GetHashCode()
{
    unchecked
    {
        int result = Item1.GetHashCode();
        result = (result*397) ^ Item2.GetHashCode();
        result = (result*397) ^ Item3.GetHashCode();
        result = (result*397) ^ Item4.GetHashCode();
        return result;
    }
}
public static bool operator ==(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return Equals(left, right);
}


public static bool operator !=(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return !Equals(left, right);
}

Here are the proper Hash and equality implementations for the Quad tuple (contains 4 tuple components inside). This code ensures proper usage of this specific tuple in HashSets and the dictionaries.

More on the subject (including the source code) here.

Note usage of the unchecked keyword (to avoid overflows) and throwing NullReferenceException if obj is null (as required by the base method)

public override bool Equals(object obj)
{
    if (ReferenceEquals(null, obj))
        throw new NullReferenceException("obj is null");
    if (ReferenceEquals(this, obj)) return true;
    if (obj.GetType() != typeof (Quad<T1, T2, T3, T4>)) return false;
    return Equals((Quad<T1, T2, T3, T4>) obj);
}

public bool Equals(Quad<T1, T2, T3, T4> obj)
{
    if (ReferenceEquals(null, obj)) return false;
    if (ReferenceEquals(this, obj)) return true;
    return Equals(obj.Item1, Item1)
        && Equals(obj.Item2, Item2)
            && Equals(obj.Item3, Item3)
                && Equals(obj.Item4, Item4);
}

public override int GetHashCode()
{
    unchecked
    {
        int result = Item1.GetHashCode();
        result = (result*397) ^ Item2.GetHashCode();
        result = (result*397) ^ Item3.GetHashCode();
        result = (result*397) ^ Item4.GetHashCode();
        return result;
    }
}
public static bool operator ==(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return Equals(left, right);
}


public static bool operator !=(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return !Equals(left, right);
}
垂暮老矣 2024-07-11 18:38:11

不要在可变类上覆盖 GetHashcode() 和 Equals(),仅在不可变类或结构上覆盖它,否则如果您修改用作键的对象,哈希表将无法正常工作(您将无法修改键对象后检索与键关联的值)

此外,哈希表不使用哈希码来标识对象,它们使用键对象本身作为标识符,不需要用于在哈希表中添加条目的所有键返回不同的哈希码,但建议这样做,否则性能会受到很大影响。

Don't override GetHashcode() and Equals() on mutable classes, only override it on immutable classes or structures, else if you modify a object used as key the hash table won't function properly anymore (you won't be able to retrieve the value associated to the key after the key object was modified)

Also hash tables don't use hashcodes to identify objects they use the key objects themselfes as identifiers, it's not required that all keys that are used to add entries in a hash table return different hashcodes, but it is recommended that they do, else performance suffers greatly.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文