How to use IEqualityComparer.Equals() in the ToLookup() extension

Posted 2024-11-28 05:01:45

Having stumbled upon an article about the Birthday Paradox and its implications for overriding the GetHashCode method, I find myself in a bind.

In tests, we found that calls to the ToLookup() extension method only use GetHashCode, despite our providing an implementation of Equals.

I think I understand why this happens: internally, ToLookup, HashSet, Dictionary, etc. use hash codes to store and/or index their elements?

Is there a way to provide the functionality so that the equality comparison is actually performed using the Equals method? Or should I not be concerned about collisions? I haven't done the maths myself, but according to the first article I linked, you would only need 77,163 elements in a list before reaching a 50% chance of a collision.
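For reference, the 77,163 figure (and the 109,125 figure for a 75% chance quoted further down) can be reproduced with the standard birthday approximation p ≈ 1 − e^(−n(n−1)/2N), where N = 2^32 is the number of possible Int32 hash codes. This assumes every hash code behaves like an independent, uniformly distributed 32-bit value; a real GetHashCode implementation may be distributed better or worse. A minimal sketch (the `BirthdayBound` class and its methods are hypothetical names):

```csharp
using System;

// Birthday-bound estimates for 32-bit hash codes.
// ASSUMPTION: each item's hash code is an independent, uniformly
// distributed value over all 2^32 possible Int32 values.
public static class BirthdayBound
{
    // Number of distinct values an Int32 hash code can take: 2^32.
    private const double HashSpace = 4294967296.0;

    // Approximate probability that at least two of n items share a hash code:
    // p(n) ≈ 1 - exp(-n(n-1) / (2N)).
    public static double CollisionProbability(long n)
    {
        double pairs = n * (double)(n - 1) / 2.0;
        return 1.0 - Math.Exp(-pairs / HashSpace);
    }

    // Approximate number of items needed for the collision probability to reach p:
    // n ≈ sqrt(2N * ln(1 / (1 - p))), rounded up to a whole item.
    public static long ElementsForProbability(double p)
    {
        return (long)Math.Ceiling(Math.Sqrt(2.0 * HashSpace * Math.Log(1.0 / (1.0 - p))));
    }
}
```

`ElementsForProbability(0.5)` and `ElementsForProbability(0.75)` come out at 77,163 and 109,125 respectively. Note that this estimates how likely it is that *some* pair of items shares a hash code; on its own it says nothing about whether such a pair would be grouped incorrectly.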

If I understand this correctly, an Equals() override that compares property by property such as

return a.Property1 == b.Property1 && a.Property2 == b.Property2 && ...;

should have a zero chance of collision? So how can I get my ToLookup() to equality compare this way?


In case you need an example of what I mean:

C#

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{

    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadKey();
    }

    public class AnEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
        public string OtherProperty1 { get; set; }
        public List<string> OtherProperty2 { get; set; }
    }

    public class KeyEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
    }

    public static void DoStuff()
    {
        var a = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "foo"};
        var b = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "bar"};
        var c = new AnEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999, OtherProperty1 = "yada"};

        var entityList = new List<AnEntity> { a, b, c };

        var lookup = entityList.ToLookup(n => new KeyEntity {KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3});

        // I want these to all return true
        Debug.Assert(lookup.Count == 2);
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].First().OtherProperty1 == "foo");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].Last().OtherProperty1 == "bar");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999}].Single().OtherProperty1 == "yada");
    }

}

VB

Module Program

    Public Sub Main(args As String())
        DoStuff()
        Console.ReadKey()
    End Sub

    Public Class AnEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
        Public Property OtherProperty1 As String
        Public Property OtherProperty2 As List(Of String) 
    End Class

    Public Class KeyEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
    End Class

    Public Sub DoStuff()
        Dim a = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "foo"}
        Dim b = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "bar"}
        Dim c = New AnEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999, .OtherProperty1 = "yada"}

        Dim entityList = New List(Of AnEntity) From {a, b, c}

        Dim lookup = entityList.ToLookup(Function(n) New KeyEntity With {.KeyProperty1 = n.KeyProperty1, .KeyProperty2 = n.KeyProperty2, .KeyProperty3 = n.KeyProperty3})

        ' I want these to all return true
        Debug.Assert(lookup.Count = 2)
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).First().OtherProperty1 = "foo")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).Last().OtherProperty1 = "bar")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999}).Single().OtherProperty1 = "yada")
    End Sub

End Module

I can get that to work with an override of GetHashCode(), no problem. But I don't want to rely on GetHashCode because if I have, for example, 109,125 elements in my list, apparently I'm already at a 75% chance of collision? If it used the aforementioned Equals() override, I think I'd be at 0%?

Comments (2)

慕巷 2024-12-05 05:01:45

The article that you've linked to is completely misleading (and many of its comments highlight this).

GetHashCode is used where possible because it's fast; if there are hash collisions, Equals is used to disambiguate between the colliding items. As long as you implement Equals and GetHashCode correctly (whether in the types themselves or in a custom IEqualityComparer<T> implementation), there won't be any problems.
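One way to see the fallback to Equals is to force a collision on every key. The following is a demonstration sketch, not production code: `CollidingKeyComparer` and `CollisionDemo` are hypothetical names, and the comparer deliberately returns the same hash code for every key, so `ToLookup` can only tell keys apart by calling `Equals`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Key type mirroring the question's KeyEntity (no Equals/GetHashCode overrides).
public sealed class KeyEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
}

// Deliberately terrible comparer: every key gets the same hash code, so every
// pair of keys "collides". Correct grouping then depends entirely on Equals.
public sealed class CollidingKeyComparer : IEqualityComparer<KeyEntity>
{
    // Counts how often the lookup had to fall back to Equals.
    public int EqualsCalls { get; private set; }

    public bool Equals(KeyEntity x, KeyEntity y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.KeyProperty1 == y.KeyProperty1
            && x.KeyProperty2 == y.KeyProperty2
            && x.KeyProperty3 == y.KeyProperty3;
    }

    // Constant hash code: maximal collisions, for demonstration only.
    public int GetHashCode(KeyEntity obj) => 42;
}

public static class CollisionDemo
{
    // Groups the question's three sample items by their key properties.
    public static (ILookup<KeyEntity, string> Lookup, CollidingKeyComparer Comparer) Run()
    {
        var comparer = new CollidingKeyComparer();
        var items = new[] { (1, 2, 3, "foo"), (1, 2, 3, "bar"), (999, 999, 999, "yada") };
        var lookup = items.ToLookup(
            t => new KeyEntity { KeyProperty1 = t.Item1, KeyProperty2 = t.Item2, KeyProperty3 = t.Item3 },
            t => t.Item4,
            comparer);
        return (lookup, comparer);
    }
}
```

If the lookup ignored Equals, all three items would land in a single group; instead they should form two groups, at the cost of extra Equals calls. A constant hash code is only useful for a demonstration like this, since it turns every lookup into a linear scan.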

The problem with your example code is that you're not overriding Equals and GetHashCode at all. This means that the default implementations are used, and the default implementations use reference comparisons for reference types, not value comparisons.
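To illustrate the difference between the default reference comparison and a value comparison, here is a minimal sketch with two hypothetical key types: `PlainKey` keeps the `object` defaults, while `ValueKey` overrides `Equals` and `GetHashCode` over the same three properties. It assumes `HashCode.Combine`, which is available from .NET Core 2.1 / .NET Standard 2.1 onward.

```csharp
using System;

// No overrides: Equals/GetHashCode fall back to object's reference-based defaults,
// so two instances with identical property values are NOT equal.
public sealed class PlainKey
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
}

// Value semantics: instances holding the same values compare equal
// and, as the GetHashCode contract requires, produce the same hash code.
public sealed class ValueKey : IEquatable<ValueKey>
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }

    public bool Equals(ValueKey other) =>
        !(other is null)
        && KeyProperty1 == other.KeyProperty1
        && KeyProperty2 == other.KeyProperty2
        && KeyProperty3 == other.KeyProperty3;

    public override bool Equals(object obj) => Equals(obj as ValueKey);

    // Combines exactly the fields that Equals compares.
    public override int GetHashCode() =>
        HashCode.Combine(KeyProperty1, KeyProperty2, KeyProperty3);
}
```

Giving the question's `KeyEntity` the same overrides as `ValueKey` should make its four assertions pass. Because the properties have public setters, a key should not be mutated after it has been used in a lookup, or its hash code will no longer match the bucket it was stored in.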

This means that you're not getting hash collisions because the objects you're comparing against are different to the original objects, even though they have the same values. This, in turn, means that Equals just isn't required by your example code. Override Equals and GetHashCode correctly, or set up an IEqualityComparer<T> to do so, and everything will start working as you expect.
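If `KeyEntity` itself should stay unchanged, a comparer can instead be passed to the `ToLookup(keySelector, comparer)` overload. A minimal sketch reusing the question's types (trimmed to the properties used); `KeyEntityComparer` and `LookupDemo` are hypothetical names, and `HashCode.Combine` again assumes .NET Core 2.1 / .NET Standard 2.1 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's entity types, trimmed to what this sketch needs.
public class AnEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
    public string OtherProperty1 { get; set; }
}

public class KeyEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
}

// Value equality for KeyEntity without modifying the class itself.
public sealed class KeyEntityComparer : IEqualityComparer<KeyEntity>
{
    public static readonly KeyEntityComparer Instance = new KeyEntityComparer();

    public bool Equals(KeyEntity x, KeyEntity y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.KeyProperty1 == y.KeyProperty1
            && x.KeyProperty2 == y.KeyProperty2
            && x.KeyProperty3 == y.KeyProperty3;
    }

    // Must agree with Equals: equal keys -> equal hash codes.
    public int GetHashCode(KeyEntity obj) =>
        HashCode.Combine(obj.KeyProperty1, obj.KeyProperty2, obj.KeyProperty3);
}

public static class LookupDemo
{
    // Same key selector as the question, plus the comparer.
    public static ILookup<KeyEntity, AnEntity> Build(IEnumerable<AnEntity> entities) =>
        entities.ToLookup(
            n => new KeyEntity { KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3 },
            KeyEntityComparer.Instance);
}
```

The lookup keeps using the comparer it was built with, so indexing it with a freshly constructed `KeyEntity` should find the matching group by value.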

臻嫒无言 2024-12-05 05:01:45

The birthday paradox does not apply in this situation. The birthday paradox relates to non-deterministic random sets, whereas hash code computation is deterministic. The chance of 2 objects with different state sharing the same hash code is much closer to 1 in a billion or so, certainly not as low as 1 in 77 thousand, so I don't think you have anything to worry about.
