How to get IEqualityComparer.Equals() used by the ToLookup() extension
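I stumbled across an article about the birthday paradox and its implications when overriding the GetHashCode method, and now I find myself in a bind.
In testing, we found that calls to the ToLookup() extension used only GetHashCode, even though an implementation of Equals was provided.
I think I understand why this happens: internally, ToLookup, HashSet, Dictionary and so on use hash codes to store and/or index their elements?
Is there a way to provide the functionality so that the equality comparison is actually performed with the Equals method? Or should I not be worried about collisions? I haven't done the math myself, but according to the first article I linked, a list needs only 77,163 elements before reaching a 50% chance of a collision.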
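If I understand correctly, an Equals() override that compares property by property, such as
return (a.Property1 == b.Property1 && a.Property2 == b.Property2 && ...)
should have zero chance of a collision? So how can I get my ToLookup() to compare equality this way?
In case you need an example of what I mean: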
C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadKey();
    }

    public class AnEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
        public string OtherProperty1 { get; set; }
        public List<string> OtherProperty2 { get; set; }
    }

    // Key type used for grouping; note that it does not override Equals or GetHashCode.
    public class KeyEntity
    {
        public int KeyProperty1 { get; set; }
        public int KeyProperty2 { get; set; }
        public int KeyProperty3 { get; set; }
    }

    public static void DoStuff()
    {
        var a = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "foo"};
        var b = new AnEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3, OtherProperty1 = "bar"};
        var c = new AnEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999, OtherProperty1 = "yada"};
        var entityList = new List<AnEntity> { a, b, c };
        var lookup = entityList.ToLookup(n => new KeyEntity {KeyProperty1 = n.KeyProperty1, KeyProperty2 = n.KeyProperty2, KeyProperty3 = n.KeyProperty3});
        // I want all of these assertions to pass
        Debug.Assert(lookup.Count == 2);
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].First().OtherProperty1 == "foo");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 1, KeyProperty2 = 2, KeyProperty3 = 3}].Last().OtherProperty1 == "bar");
        Debug.Assert(lookup[new KeyEntity {KeyProperty1 = 999, KeyProperty2 = 999, KeyProperty3 = 999}].Single().OtherProperty1 == "yada");
    }
}
VB
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

Module Program
    Public Sub Main(args As String())
        DoStuff()
        Console.ReadKey()
    End Sub

    Public Class AnEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
        Public Property OtherProperty1 As String
        Public Property OtherProperty2 As List(Of String)
    End Class

    ' Key type used for grouping; note that it does not override Equals or GetHashCode.
    Public Class KeyEntity
        Public Property KeyProperty1 As Integer
        Public Property KeyProperty2 As Integer
        Public Property KeyProperty3 As Integer
    End Class

    Public Sub DoStuff()
        Dim a = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "foo"}
        Dim b = New AnEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3, .OtherProperty1 = "bar"}
        Dim c = New AnEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999, .OtherProperty1 = "yada"}
        Dim entityList = New List(Of AnEntity) From {a, b, c}
        Dim lookup = entityList.ToLookup(Function(n) New KeyEntity With {.KeyProperty1 = n.KeyProperty1, .KeyProperty2 = n.KeyProperty2, .KeyProperty3 = n.KeyProperty3})
        ' I want all of these assertions to pass
        Debug.Assert(lookup.Count = 2)
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).First().OtherProperty1 = "foo")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 1, .KeyProperty2 = 2, .KeyProperty3 = 3}).Last().OtherProperty1 = "bar")
        Debug.Assert(lookup(New KeyEntity With {.KeyProperty1 = 999, .KeyProperty2 = 999, .KeyProperty3 = 999}).Single().OtherProperty1 = "yada")
    End Sub
End Module
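I can get it to work with an override of GetHashCode() without any problem. But I would rather not rely on GetHashCode, because with 109,125 elements in my list I apparently already have a 75% chance of a collision? If it used the aforementioned Equals() override instead, I think I would be at 0%?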
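For reference, the 77,163 and 109,125 figures quoted above are consistent with the usual birthday-bound approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), under the idealized assumption that hash codes are spread uniformly over all 2^32 int values (real GetHashCode implementations may do better or worse). The sketch below only reproduces that arithmetic; BirthdayBound and ItemsForCollisionChance are illustrative names, not part of any library.

```csharp
using System;

public static class BirthdayBound
{
    // Approximate number of uniformly random 32-bit hash codes needed before
    // the chance that at least two of them coincide reaches `probability`.
    // Uses the approximation n ≈ sqrt(2 * N * ln(1 / (1 - p))).
    // Assumption: hash codes are uniform over all 2^32 int values.
    public static double ItemsForCollisionChance(double probability)
    {
        double hashSpace = Math.Pow(2, 32); // number of distinct int values
        return Math.Sqrt(2 * hashSpace * Math.Log(1 / (1 - probability)));
    }
}
```

Calling BirthdayBound.ItemsForCollisionChance(0.5) and BirthdayBound.ItemsForCollisionChance(0.75) gives values within one element of 77,163 and 109,125 respectively. Note that these numbers describe two different keys sharing a hash code somewhere in the collection, not two different keys being treated as equal.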
Comments (2)
The article that you've linked to is completely misleading (and many of its comments highlight this).
GetHashCode is used where possible because it's fast; if there are hash collisions then Equals is used to disambiguate between the colliding items. So long as you implement Equals and GetHashCode correctly, whether in the types themselves or in a custom IEqualityComparer<T> implementation, there won't be any problems.
The problem with your example code is that you're not overriding Equals and GetHashCode at all. This means that the default implementations are used, and the default implementations use reference comparisons for reference types, not value comparisons.
This means that you're not getting hash collisions because the objects you're comparing against are different to the original objects, even though they have the same values. This, in turn, means that Equals just isn't required by your example code. Override Equals and GetHashCode correctly, or set up an IEqualityComparer<T> to do so, and everything will start working as you expect.
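As a minimal sketch of the IEqualityComparer<T> route, the code below passes a comparer to the ToLookup overload that accepts one. KeyEntityComparer and LookupDemo are hypothetical names introduced for illustration, the entity classes are trimmed copies of the ones in the question, and the 17/31 hash-combining scheme is just one common choice rather than a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AnEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
    public string OtherProperty1 { get; set; }
}

public class KeyEntity
{
    public int KeyProperty1 { get; set; }
    public int KeyProperty2 { get; set; }
    public int KeyProperty3 { get; set; }
}

// Hypothetical comparer: value equality over the three key properties.
public sealed class KeyEntityComparer : IEqualityComparer<KeyEntity>
{
    public bool Equals(KeyEntity x, KeyEntity y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.KeyProperty1 == y.KeyProperty1
            && x.KeyProperty2 == y.KeyProperty2
            && x.KeyProperty3 == y.KeyProperty3;
    }

    // Must agree with Equals: keys that are equal return the same hash code.
    public int GetHashCode(KeyEntity obj)
    {
        if (obj is null) return 0;
        unchecked // let the arithmetic wrap around instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + obj.KeyProperty1;
            hash = hash * 31 + obj.KeyProperty2;
            hash = hash * 31 + obj.KeyProperty3;
            return hash;
        }
    }
}

public static class LookupDemo
{
    // Groups entities by their three key properties using the comparer above.
    public static ILookup<KeyEntity, AnEntity> Build(IEnumerable<AnEntity> entities)
    {
        return entities.ToLookup(
            n => new KeyEntity
            {
                KeyProperty1 = n.KeyProperty1,
                KeyProperty2 = n.KeyProperty2,
                KeyProperty3 = n.KeyProperty3
            },
            new KeyEntityComparer());
    }
}
```

With the three entities from the question, LookupDemo.Build should produce two groups, and indexing the lookup with a freshly constructed KeyEntity finds the matching group because both the hash code and Equals are now value-based. Overriding Equals and GetHashCode directly on KeyEntity would work the same way without a separate comparer.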
The birthday paradox does not apply in this situation. The birthday paradox relates to non-deterministic random sets, whereas hash code computation is deterministic. The chance of two objects with different state sharing the same hash code is much closer to 1 in a billion or so, certainly not as low as 1 in 77 thousand, so I don't think you have anything to worry about.
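Whatever the exact odds, a hash collision in these collections costs speed rather than correctness, because Equals is consulted whenever hash codes match. One way to check this is to force every key to collide. The sketch below uses a deliberately bad comparer whose GetHashCode always returns 0; AlwaysCollidingComparer and CollisionDemo are made-up names for illustration, and the keys are value tuples standing in for the three key properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Deliberately terrible comparer: every key gets the same hash code,
// so every pair of keys collides and Equals has to do all the work.
public sealed class AlwaysCollidingComparer : IEqualityComparer<(int, int, int)>
{
    public bool Equals((int, int, int) x, (int, int, int) y) => x.Equals(y);

    public int GetHashCode((int, int, int) obj) => 0;
}

public static class CollisionDemo
{
    // Returns the number of distinct groups ToLookup produces for the keys.
    public static int CountGroups(IEnumerable<(int, int, int)> keys)
    {
        return keys.ToLookup(k => k, new AlwaysCollidingComparer()).Count;
    }
}
```

Passing the keys (1, 2, 3), (1, 2, 3) and (999, 999, 999) to CollisionDemo.CountGroups still yields two groups even though all three hash codes are identical. A comparer like this would make lookups slow on large collections, so it is only useful as a demonstration.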