IEqualityComparer<T>: can GetHashCode return the same value when items are not equal?

Posted on 2024-12-08 14:48:23

The "Notes to Implementers" section of the documentation for the GetHashCode method of the IEqualityComparer<T> interface states:

Implementations are required to ensure that if the Equals method
returns true for two objects x and y, then the value returned by the
GetHashCode method for x must equal the value returned for y.
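The rule in the quote can be spot-checked mechanically. Below is a minimal sketch of a hypothetical helper, SatisfiesHashContract (not a .NET API). It looks for pairs that a comparer calls equal but that hash differently. It can only find violations within the sample it is given and cannot prove the rule in general. It assumes .NET 8 or later for EqualityComparer<T>.Create.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a .NET API): returns false if it finds two sample
// values that the comparer calls equal but that get different hash codes.
// Returning true only means no violation was found in this sample.
static bool SatisfiesHashContract<T>(IEqualityComparer<T> comparer, IReadOnlyList<T> sample)
{
    foreach (T x in sample)
        foreach (T y in sample)
            if (comparer.Equals(x, y) && comparer.GetHashCode(x!) != comparer.GetHashCode(y!))
                return false;
    return true;
}

string[] words = { "apple", "APPLE", "Banana", "banana", "cherry" };

// Consistent: equality and hashing both ignore case.
bool ok = SatisfiesHashContract(StringComparer.OrdinalIgnoreCase, words);

// Inconsistent (requires .NET 8+ for EqualityComparer<T>.Create): case-insensitive
// Equals, but the hash is the first character's code, so "apple" and "APPLE" differ.
var broken = EqualityComparer<string>.Create(
    (a, b) => string.Equals(a, b, StringComparison.OrdinalIgnoreCase),
    s => s.Length == 0 ? 0 : s[0]);
bool bad = SatisfiesHashContract(broken, words);

Console.WriteLine($"{ok} {bad}"); // prints "True False"
```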

It's well known why two instances of T must return the same hash code when they are equal. Different hash codes mean the items are definitely not equal, while identical hash codes mean they are only potentially equal.

I read the quote as leaving the return value unspecified when two instances are not equal (even if their values might suggest that they are).

Take the following example. I have a sequence of int? values that I want to use for statistical classification, where each non-null int? represents an attribute of a class (think enum values). When these values are null, you don't want them to be considered equal, because that would bias the training set towards missing values. If anything, you would want a comparison between two null values to return false.

The thing is, when GetHashCode is given a null, I'd probably want it to return 0 (or some other constant, perhaps Int32.MinValue). I know that when anything is keyed using this IEqualityComparer<T> implementation, checking whether such a key exists in a dictionary will not perform optimally.
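Here is a minimal sketch of the comparer described above. It assumes .NET 8 or later for EqualityComparer<T>.Create, and the name nullNeverEqual is illustrative only. Its Equals never returns true for a null, so the quoted rule places no constraint on what a null hashes to. Sending every null to 0, where it collides with the integer 0, is therefore permitted.

```csharp
using System;
using System.Collections.Generic;

// Illustrative comparer (requires .NET 8+ for EqualityComparer<T>.Create):
// a null is never equal to anything, including another null,
// and every null gets the same constant hash code, 0.
var nullNeverEqual = EqualityComparer<int?>.Create(
    (x, y) => x.HasValue && y.HasValue && x.Value == y.Value,
    x => x.HasValue ? x.Value.GetHashCode() : 0);

// null and 0 share a hash code (int.GetHashCode returns the value itself)...
bool sameHash = nullNeverEqual.GetHashCode(null) == nullNeverEqual.GetHashCode(0);
// ...yet they are not equal, which the hash code rule allows.
bool nullEqualsZero = nullNeverEqual.Equals(null, 0);
bool nullEqualsNull = nullNeverEqual.Equals(null, null);

Console.WriteLine($"{sameHash} {nullEqualsZero} {nullEqualsNull}"); // prints "True False False"
```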

That said, is it valid for GetHashCode to return a value known to collide with the hash codes of other values, when Equals returns false for those values? I'm leaning towards yes, since the quote above says nothing on the matter.


Answers (3)

我不吻晚风 2024-12-15 14:48:23

It is absolutely necessary for almost all types for there to be two values v1 and v2 such that

v1.Equals(v2) == false
v1.GetHashCode() == v2.GetHashCode()

... or the equivalent with an IEqualityComparer<T>. The only exceptions are types with at most 2^32 distinct (non-equal) values. As soon as there are more values than that, the pigeonhole principle forces hash codes to be reused: there just aren't enough hash codes to go round!
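Here is a concrete instance of the pigeonhole argument using long, which has 2^64 values but only 2^32 possible hash codes. The specific pair below assumes the current Int64.GetHashCode implementation, which XORs the low and high 32-bit halves of the value. That is an implementation detail, not a documented guarantee. Some colliding pair must exist either way.

```csharp
using System;

// Assumption: Int64.GetHashCode XORs the low and high 32-bit halves
// (true of current .NET implementations, but not a documented contract).
long a = 0L;
long b = (1L << 32) | 1L; // high half = 1, low half = 1, so 1 ^ 1 = 0

bool equal = a.Equals(b);                           // different values
bool sameHash = a.GetHashCode() == b.GetHashCode(); // both hash to 0

Console.WriteLine($"{equal} {sameHash}"); // prints "False True"
```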

Eric Lippert had a great blog post on hash codes which is well worth a read. Basically I think you've got the right ideas, but it's worth reinforcing them.

The issue of nulls is an interesting one, by the way. IEqualityComparer<T> allows GetHashCode to throw an exception, but I believe the built-in EqualityComparer<T> implementations never do. It sounds like you do have one problem, though: Equals should be reflexive, so a null value should be equal to itself. You may need to think about that one carefully... can you represent "different" null values?
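The following sketch shows why reflexivity matters in practice. It puts two nulls into a HashSet<int?> using the comparer described in the question. As before, it assumes .NET 8+ for EqualityComparer<T>.Create, and nullNeverEqual is an illustrative name. The set cannot recognise the second null as a duplicate, and Contains(null) cannot find either stored null.

```csharp
using System;
using System.Collections.Generic;

// Illustrative comparer from the question (requires .NET 8+):
// null is never equal to null, and all nulls hash to 0.
var nullNeverEqual = EqualityComparer<int?>.Create(
    (x, y) => x.HasValue && y.HasValue && x.Value == y.Value,
    x => x.HasValue ? x.Value.GetHashCode() : 0);

var set = new HashSet<int?>(nullNeverEqual);
bool firstAdd = set.Add(null);
bool secondAdd = set.Add(null);  // Equals(null, null) is false, so this is not seen as a duplicate
bool found = set.Contains(null); // two nulls are stored, yet the lookup cannot match either

Console.WriteLine($"{firstAdd} {secondAdd} {set.Count} {found}"); // prints "True True 2 False"
```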

此生挚爱伱 2024-12-15 14:48:23

IMO: Since Equals is always the final arbiter on the equality of objects, GetHashCode is only ever a shortcut for non-equal values. In the case of identical values returned from GetHashCode (regardless of whether the objects are actually equal), Equals will then always be called to compare. It's expected that GetHashCode is likely to conflict between non-equal values. I don't see anything ambiguous or undefined about this behavior.
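One way to check that Equals is the final arbiter is to give a Dictionary a comparer whose GetHashCode returns 0 for every key. That is legal, though pathological. This sketch assumes .NET 8+ for EqualityComparer<T>.Create. Lookups still return correct answers because every candidate in the shared bucket is checked with Equals. The cost is that every lookup becomes a linear scan.

```csharp
using System;
using System.Collections.Generic;

// Deliberately poor comparer (requires .NET 8+): every key hashes to 0,
// so all keys collide and every lookup falls through to Equals.
var constantHash = EqualityComparer<string>.Create(
    (x, y) => string.Equals(x, y, StringComparison.Ordinal),
    _ => 0);

var dict = new Dictionary<string, int>(constantHash)
{
    ["one"] = 1,
    ["two"] = 2,
    ["three"] = 3,
};

bool hasTwo = dict.TryGetValue("two", out int two); // found via Equals despite collisions
bool hasFour = dict.ContainsKey("four");            // correctly absent

Console.WriteLine($"{hasTwo} {two} {hasFour}"); // prints "True 2 False"
```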

离去的眼神 2024-12-15 14:48:23

You can return any value you want, as long as the condition you quote is satisfied. Otherwise, classes that depend on this condition will not work correctly.

For example, take a dictionary that is indexed by a case-insensitive key, and say your implementation of GetHashCode returns the value of the first character. So "A" and "a" are equal but have different hash values (65 and 97). In other words: you violate the rule. If you then do something like:

dict["A"] = "something";
Console.WriteLine(dict["a"]);

then the second line will likely fail with a KeyNotFoundException even though the keys "A" and "a" are equal.
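The dictionary example above can be made runnable. This sketch assumes .NET 8+ for EqualityComparer<T>.Create, and the comparer names are illustrative. It builds the broken first-character comparer from this answer alongside a fixed one. The fixed comparer takes its hash code from StringComparer.OrdinalIgnoreCase, so hashing agrees with the case-insensitive Equals. The sketch uses TryGetValue instead of the indexer, so the miss shows up as false rather than as a KeyNotFoundException.

```csharp
using System;
using System.Collections.Generic;

// Broken (requires .NET 8+): case-insensitive Equals, but the hash code is the
// first character's code, so "A" (65) and "a" (97) hash differently.
var broken = EqualityComparer<string>.Create(
    (x, y) => string.Equals(x, y, StringComparison.OrdinalIgnoreCase),
    s => s.Length == 0 ? 0 : s[0]);

// Fixed: take the hash code from a comparer whose notion of equality matches.
var fixedComparer = EqualityComparer<string>.Create(
    (x, y) => string.Equals(x, y, StringComparison.OrdinalIgnoreCase),
    s => StringComparer.OrdinalIgnoreCase.GetHashCode(s));

var brokenDict = new Dictionary<string, string>(broken) { ["A"] = "something" };
var fixedDict = new Dictionary<string, string>(fixedComparer) { ["A"] = "something" };

// Dictionary compares stored hash codes before calling Equals, so the broken lookup misses.
bool brokenFound = brokenDict.TryGetValue("a", out _);
bool fixedFound = fixedDict.TryGetValue("a", out var value);

Console.WriteLine($"{brokenFound} {fixedFound} {value}"); // prints "False True something"
```

In real code you would usually pass StringComparer.OrdinalIgnoreCase straight to the Dictionary constructor rather than wrapping it.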
