Good GetHashCode() override for List of Foo objects respecting the order
EnumerableObject : IEnumerable<Foo> wraps a List<Foo>.
If EnumerableObject a.SequenceEqual(EnumerableObject b), then they are equal.
Therefore, GetHashCode must be implemented. The problem is that XORing the hash codes of the elements in the list returns the same hash code for any list containing exactly the same elements, regardless of order. This is okay in terms of correctness, but it will result in many collisions, which will slow down retrieval, etc.
What is a good, fast GetHashCode method for lists of objects that is order dependent?
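To make the problem concrete, here is a minimal sketch of the XOR approach. XorHashDemo and XorHash are hypothetical names, and int elements stand in for Foo because an int's hash code is simply its value.

```csharp
using System.Collections.Generic;

public static class XorHashDemo
{
    // XOR-combines the element hash codes. XOR is commutative and
    // associative, so the order of the elements has no effect on the result.
    public static int XorHash(IEnumerable<int> items)
    {
        int hash = 0;
        foreach (int item in items)
        {
            hash ^= item.GetHashCode();
        }
        return hash;
    }
}
```

With this, XorHashDemo.XorHash(new[] { 1, 2, 3 }) and XorHashDemo.XorHash(new[] { 3, 2, 1 }) return the same value, even though the two sequences are not equal under SequenceEqual.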
I'd do it the same way I normally combine hash codes, with an addition and a multiplication: for each element, multiply the running hash by a constant and add the element's hash code.
(Note that you shouldn't add anything to the list after it has been used as a key in a hash table of any description, as the hash will change. This also assumes that there are no null entries; if there could be, you need to take account of that.)
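A minimal sketch of that add-and-multiply pattern follows. The Foo class, the foos field name, and the constants 19 and 31 are assumptions chosen for illustration; any pair of small primes would serve the same purpose.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Foo is a stand-in element type; its members are assumed for this sketch.
public sealed class Foo
{
    public int Id { get; }
    public Foo(int id) => Id = id;
    public override bool Equals(object obj) => obj is Foo other && other.Id == Id;
    public override int GetHashCode() => Id;
}

public sealed class EnumerableObject : IEnumerable<Foo>
{
    private readonly List<Foo> foos; // hypothetical field name

    public EnumerableObject(IEnumerable<Foo> items) => foos = new List<Foo>(items);

    public IEnumerator<Foo> GetEnumerator() => foos.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Equality is element-by-element, in order.
    public override bool Equals(object obj) =>
        obj is EnumerableObject other && foos.SequenceEqual(other.foos);

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing
        {
            int hash = 19;
            foreach (Foo foo in foos)
            {
                // Multiplying before adding makes each element's position matter.
                // Assumes no null entries, as noted above.
                hash = hash * 31 + foo.GetHashCode();
            }
            return hash;
        }
    }
}
```

Two instances built from the same elements in the same order get the same hash code, while reversing the order normally changes it.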
Firstly, double-check that you need a hashcode at all. Are you going to be putting these lists into a hash-mapped structure (e.g. dictionary, hashset, etc)? If not, forget about it.
Now, assuming that EnumerableObject already overrides Equals(object) (and hopefully therefore also implements IEquatable<EnumerableObject>) for some reason, then this is indeed necessary. You want to balance speed versus bit distribution.
A good starting point is a mult+add or a shift+xor over the items' hash codes, with a null check on each item.
(This assumes that you are using item.Equals() for your sequence equality comparison; if you're using an IEqualityComparer's Equals, you'll need to call into its GetHashCode instead.)
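As a rough sketch of both starting points, the helper below takes an optional IEqualityComparer so the hash stays consistent with whichever equality is in use. The class and method names, the seed value, and the choice of a 5-bit rotation for the shift+xor variant are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

public static class SequenceHash
{
    private const int Seed = 0x2D2816FE; // arbitrary non-zero starting value

    // Mult+add: multiply the running value by a constant, then add the item hash.
    public static int MultAdd<T>(IEnumerable<T> items, IEqualityComparer<T> comparer = null)
    {
        comparer = comparer ?? EqualityComparer<T>.Default;
        unchecked
        {
            int res = Seed;
            foreach (T item in items)
            {
                // Null items contribute 0 instead of throwing.
                res = res * 31 + (item == null ? 0 : comparer.GetHashCode(item));
            }
            return res;
        }
    }

    // Shift+xor, written as a rotate so that bits from early items are not
    // shifted out entirely on long sequences.
    public static int ShiftXor<T>(IEnumerable<T> items, IEqualityComparer<T> comparer = null)
    {
        comparer = comparer ?? EqualityComparer<T>.Default;
        unchecked
        {
            int res = Seed;
            foreach (T item in items)
            {
                int h = item == null ? 0 : comparer.GetHashCode(item);
                res = ((res << 5) | (int)((uint)res >> 27)) ^ h;
            }
            return res;
        }
    }
}
```

A GetHashCode override on the wrapper could then simply return SequenceHash.MultAdd(this), passing the same comparer that Equals uses if there is one.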
From there we can optimise.
If null items are disallowed, remove the null-check (be careful, this will make the code throw if it ever does find a null).
If very large lists are common, we need to reduce the number of items examined, while trying not to cause lots of collisions. Compare different implementations that each examine at most 16 items, for example only the first 16 or only the last 16.
Each of these restricts the total number of items examined, which speeds up execution but risks poorer-quality hashes. Which (if any) is best depends on whether collections with the same start or the same end are more likely.
Changing the limit of 16 adjusts the balance: a smaller limit is faster, while a larger one gives better hash quality and a lower risk of hash collisions.
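The trade-off can be sketched with three bounded variants. This is an illustration under assumptions rather than a definitive implementation: BoundedSequenceHash, its method names, and the evenly spaced sampling variant are hypothetical, and the helper works on any IReadOnlyList<T> instead of overriding GetHashCode directly.

```csharp
using System;
using System.Collections.Generic;

public static class BoundedSequenceHash
{
    private const int Limit = 16;        // maximum number of items examined
    private const int Seed = 0x2D2816FE; // arbitrary non-zero starting value

    // One mult+add step; null items contribute 0.
    private static int Step<T>(int res, T item) =>
        unchecked(res * 31 + (item == null ? 0 : item.GetHashCode()));

    // Examines only the first Limit items: cheap, but lists that share
    // a long common prefix will collide.
    public static int FirstItems<T>(IReadOnlyList<T> list)
    {
        int res = Seed;
        int max = Math.Min(list.Count, Limit);
        for (int i = 0; i < max; ++i) res = Step(res, list[i]);
        return res;
    }

    // Examines only the last Limit items: lists that share a long
    // common suffix will collide.
    public static int LastItems<T>(IReadOnlyList<T> list)
    {
        int res = Seed;
        for (int i = Math.Max(0, list.Count - Limit); i < list.Count; ++i)
            res = Step(res, list[i]);
        return res;
    }

    // Examines at most Limit items spread evenly across the whole list.
    public static int SampledItems<T>(IReadOnlyList<T> list)
    {
        int res = Seed;
        int step = list.Count / Limit + 1;
        for (int i = 0; i < list.Count; i += step) res = Step(res, list[i]);
        return res;
    }
}
```

For two 20-item lists that differ only at index 18, FirstItems returns the same value for both, while LastItems and SampledItems tell them apart.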
Edit: And now you can use my implementation of SpookyHash v. 2. It gives a much better distribution than mult+add or shift+xor, while also being particularly fast (especially in 64-bit processes, as the algorithm is optimised for that, though it works well on 32-bit too).
By default, the .GetHashCode() method of a reference type just returns a hash based on the object's identity (its reference) rather than on its contents. This is because calculating the hash code of every item in an enumerable list can be very time intensive. Instead of overriding the existing behaviour, I prefer to use an extension method and call it only where the hash code needs to be determined from the contents.
My extension method is based on Jon Skeet's answer, with null handling added.
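A minimal sketch of such an extension method follows. The names EnumerableHashExtensions and GetSequenceHashCode are hypothetical, and returning 0 for a null sequence is an assumed convention rather than something stated above.

```csharp
using System.Collections.Generic;

public static class EnumerableHashExtensions
{
    // Content-based, order-dependent hash for any sequence.
    // The object's own GetHashCode() is left untouched.
    public static int GetSequenceHashCode<T>(this IEnumerable<T> source)
    {
        if (source == null) return 0; // assumed convention for a null sequence

        unchecked // let the arithmetic wrap around instead of throwing
        {
            int hash = 19;
            foreach (T item in source)
            {
                // Null items contribute 0 instead of throwing.
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
```

Callers opt in explicitly, for example with myList.GetSequenceHashCode(), so code that relies on the default reference-based hash keeps working unchanged.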