Good GetHashCode() override for a list of Foo objects that respects order

Posted on 2024-12-15 12:59:17

EnumerableObject : IEnumerable<Foo>

wraps a List<Foo>

Two EnumerableObject instances a and b are equal if a.SequenceEqual(b).

Therefore, GetHashCode must be implemented as well. The problem is that XORing the hash codes of the elements returns the same hash code for any list containing all and only the same elements, regardless of order. That is correct, in that equal objects still get equal hash codes, but it produces many collisions between lists that differ only in order, which slows down retrieval from hash-based collections.

What is a good, fast GetHashCode method for lists of objects that is order dependent?
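
To make the problem concrete, here is a minimal sketch comparing an XOR-based hash with a multiply-and-add hash. It uses int elements instead of Foo so that it is self-contained; HashDemo, XorHash and OrderedHash are hypothetical names, not part of the question.

```csharp
using System;
using System.Collections.Generic;

public static class HashDemo
{
    // Order-insensitive: XOR is commutative, so every permutation
    // of the same elements produces the same result.
    public static int XorHash(IEnumerable<int> items)
    {
        int hash = 0;
        foreach (int item in items)
            hash ^= item.GetHashCode();
        return hash;
    }

    // Order-sensitive: the running value is multiplied before the
    // next element is added, so position influences the result.
    public static int OrderedHash(IEnumerable<int> items)
    {
        unchecked
        {
            int hash = 19;
            foreach (int item in items)
                hash = hash * 31 + item.GetHashCode();
            return hash;
        }
    }
}
```

With these definitions, XorHash gives the same value for { 1, 2, 3 } and { 3, 2, 1 }, while OrderedHash gives different values for the two sequences.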

Comments (4)

握住你手 2024-12-22 12:59:17

I'd do it the same way I normally combine hash codes - with an addition and a multiplication:

public override int GetHashCode()
{
    unchecked
    {
        int hash = 19;
        foreach (var foo in foos)
        {
            // This code assumes the collection does not contain null values
            hash = hash * 31 + foo.GetHashCode();
            // If foo might be null:
            // hash = hash * 31 + (foo?.GetHashCode() ?? 0);
        }
        return hash;
    }
}

(Note that you shouldn't add anything to the list after this has been used for the key in a hash table of any description, as the hash will change. This also assumes that there are no null entries - if there could be, you need to take account of that.)
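
The warning about mutation can be illustrated with a small sketch. StringSequence is a hypothetical stand-in for the question's EnumerableObject, using string elements so the example is self-contained; it is not the asker's actual class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper with sequence-based equality and an order-dependent hash.
public sealed class StringSequence : IEquatable<StringSequence>
{
    private readonly List<string> items;

    public StringSequence(IEnumerable<string> source) { items = source.ToList(); }

    public void Add(string item) { items.Add(item); }

    public bool Equals(StringSequence other) =>
        other != null && items.SequenceEqual(other.items);

    public override bool Equals(object obj) => Equals(obj as StringSequence);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 19;
            foreach (string item in items)
                hash = hash * 31 + (item?.GetHashCode() ?? 0); // null entries count as 0
            return hash;
        }
    }
}
```

If such an object is placed in a HashSet<StringSequence> and Add is called on it afterwards, its hash code will almost certainly change, and a later Contains call with the same object is then likely to return false because the set looks for it under the new hash.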

2024-12-22 12:59:17

Firstly, double-check that you need a hashcode at all. Are you going to be putting these lists into a hash-mapped structure (e.g. dictionary, hashset, etc)? If not, forget about it.

Now, assuming that you mean that EnumerableObject already overrides Equals(object) (and hopefully therefore also implements IEquatable<EnumerableObject>) for some reason, then this is indeed necessary. You want to balance speed versus bit distribution.

A good starting point is a mult+add or a shift+xor like:

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    foreach(var item in this)
    {
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

(This assumes that you are using item.Equals() for your sequence equality comparison. If you're using an IEqualityComparer's Equals instead, you'll need to call that comparer's GetHashCode for each item.)
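
As a rough sketch of the comparer-based variant, the helper below hashes each item through the same IEqualityComparer<T> that the equality check would use. SequenceHash and Compute are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceHash
{
    // Hashes items via the supplied comparer so that sequences the
    // comparer treats as equal also receive equal hash codes.
    public static int Compute<T>(IEnumerable<T> items, IEqualityComparer<T> comparer)
    {
        unchecked
        {
            int res = 0x2D2816FE;
            foreach (T item in items)
                res = res * 31 + (item == null ? 0 : comparer.GetHashCode(item));
            return res;
        }
    }
}
```

For example, with StringComparer.OrdinalIgnoreCase the sequences { "A", "b" } and { "a", "B" } produce the same hash code, matching a case-insensitive sequence comparison.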

From there we can optimise.

If null items are disallowed, remove the null-check (be careful, this will make the code throw if it ever does find a null).

If very large lists are common, we need to reduce the number of items examined while trying not to cause lots of collisions. Compare the following different implementations:

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int max = Math.Min(Count, 16);
    for(int i = 0; i != max; ++i)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int min = Math.Max(-1, Count - 16);
    for(int i = Count - 1; i != min; --i)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int step = Count / 16 + 1;
    for(int i = 0; i < Count; i += step)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

Each of these restricts the total number of items examined (the first 16, the last 16, or roughly 16 evenly spaced items), which speeds up execution but risks poorer-quality hashes. Which (if any) is best depends on whether collections with the same start or the same end are more likely.

Changing the number 16 above adjusts the balance: a smaller number is faster, while a larger number gives better hash quality and a lower risk of hash collisions.
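
The trade-off can be sketched by making the sample size a parameter. This is an illustrative variant of the first implementation above, not the answer's own code; SampledHash, HashPrefix and sampleSize are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SampledHash
{
    // Hashes at most sampleSize leading items of the list.
    public static int HashPrefix<T>(IReadOnlyList<T> list, int sampleSize = 16)
    {
        unchecked
        {
            int res = 0x2D2816FE;
            int max = Math.Min(list.Count, sampleSize);
            for (int i = 0; i < max; ++i)
            {
                T item = list[i];
                res = res * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return res;
        }
    }
}
```

Two lists that share their first 16 items but differ later collide under the default sample size; raising sampleSize to cover the differing position separates them, at the cost of examining more items.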

Edit: And now you can use my implementation of SpookyHash v. 2:

public override int GetHashCode()
{
  var hasher = new SpookyHash();//use methods with seeds if you need to prevent HashDos
  foreach(var item in this)
    hasher.Update(item.GetHashCode());//or relevant feeds of item, etc.
  return hasher.Final().GetHashCode();
}

This will create a much better distribution than mult+add or shift+xor, while also being particularly fast (especially in 64-bit processes as the algorithm is optimised for that, though it works well on 32-bit too).

丿*梦醉红颜 2024-12-22 12:59:17

By default, GetHashCode() on a reference type returns a hash based on the object's identity (its reference), not its contents. One reason to keep that default is that calculating the hash code of every item in an enumerable list can be very time-intensive. Instead of overriding the existing behaviour, I prefer to use an extension method and call it only where a content-based hash code is needed:

using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static int GetSequenceHashCode<TItem>(this IEnumerable<TItem> list)
    {
        if (list == null) return 0;
        const int seedValue = 0x2D2816FE;
        const int primeNumber = 397;
        // unchecked keeps the arithmetic wrapping even in a checked build
        return list.Aggregate(seedValue, (current, item) =>
            unchecked((current * primeNumber) + (Equals(item, default(TItem)) ? 0 : item.GetHashCode())));
    }
}

沫雨熙 2024-12-22 12:59:17

My extension method with null handling, based on Jon Skeet's answer:

#region UTILS
/// <summary>
/// Utils
/// </summary>
internal static class UTILS
{
    #region GetHashCodeByItems
    /// <summary>
    /// Hash code depending on the content and order of the elements of the collection
    /// </summary>
    /// <param name="lst">Collection</param>
    /// <typeparam name="T">The type of items in the collection</typeparam>
    /// <returns>Hash code</returns>
    internal static int GetHashCodeByItems<T>(this IEnumerable<T> lst)
    {
        unchecked
        {
            int hash = 19;
            foreach (T item in lst)
            {
                hash = hash * 31 + (item != null ? item.GetHashCode() : 1);
            }
            return hash;
        }
    }
    #endregion
}
#endregion