Fast hash codes for a complex object graph

Posted 2024-08-17 15:16:24


I have a fairly complex object, and I need to determine the uniqueness of these objects. One solution is to override GetHashCode(). I have implemented it as shown below:

public override int GetHashCode()
{
    return this._complexObject1.GetHashCode() ^
           this._complexObject2.GetHashCode() ^
           this._complexObject3.GetHashCode() ^
           this._complexObject4.GetHashCode() ^
           this._complexObject5.GetHashCode() ^
           this._complexObject6.GetHashCode() ^
           this._complexObject7.GetHashCode() ^
           this._complexObject8.GetHashCode();
}

These complex objects also override GetHashCode() and perform similar operations.

My project requires checking the uniqueness of these objects very frequently, and the data inside them also changes in various ways and places.

I need a faster way to determine the uniqueness of these complex objects, taking both performance and memory into account.

Thanks in advance
Munim


Comments (1)

相思碎 2024-08-24 15:16:24


Given your comment, it sounds like you may be trying to rely on GetHashCode on its own to determine uniqueness. Don't do that. Hashes aren't meant to be unique - it's meant to be unlikely that two unequal objects will hash to the same value, but not impossible. If you're trying to check that a set of objects has no duplicates, you will have to use Equals as well.
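As a minimal sketch of this point (using a hypothetical two-field Point type, not the poster's class), two unequal objects can easily share a hash code, so Equals has to make the final call:

```csharp
using System;

// Hypothetical two-field type whose hash is a simple XOR of its fields.
class Point
{
    public readonly int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    public override int GetHashCode() => X ^ Y;

    public override bool Equals(object obj) =>
        obj is Point p && p.X == X && p.Y == Y;
}

class Demo
{
    static void Main()
    {
        var a = new Point(1, 2);
        var b = new Point(2, 1);   // different object, same XOR hash (1^2 == 2^1)

        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(a.Equals(b));                        // False
    }
}
```

This also previews the XOR weakness discussed below: swapping the two field values leaves the hash unchanged.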

Note that using XOR for a hashcode can make it more likely that you'll get hash collisions, depending on the individual hash values involved. In particular, it makes any two equal fields "cancel each other out". I generally use this form:

int hash = 17;
hash = hash * 31 + field1.GetHashCode();
hash = hash * 31 + field2.GetHashCode();
hash = hash * 31 + field3.GetHashCode();
hash = hash * 31 + field4.GetHashCode();
...
return hash;

... but even so, that's certainly not going to guarantee uniqueness. You should use GetHashCode() to rule out equality, and then use Equals to check the actual equality of any potentially equal values.
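Putting both halves together, here is a sketch of a hypothetical Person type that pairs the multiply-add hash above with Equals; the unchecked block simply lets the multiplication wrap instead of throwing in checked builds. (On .NET Core and later, System.HashCode.Combine packages the same idea.)

```csharp
using System;

// Hypothetical example type: the hash narrows the search, Equals decides.
class Person
{
    public readonly string Name;
    public readonly int Age;
    public Person(string name, int age) { Name = name; Age = age; }

    public override int GetHashCode()
    {
        unchecked // allow the multiplication to overflow and wrap silently
        {
            int hash = 17;
            hash = hash * 31 + (Name?.GetHashCode() ?? 0);
            hash = hash * 31 + Age.GetHashCode();
            return hash;
        }
    }

    public override bool Equals(object obj) =>
        obj is Person p && p.Name == Name && p.Age == Age;
}

class Demo2
{
    static void Main()
    {
        var a = new Person("Ann", 30);
        var b = new Person("Ann", 30);

        // Equal objects must produce equal hashes; the reverse is not guaranteed.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(a.Equals(b));                        // True
    }
}
```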

Now your question mentions speed - this sounds like the perfect place to use a profiler and some benchmark tests. Are you sure this is a bottleneck? If you have many different types all computing hash values, have you found out which of these is the biggest contributor to the problem?

Some optimisations will depend on exactly how you use the data. If you find that a lot of your time is spent recomputing hashes for values which you know haven't changed, you could cache the hash code... although this obviously becomes trickier when there are fields which themselves refer to complex objects. It's possible that you could cache "leaf node" hashes, particularly if those leaf nodes don't change often (but their usage could vary).
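A sketch of what such caching might look like for a hypothetical mutable Node type, where every setter invalidates the stored hash so it is recomputed only after a change:

```csharp
using System;

// Sketch of hash-code caching for a mutable node (hypothetical type).
class Node
{
    private int _value;
    private int? _cachedHash;   // null means "needs recomputing"

    public int Value
    {
        get => _value;
        set { _value = value; _cachedHash = null; } // mutation invalidates the cache
    }

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            unchecked { _cachedHash = 17 * 31 + _value.GetHashCode(); }
        }
        return _cachedHash.Value;
    }
}

class Demo3
{
    static void Main()
    {
        var n = new Node { Value = 5 };
        int h1 = n.GetHashCode();        // computed once
        int h2 = n.GetHashCode();        // served from the cache
        Console.WriteLine(h1 == h2);     // True

        n.Value = 6;                     // cache invalidated
        Console.WriteLine(n.GetHashCode() != h1); // True for these values
    }
}
```

Be careful with this pattern: if an object is mutated while it is stored as a key in a Dictionary or HashSet, its bucket no longer matches its hash and lookups will silently fail, regardless of caching.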
