How well does the .NET Dictionary resolve collisions?

Posted on 2024-08-21 05:04:23

I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}

and so on with more fields. Let's say Foo and Bar are my key fields: if they're equal between two Thingys, then the two objects should be considered equal (one may represent an update to the other, with the Others fields being updated). So I have these:

public override bool Equals(object obj)
{
    Thingy thing = (Thingy)obj; // yes I do type check first
    return (this.Foo == thing.Foo && this.Bar == thing.Bar);
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}

This works for the most part, but there are rare occasions where two Thingys that are actually different have the same hash code.

My question is this: could I use a Dictionary<Thingy, int> where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering if the Dictionary, when detecting a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking a Thingy up, it would find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.

Is this how Dictionary behaves, or does it only resolve collisions where the hash codes differ but (hash % size) is the same? If this won't work, what might?

Comments (3)

把梦留给海 2024-08-28 05:04:23

Hash collisions only affect performance, not integrity.

A simple test would be to change GetHashCode() to simply return 1;. You'll note that the dictionary still behaves properly, but with any reasonable dataset, it will perform terribly.
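
The test suggested above can be sketched roughly as follows. ConstantHashThingy and ConstantHashDemo are hypothetical names introduced for illustration (they are not from the question), and the Equals override uses a pattern match in place of the question's cast. Every instance hashes to 1, so the dictionary can only tell keys apart through Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the question's Thingy with a deliberately bad hash.
class ConstantHashThingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        // Only the key fields take part in equality.
        return obj is ConstantHashThingy other
            && Foo == other.Foo
            && Bar == other.Bar;
    }

    // Every instance collides, so lookups fall back on Equals.
    public override int GetHashCode()
    {
        return 1;
    }
}

static class ConstantHashDemo
{
    public static Dictionary<ConstantHashThingy, int> Build()
    {
        var table = new Dictionary<ConstantHashThingy, int>();
        table[new ConstantHashThingy { Foo = "a", Bar = "b" }] = 1;
        table[new ConstantHashThingy { Foo = "a", Bar = "c" }] = 2;
        // Same Foo and Bar as the first entry, so this overwrites its value
        // rather than adding a third entry.
        table[new ConstantHashThingy { Foo = "a", Bar = "b", Others = "updated" }] = 3;
        return table;
    }
}
```

Under these assumptions the table ends up with two entries, and both remain retrievable by an equal key. The cost is that every lookup has to compare against the colliding entries one by one, which is where the poor performance on larger datasets comes from.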

碍人泪离人颜 2024-08-28 05:04:23

Hash collisions primarily affect performance, not correctness, so long as Equals() behaves correctly.

Dictionary uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as Equals() can correctly distinguish between instances, you should get correct results.

Where hash codes can result in problems is with mutable objects. If your Thingy class allows Foo or Bar to change for an item in the dictionary, you may then fail to find it in a subsequent access attempt. This is because the hash code produced now differs from the one used to store the value in the dictionary.
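
A minimal sketch of the mutable-key problem described above. MutableThingy and MutableKeyDemo are hypothetical names introduced for illustration, and the hash is computed the same way as in the question. After a key field changes, the entry is still counted by the dictionary, but a lookup with the original field values no longer matches it, because Equals now compares against the mutated key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key, hashed the same way as in the question.
class MutableThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        return obj is MutableThingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode();
    }
}

static class MutableKeyDemo
{
    // Adds one entry, then mutates its key while it is stored in the dictionary.
    public static Dictionary<MutableThingy, int> BuildWithMutatedKey()
    {
        var key = new MutableThingy { Foo = "a", Bar = "b" };
        var table = new Dictionary<MutableThingy, int>();
        table[key] = 1;
        key.Bar = "changed"; // the stored entry was filed under the old hash
        return table;
    }

    // Looks the entry up using the values the key had when it was added.
    public static bool FoundByOriginalValues(Dictionary<MutableThingy, int> table)
    {
        return table.ContainsKey(new MutableThingy { Foo = "a", Bar = "b" });
    }
}
```

Looking the entry up with the mutated values is also expected to fail in practice, since the new hash code will almost certainly differ from the one recorded when the entry was added. Keeping the key fields immutable, or removing and re-adding the entry around a change, avoids the problem.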

风吹雪碎 2024-08-28 05:04:23

GetHashCode is designed for use in hash tables, where collisions need to be minimized but not eliminated. If you need to generate a truly unique key, GetHashCode is a reasonable starting point (and not as excessively long as a GUID), but you will need to store the key as part of the object and maintain a list of used keys separately.

While you may be able to retrieve something that looks usable from the internals of Dictionary, it probably won't work reliably - for example if you add more items than the dictionary was initially allocated to handle, the underlying data structure will get rebuilt and individual items could end up in a completely different part of the dictionary.
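
One way to get sequential numeric keys without touching the dictionary's internals is to store the assigned number as the dictionary's value, using only the public API. The sketch below is an assumption about how that might look, not something taken from the answers: ThingyIdRegistry and GetOrAssignId are hypothetical names, and the key is a (Foo, Bar) tuple rather than a Thingy, so the key fields cannot change while stored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry that hands out one sequential ID per distinct (Foo, Bar) pair.
sealed class ThingyIdRegistry
{
    // Value tuples compare and hash by their elements, so no custom
    // Equals/GetHashCode is needed for the key.
    private readonly Dictionary<(string, string), int> ids =
        new Dictionary<(string, string), int>();

    private int nextId = 1;

    public int GetOrAssignId(string foo, string bar)
    {
        var key = (foo, bar);
        if (!ids.TryGetValue(key, out int id))
        {
            // First time this pair is seen: assign the next number.
            id = nextId++;
            ids.Add(key, id);
        }
        return id;
    }
}
```

The assigned IDs do not depend on where an entry sits inside the dictionary, so they stay stable when the dictionary grows and rebuilds its buckets. The registry is not thread-safe as written.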
