I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:
```csharp
class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}
```
and so on with more fields. Let's say Foo and Bar are my key fields: if they're equal between two Thingys, then the two objects should be considered equal (one may represent an update to the other, with the Others field being updated). So I have these:
```csharp
// Members of class Thingy
public override bool Equals(object obj)
{
    // Type check first: a non-Thingy (or null) is never equal
    if (!(obj is Thingy thing))
        return false;
    return this.Foo == thing.Foo && this.Bar == thing.Bar;
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}
```
So this works for the most part, but on rare occasions two Thingys that are actually different have the same hash code.
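Some of these collisions are not bad luck but are built into the concatenation: Foo = "ab", Bar = "c" and Foo = "a", Bar = "bc" both concatenate to "abc", so they always get identical hash codes while Equals still says they differ. The sketch below reproduces that. It assumes .NET Core 2.1 or later for `System.HashCode`; `HashCode.Combine` hashes the two fields separately, which removes this particular ambiguity but, like any 32-bit hash, can still collide occasionally.

```csharp
using System;

class Thingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj) =>
        obj is Thingy t && Foo == t.Foo && Bar == t.Bar;

    // Concatenation-based hash, as in the question
    public override int GetHashCode() => (Foo + Bar).GetHashCode();

    // Alternative: combine the field hashes instead of concatenating the strings
    public int CombinedHash() => HashCode.Combine(Foo, Bar);
}

static class CollisionDemo
{
    public static bool ConcatCollides()
    {
        var a = new Thingy { Foo = "ab", Bar = "c" };
        var b = new Thingy { Foo = "a", Bar = "bc" };
        // Different by Equals, yet "ab" + "c" and "a" + "bc" are the same string
        return !a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```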
My question is this: could I use a `Dictionary<Thingy, int>` where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering whether the Dictionary, when it detects a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking one up, it would then find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.
Is this the case with Dictionary, or does it only resolve collisions where the hash codes differ but (hash % size) is the same? If this won't work, what might?
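The scheme proposed in the question could be sketched roughly as follows, assuming a Thingy's Foo and Bar never change while it is in the dictionary. `IdRegistry` and `GetOrAddId` are hypothetical names, and the sequential IDs come from a counter the code owns, not from anything inside `Dictionary`.

```csharp
using System.Collections.Generic;

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj) =>
        obj is Thingy t && Foo == t.Foo && Bar == t.Bar;

    public override int GetHashCode() => (Foo + Bar).GetHashCode();
}

// Hypothetical helper: hands out one sequential ID per distinct (Foo, Bar) pair
class IdRegistry
{
    private readonly Dictionary<Thingy, int> _ids = new Dictionary<Thingy, int>();
    private int _nextId = 1;

    public int GetOrAddId(Thingy thing)
    {
        // TryGetValue uses GetHashCode to locate candidates, then Equals to pick the match
        if (_ids.TryGetValue(thing, out int id))
            return id;
        id = _nextId++;
        _ids.Add(thing, id);
        return id;
    }
}
```

With this sketch, two colliding Thingys such as ("ab", "c") and ("a", "bc") receive different IDs, while a later Thingy with the same Foo and Bar but different Others gets the original ID back.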
Hash collisions only affect performance, not integrity.
A simple test would be to change GetHashCode() to just `return 1;`. You'll notice that the dictionary still behaves correctly, but with any reasonably sized dataset it will perform terribly.
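That experiment might look like the sketch below. `ConstantHashThingy` is a hypothetical stand-in whose `GetHashCode` always returns 1, so every key shares one hash code and `Dictionary` has to tell them apart with `Equals` alone.

```csharp
using System.Collections.Generic;

class ConstantHashThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj) =>
        obj is ConstantHashThingy t && Foo == t.Foo && Bar == t.Bar;

    // Worst case: every instance collides with every other
    public override int GetHashCode() => 1;
}

static class ConstantHashDemo
{
    public static Dictionary<ConstantHashThingy, int> Build(int n)
    {
        var dict = new Dictionary<ConstantHashThingy, int>();
        for (int i = 0; i < n; i++)
            dict.Add(new ConstantHashThingy { Foo = "foo" + i, Bar = "bar" }, i);
        return dict;
    }
}
```

Lookups still return the right values, but each `Add` or lookup now walks one chain holding every entry, so the cost per operation grows linearly with the number of keys instead of staying roughly constant.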
Hash collisions will primarily affect performance, not correctness, so long as `Equals()` behaves correctly.

`Dictionary` uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as `Equals()` can correctly distinguish between instances, you should get correct results.

Where hash codes can result in problems is with mutable objects. If your `Thingy` class allows `Foo` or `Bar` to change for an item that is already in the dictionary, you may then fail to find it in a subsequent access attempt. This is because the hash code produced now differs from the one used to store the value in the dictionary.
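The mutable-key problem can be reproduced with a sketch like the one below: it changes `Foo` on a key that is already stored, after which a lookup computes a hash code different from the one recorded with the entry, so the entry is not found even though it still counts toward `Count`. The `ThingyKey` record is an assumption (C# 9 or later), shown as one way to get an immutable key: a positional record generates value-based `Equals` and `GetHashCode` over Foo and Bar, and its properties cannot be reassigned after construction.

```csharp
using System.Collections.Generic;

class Thingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj) =>
        obj is Thingy t && Foo == t.Foo && Bar == t.Bar;

    public override int GetHashCode() => (Foo + Bar).GetHashCode();
}

// Immutable key: the compiler generates Equals/GetHashCode over Foo and Bar
public sealed record ThingyKey(string Foo, string Bar);

static class MutableKeyDemo
{
    // Reports whether the key is still found after its Foo field changes
    public static (bool foundAfterMutation, int count) MutateKey()
    {
        var key = new Thingy { Foo = "a", Bar = "b" };
        var dict = new Dictionary<Thingy, int> { [key] = 1 };

        key.Foo = "z"; // new hash code; the entry is still filed under the old one
        return (dict.ContainsKey(key), dict.Count);
    }

    public static bool RecordKeyLookup()
    {
        var dict = new Dictionary<ThingyKey, int> { [new ThingyKey("a", "b")] = 1 };
        // A separately constructed but equal record finds the entry
        return dict.ContainsKey(new ThingyKey("a", "b"));
    }
}
```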
GetHashCode is designed for use in hash tables, where collisions need to be minimized but not eliminated. If you need to generate a truly unique key, GetHashCode is a reasonable starting point (and much shorter than a GUID), but you will need to store the key as part of the object and maintain a list of used keys separately.
While you may be able to retrieve something that looks usable from the internals of Dictionary, it probably won't work reliably - for example if you add more items than the dictionary was initially allocated to handle, the underlying data structure will get rebuilt and individual items could end up in a completely different part of the dictionary.
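The "start from GetHashCode, store the key on the object, track used keys separately" approach from this answer might be sketched as follows. Several details are assumptions rather than part of the answer: the hypothetical `Key` field holds the assigned key, a `HashSet<int>` plays the role of the list of used keys, and a collision is resolved by trying the next integer. `KeyAllocator` and `AssignKey` are illustrative names, not an existing API.

```csharp
using System.Collections.Generic;

class Thingy
{
    public string Foo;
    public string Bar;
    public int? Key; // hypothetical: the assigned unique key, persisted with the object

    public override bool Equals(object obj) =>
        obj is Thingy t && Foo == t.Foo && Bar == t.Bar;

    public override int GetHashCode() => (Foo + Bar).GetHashCode();
}

class KeyAllocator
{
    private readonly HashSet<int> _usedKeys = new HashSet<int>();

    public int AssignKey(Thingy thing)
    {
        if (thing.Key.HasValue)
            return thing.Key.Value; // already keyed, so reuse the stored key

        int candidate = thing.GetHashCode(); // starting point
        while (!_usedKeys.Add(candidate))   // Add returns false if the key is taken
            candidate = unchecked(candidate + 1);

        thing.Key = candidate;
        return candidate;
    }
}
```

In this sketch the key depends on arrival order: of two colliding Thingys, whichever is keyed first gets the raw hash code. That is why the key has to be stored with the object rather than recomputed later.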