Hashtables (Dictionary etc.) with integer keys
I've been puzzling over this for a few days... feel free to shoot down any of my assumptions.
We're using a Dictionary with integer keys. I assume that the value of the key in this case is used directly as the hash. Does this mean (if the keys are grouped over a small range) that the distribution of the key hash (same as the key itself, right?) will be in a similarly small range, and therefore a bad choice for a hashtable?
Would it be better to provide an IEqualityComparer that did something clever with primes and modulo mathematics to calculate a better distributed hash?
Comments (3)
It's not used directly in that the dictionary will still ask the key for its hash - but the hash value of an Int32 is just the value, so the thrust of your question is relevant, yes.
I believe that the way the .NET dictionary works doesn't rely on hash values being uniformly distributed. It takes hash % bucketCount where bucketCount is always prime. (That's from memory though - I could be wrong.)
You could still end up with an inefficient set of keys of course, if they happen to be spaced by the bucket count. That will always be the case though - a hash table would only ever be genuinely O(1) for all keys if they had unique hash values and the table maintained a set of buckets for every possible hash :) In reality it tends not to be a problem. If you happen to know that it will be a problem, then yes, a custom IEqualityComparer<T> could help.
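To make the hash % bucketCount point concrete, here is a minimal sketch that models only that one step. BucketSketch, BucketIndex and MaxBucketLoad are hypothetical names invented for this illustration, and the sign-bit masking is an assumption of the model. The real Dictionary internals differ between .NET versions and are not reproduced here.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of .NET: a simplified model of bucket selection.
public static class BucketSketch
{
    // Strip the sign bit so the remainder is never negative, then reduce
    // the hash modulo the bucket count.
    public static int BucketIndex(int key, int bucketCount)
    {
        int hash = key.GetHashCode(); // for Int32 this is the value itself
        return (hash & 0x7FFFFFFF) % bucketCount;
    }

    // Number of keys that land in the most crowded bucket.
    public static int MaxBucketLoad(int[] keys, int bucketCount)
    {
        return keys.GroupBy(k => BucketIndex(k, bucketCount))
                   .Max(g => g.Count());
    }
}
```

With 17 buckets, the 17 consecutive keys 1000 to 1016 each get a bucket of their own under this model, so a tightly clustered range is not a problem by itself. The keys 0, 17, 34 and so on (spaced by the bucket count) all land in bucket 0, which is the inefficient case mentioned above.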
Before doing something clever I'd test the speed of it as-is, and see if it's suitable for you. If it isn't, then try the clever thing. But I would expect it's better to leave it alone; it's more important that the hashes don't collide, and as long as they don't, life will be fine.
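One rough way to run that test is to time lookups with the default comparer and with a candidate custom one. This is only a sketch: ScrambledIntComparer and LookupTiming are hypothetical names, the multiplier is an illustrative choice rather than a recommendation, and a Stopwatch loop like this is a coarse measurement, not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical comparer: scrambles the key with a multiplicative hash.
public sealed class ScrambledIntComparer : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x == y;

    // The constant is an illustrative odd multiplier; unchecked lets the
    // multiplication wrap around instead of throwing on overflow.
    public int GetHashCode(int key) => unchecked((int)((uint)key * 2654435761u));
}

public static class LookupTiming
{
    // Build a dictionary that maps each key to itself using the given comparer.
    public static Dictionary<int, int> Build(int[] keys, IEqualityComparer<int> comparer)
    {
        var map = new Dictionary<int, int>(comparer);
        foreach (int k in keys) map[k] = k;
        return map;
    }

    // Time `rounds` passes of looking up every key once.
    public static TimeSpan TimeLookups(Dictionary<int, int> map, int[] keys, int rounds)
    {
        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            foreach (int k in keys)
                sum += map[k];
        sw.Stop();
        GC.KeepAlive(sum); // keep the accumulated result observable
        return sw.Elapsed;
    }
}
```

Calling Build once with EqualityComparer<int>.Default and once with a ScrambledIntComparer, then passing both dictionaries to TimeLookups with your real key set, shows whether the custom hash buys anything for your data. If the two timings are close, leaving the default alone is the simpler choice.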
Assuming you're using a standard library hash table implementation, chances are the key is not the hash, even if the key is an integer, for exactly the reason that you point out.
So while your logic regarding hash distributions is correct, your initial assumption that integer keys would mean that hashes = keys is probably not.
If I'm wrong re: .NET then oh well; this is more of a generalized answer. :)
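Whether hash = key actually holds for .NET integers is easy to check rather than guess. The sketch below asks the default comparer, which is what a Dictionary uses when none is supplied, for the hash of a few sample keys. IntHashCheck is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of .NET.
public static class IntHashCheck
{
    // True if every sample key hashes to its own value under the default comparer.
    public static bool HashEqualsValue(params int[] keys)
    {
        var comparer = EqualityComparer<int>.Default;
        return keys.All(k => comparer.GetHashCode(k) == k);
    }
}
```

On the .NET runtimes I'm aware of, IntHashCheck.HashEqualsValue(0, 1, 42, -7, int.MaxValue) comes back true, which matches the earlier remark that the hash value of an Int32 is just the value. The generalized point can still apply to other libraries, where the table may mix the hash further before picking a bucket.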