哈希表桶数组
如果我有 50,000 个条目,并且哈希表中有 100,000 个可用槽。 如果不使用 LinkedLists 以使数组永远不会“溢出”,那么为每个索引选择合适的存储桶数组大小的最佳方法是什么?超出30%合适吗?
If I have 50,000 entries and have say, 100,000 slots available in a hash table.
What would be the best way to choose a suitable bucket array size for each index if not using LinkedLists so that the array would never 'overflow'? Would 30% excess be suitable?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您先验知道密钥,则可以计算最小完美哈希。因此,如果您知道键并且可以定制哈希函数,则存储桶大小为 1 就足够了。
如果您事先不知道密钥——或者确实知道密钥,但无法更改散列函数——那么对手有可能选择最坏情况的密钥集(即,所有密钥都散列到同一个桶)。为了保证桶不会溢出,桶的大小需要等于桶的数量。如果您愿意容忍溢出的可能性,则可以进行更复杂的分析来选择涵盖大多数情况的存储桶大小。
If you know the keys a priori, you can compute a minimal perfect hash. Therefore, a bucket size of one is sufficient if you know the keys and can tailor the hash function.
If you do not know the keys in advance -- or do know the keys, but do cannot alter the hash function -- then it is possible for an adversary to select a worst case set of keys (i.e., keys that all hash to the same bucket). To guarantee no overflow of buckets then, you would need a bucket size equal to the number of buckets. If you are willing to tolerate the chance of overflow, it may be possible to do more sophisticated analysis to select a bucket size that covers a preponderance of situations.
如果您为存储桶使用固定大小的数组,则没有小于 50,000 的存储桶大小可以保证永远不会溢出,除非您有关于 50,000 中键分布的其他信息(即,如果您知道它们是整数 1 .. 50,000 那么就微不足道了)。
但通常您不想依赖大桶,因为搜索桶的时间复杂度为 O(n)。相反,使用可变大小的表和可变大小的存储桶是一个更好的主意。存储桶可以只是数组,每次填满时,大小都会加倍。同样,每次达到 90% 满时,哈希表的大小可以加倍。这是标准类型的方法。
正如前面的海报所提到的,大多数列表的实现,无论是数组还是链表,都会在列表满时自动为您重新分配存储空间。
If you are using a fixed size array for your buckets, then there is no bucket size less than 50,000 that can guarantee never overflowing unless you have additional information about the distribution of keys in the 50,000 (i.e if you knew that they were the integers 1 .. 50,000 then it would be trivial).
But generally you don't want to rely on large buckets because that is O(n) to search the buckets. Instead it's a better idea to use a variable sized table and variable sized buckets. The buckets can simply be arrays that you double in size each time they get filled. Similiarly the hash table can be doubled in size each time you get 90% full. This is a standard type approach.
As mentioned by the previous posters, most implementations of lists whether by arrays or linked lists automatically reallocate storage for you when the list gets full.
某些语言支持数组的动态大小(无需声明数组的大小)。
数据动态决定数组的大小。需要该大小的语言也支持动态数组。
Some languages support dynamic size for array (no need to declare the size of the array).
Data decided the size of array dynamically.And the languages which needs the size they also support dynamic array.