Hash table bucket arrays

Published 2024-11-06 11:38:31

If I have 50,000 entries and, say, 100,000 slots available in a hash table, what would be the best way to choose a suitable bucket array size for each index (without using linked lists) so that the arrays never 'overflow'? Would 30% excess be suitable?
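A quick balls-in-bins simulation makes the question concrete: hash 50,000 random keys into 100,000 slots and look at the fullest bucket. This is an illustrative sketch (not from the original post); it assumes a uniform hash, which real hash functions only approximate.

```python
import random

def max_bucket_load(n_entries, n_slots, trials=3, seed=42):
    """Simulate hashing n_entries uniformly random keys into n_slots
    buckets; return the largest bucket occupancy seen across trials."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        counts = [0] * n_slots
        for _ in range(n_entries):
            counts[rng.randrange(n_slots)] += 1
        worst = max(worst, max(counts))
    return worst

# With 50,000 entries in 100,000 slots (load factor 0.5), the fullest
# bucket typically holds several entries -- so a fixed per-index array
# sized at "30% excess" (one or two slots) would overflow regularly.
print(max_bucket_load(50_000, 100_000))
```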

Comments (3)

桃扇骨 2024-11-13 11:38:31

If you know the keys a priori, you can compute a minimal perfect hash. So if you know the keys and can tailor the hash function, a bucket size of one is sufficient.

If you do not know the keys in advance -- or do know the keys but cannot alter the hash function -- then it is possible for an adversary to select a worst-case set of keys (i.e., keys that all hash to the same bucket). To guarantee no bucket overflows, you would then need a bucket size equal to the number of entries. If you are willing to tolerate a chance of overflow, a more sophisticated analysis can select a bucket size that covers the preponderance of situations.
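The first point can be illustrated by brute-forcing a hash seed: when the key set is known in advance, keep trying seeds until every key lands in its own slot. This is a hypothetical sketch (the seeded FNV-style hash and the key set are made up for illustration); practical minimal-perfect-hash construction uses cleverer algorithms, such as CHD or BBHash, that scale to large key sets.

```python
def seeded_hash(key, seed, n_slots):
    """A simple seeded string hash (FNV-1a variant, illustrative only)."""
    h = 2166136261 ^ seed
    for ch in key:
        h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
    return h % n_slots

def find_perfect_seed(keys, n_slots):
    """Brute-force a seed under which all keys hash to distinct slots.
    Only feasible for small key sets; shown to make the idea concrete."""
    for seed in range(1_000_000):
        slots = {seeded_hash(k, seed, n_slots) for k in keys}
        if len(slots) == len(keys):
            return seed
    raise RuntimeError("no perfect seed found in search range")

keys = ["apple", "banana", "cherry", "date", "elderberry"]
# "Minimal" perfect hash: exactly as many slots as keys, no collisions,
# so a bucket size of one suffices.
seed = find_perfect_seed(keys, len(keys))
```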

飘逸的'云 2024-11-13 11:38:31

If you are using a fixed-size array for your buckets, then no bucket size less than 50,000 can guarantee never overflowing, unless you have additional information about the distribution of the 50,000 keys (e.g., if you knew they were the integers 1..50,000, it would be trivial).

But generally you don't want to rely on large buckets, because searching a bucket is O(n). Instead, it's a better idea to use a variable-sized table with variable-sized buckets. The buckets can simply be arrays that you double in size each time they fill up. Similarly, the hash table can be doubled in size each time it reaches 90% full. This is the standard approach.

As the previous posters mentioned, most list implementations, whether backed by arrays or linked lists, automatically reallocate storage for you when the list gets full.
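The resizing scheme described above can be sketched as follows. The 90% threshold and doubling rules follow the answer; class and method names, and the initial sizes, are illustrative assumptions (and note that Python lists already resize themselves, so the explicit capacity tracking is purely to mirror the fixed-array description).

```python
class ArrayBucketHashTable:
    """Sketch of a hash table with array buckets that double when full,
    and a table that doubles once the load reaches 90%."""

    def __init__(self, n_slots=8, bucket_cap=2):
        self.n_slots = n_slots
        self.bucket_cap = bucket_cap          # initial capacity per bucket
        self.buckets = [[] for _ in range(n_slots)]
        self.caps = [bucket_cap] * n_slots    # tracked bucket capacities
        self.count = 0

    def put(self, key, value):
        i = hash(key) % self.n_slots
        bucket = self.buckets[i]
        for j, (k, _) in enumerate(bucket):
            if k == key:
                bucket[j] = (key, value)      # update in place
                return
        if len(bucket) == self.caps[i]:
            self.caps[i] *= 2                 # bucket array doubles when full
        bucket.append((key, value))
        self.count += 1
        if self.count > 0.9 * self.n_slots:   # table doubles at 90% load
            self._grow()

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.n_slots]:
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self):
        old = [pair for bucket in self.buckets for pair in bucket]
        self.n_slots *= 2
        self.buckets = [[] for _ in range(self.n_slots)]
        self.caps = [self.bucket_cap] * self.n_slots
        self.count = 0
        for k, v in old:                      # rehash into the larger table
            self.put(k, v)
```

After doubling, the load drops below half the threshold, so `_grow` cannot re-trigger itself while rehashing.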

黑色毁心梦 2024-11-13 11:38:31

Some languages support dynamically sized arrays (there is no need to declare the array's size): the data determines the size at run time. Languages that do require a declared size usually also offer a dynamic-array type.
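For example, Python's list is such a dynamically sized array; it grows as elements are appended, with no size declared up front:

```python
arr = []             # no size declared
for i in range(10):
    arr.append(i)    # the array grows as data arrives
print(len(arr))      # 10
```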
