Why is Hashtable's load factor inconsistent with the one described in the CLRS book?

Posted on 2025-01-06 19:41:24


The Java documentation for the Hashtable class says:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs

So the load factor for Hashtable is 0.75, which means that if there are N keys, Hashtable will use M = N/0.75 slots to store them.
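
For concreteness, here is a minimal sketch of that sizing arithmetic in Java (the entry count 100 and the class name are mine, purely for illustration):

public class CapacityMath {
    public static void main(String[] args) {
        int n = 100;               // number of keys actually stored
        float loadFactor = 0.75f;  // Java's default load factor
        // Java's Hashtable grows once its size exceeds capacity * loadFactor,
        // so the bucket count M ends up at least N / 0.75.
        int m = (int) Math.ceil(n / loadFactor);
        System.out.println("N = " + n + "  ->  M >= " + m);  // prints M >= 134
    }
}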

The CLRS book also introduces a load factor, alpha.

But from my understanding, CLRS intends for alpha to be larger than 1, i.e., M = N/alpha < N. This means a hash table can use M slots, where M < N, and thereby save the storage that unused keys would occupy.

I say M < N can save storage because normally we don't know the exact value of N; we only know the set of possible keys and use N to stand for the number of possible keys. The set of possible keys may be very big, but the number of keys actually in use may be very small, so setting M smaller than N saves storage. This is also why a hash table normally does not use a direct array to map every {key, value} pair 1:1.
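
To make that concrete, a small sketch contrasting a direct-address array with a hash table (the universe size of 1,000,000 and the key 42 are assumptions, not from the question):

import java.util.Hashtable;

public class DirectVsHashed {
    public static void main(String[] args) {
        // Direct addressing: one slot per *possible* key, almost all unused.
        String[] direct = new String[1_000_000];
        direct[42] = "alice";

        // Hashing: slots proportional to keys actually stored (M is about N / 0.75).
        Hashtable<Integer, String> hashed = new Hashtable<>();
        hashed.put(42, "alice");

        System.out.println(direct[42] + " == " + hashed.get(42));
    }
}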

But Hashtable in Java uses more than N slots of storage. I think that is not consistent with the CLRS design, right?

Am I right?

Thanks.


Answers (2)

絕版丫頭 2025-01-13 19:41:24


Well, the allocated capacity should be larger than the number of elements added: dividing by a number less than one yields a result larger than the original number.

Assuming you want to add 100 elements, you can write either:

double allocationSize = 100 / 0.75;        // your formula: M = N / 0.75

or

double allocationSize = 100 * 1.33333333;  // M = N / X  ->  M = N * (1/X)

Both evaluate to 133.333333, which truncates to 133.

The whole JavaDoc:

An instance of Hashtable has two parameters that affect its
performance: initial capacity and load factor. The capacity is the
number of buckets in the hash table, and the initial capacity is
simply the capacity at the time the hash table is created. Note that
the hash table is open: in the case of a "hash collision", a single
bucket stores multiple entries, which must be searched sequentially.
The load factor is a measure of how full the hash table is allowed to
get before its capacity is automatically increased. When the number of
entries in the hashtable exceeds the product of the load factor and
the current capacity, the capacity is increased by calling the rehash
method.

Generally, the default load factor (.75) offers a good tradeoff
between time and space costs. Higher values decrease the space
overhead but increase the time cost to look up an entry (which is
reflected in most Hashtable operations, including get and put).

The initial capacity controls a tradeoff between wasted space and the
need for rehash operations, which are time-consuming. No rehash
operations will ever occur if the initial capacity is greater than the
maximum number of entries the Hashtable will contain divided by its
load factor. However, setting the initial capacity too high can waste
space.

If many entries are to be made into a Hashtable, creating it with a
sufficiently large capacity may allow the entries to be inserted more
efficiently than letting it perform automatic rehashing as needed to
grow the table.
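
To illustrate the pre-sizing advice in that JavaDoc, here is a minimal sketch (the expected entry count of 1000 is an assumption):

import java.util.Hashtable;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;
        // Per the JavaDoc: no rehash occurs if the initial capacity is greater
        // than the maximum entry count divided by the load factor.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Hashtable<Integer, Integer> table = new Hashtable<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            table.put(i, i * i);  // never triggers the costly rehash
        }
        System.out.println("Stored " + table.size() + " entries without rehashing");
    }
}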

书间行客 2025-01-13 19:41:24


I have not heard of the CLRS book, but I can tell you that using a load factor of more than about 0.75 (even less for some hash map designs) leads to a significant number of collisions.

If you allow the HashMap or Hashtable to grow naturally, its capacity will be proportional to the number of entries. The bucket array holds one reference per slot, and these references are small (typically 4 bytes) compared to the Entry objects (typically 16-24 bytes), so the hash index table you are concerned about will always be several times smaller than the Entry objects themselves, let alone the keys and values.
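
A back-of-the-envelope version of that comparison (the entry count and the 4-byte reference / 24-byte Entry figures are assumptions in the spirit of the answer):

public class OverheadEstimate {
    public static void main(String[] args) {
        int entries = 1_000_000;
        int capacity = (int) (entries / 0.75);  // bucket count after natural growth
        long bucketArrayBytes = 4L * capacity;  // one (compressed) reference per bucket
        long entryObjectBytes = 24L * entries;  // object header + key/value/next fields
        System.out.printf("bucket array:  %,d bytes%n", bucketArrayBytes);
        System.out.printf("entry objects: %,d bytes%n", entryObjectBytes);
    }
}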
