Why is Hashtable's load factor inconsistent with the one described in the CLRS book?

Posted on 2025-01-06 19:41:24


The Java documentation for the Hashtable class says:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs

So the load factor for Hashtable is 0.75, which means that if there are N keys, Hashtable will use M = N/0.75 slots to store them.
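
For concreteness, here is a minimal sketch of that sizing arithmetic in Java (the entry count 100 and the class name are mine, purely for illustration):

public class CapacityMath {
    public static void main(String[] args) {
        int n = 100;               // number of keys actually stored
        float loadFactor = 0.75f;  // Java's default load factor
        // Java's Hashtable grows once its size exceeds capacity * loadFactor,
        // so the bucket count M ends up at least N / 0.75.
        int m = (int) Math.ceil(n / loadFactor);
        System.out.println("N = " + n + "  ->  M >= " + m);  // prints M >= 134
    }
}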

The CLRS book also introduces a load factor, alpha.

But from my understanding, CLRS intends for alpha to be larger than 1, i.e., M = N/alpha < N. This means a hash table can use M slots, where M < N, and thereby save the storage that unused keys would occupy.

I say M < N can save storage because normally we don't know the exact value of N; we only know the set of possible keys and use N to stand for the number of possible keys. The set of possible keys may be very big, but the number of keys actually in use may be very small, so setting M smaller than N saves storage. This is also why a hash table normally does not use a direct array to map every {key, value} pair 1:1.
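
To make that concrete, a small sketch contrasting a direct-address array with a hash table (the universe size of 1,000,000 and the key 42 are assumptions, not from the question):

import java.util.Hashtable;

public class DirectVsHashed {
    public static void main(String[] args) {
        // Direct addressing: one slot per *possible* key, almost all unused.
        String[] direct = new String[1_000_000];
        direct[42] = "alice";

        // Hashing: slots proportional to keys actually stored (M is about N / 0.75).
        Hashtable<Integer, String> hashed = new Hashtable<>();
        hashed.put(42, "alice");

        System.out.println(direct[42] + " == " + hashed.get(42));
    }
}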

But Hashtable in Java uses more than N slots of storage. I think that is not consistent with the CLRS design, right?

Am I right?

Thanks.


Answers (2)

絕版丫頭 2025-01-13 19:41:24


Well, the allocated capacity should be larger than the number of elements added: dividing by a number less than one yields a result larger than the original number.

Assuming you want to add 100 elements, you can write either:

double allocationSize = 100 / 0.75;        // your formula: M = N / 0.75

or

double allocationSize = 100 * 1.33333333;  // M = N / X  ->  M = N * (1/X)

Both evaluate to 133.333333, which truncates to 133.

The whole JavaDoc:

An instance of Hashtable has two parameters that affect its
performance: initial capacity and load factor. The capacity is the
number of buckets in the hash table, and the initial capacity is
simply the capacity at the time the hash table is created. Note that
the hash table is open: in the case of a "hash collision", a single
bucket stores multiple entries, which must be searched sequentially.
The load factor is a measure of how full the hash table is allowed to
get before its capacity is automatically increased. When the number of
entries in the hashtable exceeds the product of the load factor and
the current capacity, the capacity is increased by calling the rehash
method.

Generally, the default load factor (.75) offers a good tradeoff
between time and space costs. Higher values decrease the space
overhead but increase the time cost to look up an entry (which is
reflected in most Hashtable operations, including get and put).

The initial capacity controls a tradeoff between wasted space and the
need for rehash operations, which are time-consuming. No rehash
operations will ever occur if the initial capacity is greater than the
maximum number of entries the Hashtable will contain divided by its
load factor. However, setting the initial capacity too high can waste
space.

If many entries are to be made into a Hashtable, creating it with a
sufficiently large capacity may allow the entries to be inserted more
efficiently than letting it perform automatic rehashing as needed to
grow the table.
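
To illustrate the pre-sizing advice in that JavaDoc, here is a minimal sketch (the expected entry count of 1000 is an assumption):

import java.util.Hashtable;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;
        // Per the JavaDoc: no rehash occurs if the initial capacity is greater
        // than the maximum entry count divided by the load factor.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Hashtable<Integer, Integer> table = new Hashtable<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            table.put(i, i * i);  // never triggers the costly rehash
        }
        System.out.println("Stored " + table.size() + " entries without rehashing");
    }
}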

书间行客 2025-01-13 19:41:24


I have not heard of the CLRS book, but I can tell you that using a load factor of more than about 0.75 (even less for some hash map designs) leads to a significant number of collisions.

If you allow the HashMap or Hashtable to grow naturally, its capacity will be proportional to the number of entries. The bucket array holds one reference per slot, and these references are small (typically 4 bytes) compared to the Entry objects (typically 16-24 bytes), so the hash index table you are concerned about will always be several times smaller than the Entry objects themselves, let alone the keys and values.
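
A back-of-the-envelope version of that comparison (the entry count and the 4-byte reference / 24-byte Entry figures are assumptions in the spirit of the answer):

public class OverheadEstimate {
    public static void main(String[] args) {
        int entries = 1_000_000;
        int capacity = (int) (entries / 0.75);  // bucket count after natural growth
        long bucketArrayBytes = 4L * capacity;  // one (compressed) reference per bucket
        long entryObjectBytes = 24L * entries;  // object header + key/value/next fields
        System.out.printf("bucket array:  %,d bytes%n", bucketArrayBytes);
        System.out.printf("entry objects: %,d bytes%n", entryObjectBytes);
    }
}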
