Why does a hash table degenerate into a linked list when the hashCode() implementation returns a constant value?

Published 2024-12-10 00:43:52


// The worst possible legal hash function - never use!
@Override public int hashCode() { return 42; }

It’s legal because it ensures that equal objects have the same hash code. It’s atrocious because it ensures that every object has the same hash code. Therefore, every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time instead run in quadratic time.

I am trying to figure the above out (quote from p. 47, Item 9, of Joshua Bloch's Effective Java).

The way I see it is as follows (consider the following code):

Map<String, String> h = new HashMap<String,String>();
h.put("key1", "value1");
h.put("key1", "value2");

What happens with the second h.put("key1",...) command is as follows:
1. Get the hashcode of key1
2. Get to the bucket representing the above hashcode
3. Within that bucket, for each object, invoke the equals method to find whether an identical object exists.

This is faster because you first find the 'group' (bucket) of objects and only then search for the actual object within it.
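The three steps above can be sketched as a toy separate-chaining map. This is a simplified illustration, not the real java.util.HashMap code; the class and field names are hypothetical:

```java
import java.util.LinkedList;
import java.util.Objects;

// A toy separate-chaining map illustrating the three put() steps:
// hash the key, pick a bucket, then scan the bucket with equals().
public class ToyMap<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K k, V v) { key = k; value = v; }
    }

    @SuppressWarnings("unchecked")
    private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[16];

    public V put(K key, V value) {
        int hash = Objects.hashCode(key);                 // step 1: get the hashcode
        int index = Math.floorMod(hash, buckets.length);  // step 2: find the bucket
        if (buckets[index] == null) buckets[index] = new LinkedList<>();
        for (Entry<K, V> e : buckets[index]) {            // step 3: scan with equals()
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;  // same key found: overwrite, like the second put("key1", ...)
                return old;
            }
        }
        buckets[index].add(new Entry<>(key, value));
        return null;
    }

    public V get(K key) {
        int index = Math.floorMod(Objects.hashCode(key), buckets.length);
        if (buckets[index] == null) return null;
        for (Entry<K, V> e : buckets[index]) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMap<String, String> h = new ToyMap<>();
        h.put("key1", "value1");
        h.put("key1", "value2"); // second put finds the existing entry via equals()
        System.out.println(h.get("key1")); // prints value2
    }
}
```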

Now, when the hashcode implementation is such that it returns the same integer (such as 42 above) for ALL objects, there is only one bucket, and the equals method needs to be invoked one-by-one on each object in the entire hashmap/hashtable. This is as bad as a linked list, because if the objects were in a linked list, one would likewise have to go through them one by one, comparing (calling equals on) each object.
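This behaviour can be demonstrated directly: a constant hashCode() keeps the map *correct* (equals still distinguishes keys), but forces every key into the same bucket. The BadKey class below is a hypothetical key written for this demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// A key with the "worst possible legal hash function" from the quote.
public class ConstantHashDemo {
    static final class BadKey {
        private final String name;
        BadKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }

        // Legal (equal objects get equal hashes), but atrocious:
        // every key hashes to the same bucket.
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            m.put(new BadKey("k" + i), i);
        }
        // Every lookup now scans the one shared bucket, calling equals()
        // until the matching key is found, instead of jumping straight
        // to the right entry.
        System.out.println(m.get(new BadKey("k999"))); // prints 999
        System.out.println(m.size());                  // prints 1000
    }
}
```

Note that since Java 8, HashMap converts long collision chains into balanced trees, so the worst case for lookups is O(log n) rather than O(n); the general point about degraded performance still stands.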

Is that why it is said that hash tables degenerate into linked lists?

(I apologize for the verbosity of the above text. I am not clear enough in my concepts to have stated it more succinctly)

Comments (3)

素食主义者 2024-12-17 00:43:52


A hash table is an array with a mapping function (hashCode). When inserting into the array, you calculate the position and insert the element there.

BUT the hashCode does not guarantee that every element will get a different position, so some objects may collide (map to the same address), and the hash table has to resolve that. There are two common approaches to doing so.

Separate chaining

In separate chaining (used in Java), every index of the array contains a linked list, so every bucket (position) has unbounded capacity. Hence, if your hashCode returns only one value, you are using only one linked list => the hash table is a linked list.

Linear probing

The second approach is linear probing. In linear probing, the inner array is really a normal array of elements. When you find that a position is already occupied, you iterate over the array and place the new element at the first empty position.

So if your implementation of hashCode generates a constant value for every element, you generate only collisions: you try to place all the elements at the same index, and because it is always occupied, you iterate over all the already-placed elements and place the new element at the end of this structure. If you re-read what you are doing, you will see that you are merely using a different (one could say implicit) implementation of a linked list.
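A minimal open-addressing insert along these lines might look as follows. This is a sketch under simplifying assumptions (fixed capacity, no resizing, no deletion tombstones); the class and method names are hypothetical:

```java
// A toy linear-probing table: on collision, walk forward from the
// home slot until an empty slot (or the existing key) is found.
public class LinearProbingDemo {
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];

    public void put(Object key, Object value) {
        int index = Math.floorMod(key.hashCode(), keys.length);
        // Probe forward past occupied slots that hold other keys.
        while (keys[index] != null && !keys[index].equals(key)) {
            index = (index + 1) % keys.length;
        }
        keys[index] = key;
        values[index] = value;
    }

    public Object get(Object key) {
        int index = Math.floorMod(key.hashCode(), keys.length);
        while (keys[index] != null) {
            if (keys[index].equals(key)) return values[index];
            index = (index + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingDemo t = new LinearProbingDemo();
        // "Aa" and "BB" have the same String hashCode (2112), so the
        // second insert collides and is probed to the next slot --
        // exactly the implicit linked list described above.
        t.put("Aa", 1);
        t.put("BB", 2);
        System.out.println(t.get("BB")); // prints 2
    }
}
```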

Why not to do it

You really should not return a constant value, because hash tables are built to provide O(1) expected complexity for search and insert operations (thanks to the hash function, which returns a different address for (almost) every distinct object). If you return only one value, the implementation degrades to a linked list with O(n) for both operations.

书信已泛黄 2024-12-17 00:43:52


Yes, your understanding seems accurate. In fact, it is not merely *like* a linked list: the actual internal implementation of entries that share a common bucket is a plain old linked list. The bucket holds the Map.Entry at the head of the list, and each entry has a forward pointer to the next occupant of its bucket. (For the implementation of HashMap that is built into Java, of course.)
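The node shape described here can be sketched roughly as below. This is simplified from the idea of OpenJDK's internal HashMap node; the class name and the demo chain are illustrative, not the actual JDK source:

```java
// Roughly the shape of an internal bucket entry: a key/value pair
// plus a forward pointer to the next entry in the same bucket.
public class BucketChainDemo {
    static class Node<K, V> {
        final int hash;   // cached hash of the key
        final K key;
        V value;
        Node<K, V> next;  // next occupant of this bucket, or null

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key;
            this.value = value; this.next = next;
        }
    }

    public static void main(String[] args) {
        // Two entries that collided into the same bucket, chained from the head.
        Node<String, String> second = new Node<>(42, "k2", "v2", null);
        Node<String, String> head   = new Node<>(42, "k1", "v1", second);
        // Walking the chain is exactly a linked-list traversal.
        for (Node<String, String> n = head; n != null; n = n.next) {
            System.out.println(n.key + "=" + n.value);
        }
    }
}
```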

萌能量女王 2024-12-17 00:43:52


Hash tables -- when used correctly -- offer constant-time lookups on average. In terms of time complexity, constant time is as good as it gets.

Linked lists offer linear-time lookups. Linear time (i.e. look at every element in turn) is as bad as it gets.

When a hash table is misused in the manner described by Bloch, its lookup behaviour degenerates into that of a linked list, simply because it effectively becomes a linked list.

Similar things can be said about other operations.
