java.util.HashMap和HashSet的内部实现
我一直在尝试了解 java.util.HashMap 和 java.util.HashSet 的内部实现。
以下是我脑海中暂时浮现的疑问:
- HashMap/HashSet 中的 @Override public int hashcode() 有何重要性?这个哈希码在内部哪里使用?
- 我通常认为 HashMap 的键是一个
String
,如myMap
。我可以像myMap
那样将值映射到someObject
(而不是字符串)吗?为了成功实现这一目标,我需要遵守哪些合同?
提前致谢 !
编辑:
- 我们是说键的哈希码(检查!)是哈希表中值映射的实际内容吗?当我们执行
myMap.get(someKey);
java 内部调用someKey.hashCode()
来获取哈希表中要查找结果值的数字?
答案:是的。
编辑2:
- 在
java.util.HashSet
中,哈希表的密钥从哪里生成?是否来自我们要添加的对象,例如。mySet.add(myObject);
那么myObject.hashCode()
将决定将其放置在哈希表中的位置? (因为我们不在 HashSet 中给出键)。
答案:添加的对象成为关键。价值是虚拟的!
I have been trying to understand the internal implementation of java.util.HashMap
and java.util.HashSet
.
Following are the doubts popping in my mind for a while:
- Whats is the importance of the
@Override public int hashcode()
in a HashMap/HashSet? Where is this hash code used internally? - I have generally seen the key of the HashMap be a
String
likemyMap<String,Object>
. Can I map the values againstsomeObject
(instead of String) likemyMap<someObject, Object>
? What all contracts do I need to obey for this happen successfully?
Thanks in advance !
EDIT:
- Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we do
myMap.get(someKey);
java is internally callingsomeKey.hashCode()
to get the number in the Hash table to be looked for the resulting value?
Answer: Yes.
EDIT 2:
- In a
java.util.HashSet
, from where is the key generated for the Hash table? Is it from the object that we are adding eg.mySet.add(myObject);
thenmyObject.hashCode()
is going to decide where this is placed in the hash table? (as we don't give keys in a HashSet).
Answer: The object added becomes the key. The value is dummy!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
问题 2 的答案很简单 - 是的,您可以使用任何您喜欢的对象。具有字符串类型键的映射被广泛使用,因为它们是命名服务的典型数据结构。但一般来说,您可以映射任意两种类型,例如
Map
或Map
。对于 hashcode() 方法,就像之前回答的那样 - 每当您重写 equals() 时,您就必须重写 hashcode() 才能遵守契约。另一方面,如果您对 equals() 的标准实现感到满意,那么您不应该接触 hashcode() (因为这可能会破坏契约并导致不相等的对象具有相同的哈希码)。
实用旁注:Eclipse(可能还有其他 IDE)可以根据类成员自动为您的类生成一对 equals() 和 hashcode() 实现。
编辑
对于您的附加问题:是的,完全正确。看一下HashMap.get(Object key);的源码它调用 key.hashcode 来计算内部哈希表中的位置(bin)并返回该位置的值(如果有)。
但要小心“手工制作”的 hashcode/equals 方法 - 如果您使用对象作为键,请确保 hashcode 之后不会更改,否则您将无法再找到映射的值。换句话说,用于计算 equals 和 hashcode 的字段应该是最终的(或者在创建对象后“不可更改”)。
假设我们有一个
String name
和String Phonenumber
的联系人,并且我们使用这两个字段来计算 equals() 和 hashcode()。现在,我们用他的手机号码创建“John Doe”,并将他映射到他最喜欢的甜甜圈店。 hashcode() 用于计算哈希表中的索引(bin),这就是存储甜甜圈店的位置。现在我们得知他有一个新的电话号码,并且我们更改了 John Doe 对象的电话号码字段。这会产生一个新的哈希码。这个哈希码解析为一个新的哈希表索引 - 这通常不是约翰最喜欢的甜甜圈店的存储位置。
问题很明显:在本例中,我们希望将“John Doe”映射到 Donut 商店,而不是“具有特定电话号码的 John Doe”。因此,我们必须小心自动生成的 equals/hashcode,以确保它们是我们真正想要的,因为它们可能会使用不需要的字段,从而给 HashMap 和 HashSet 带来麻烦。
编辑2
如果向 HashSet 添加一个对象,则该对象是内部哈希表的键,该值已设置但未使用(只是对象的静态实例)。这是 openjdk 6 (b17) 的实现:
The answer to question 2 is easy - yes you can use any Object you like. Maps that have String type keys are widely used because they are typical data structures for naming services. But in general, you can map any two types like
Map<Car,Vendor>
orMap<Student,Course>
.For the hashcode() method it's like answered before - whenever you override equals(), then you have to override hashcode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashcode() (because that could break the contract and result in identical hashcodes for unequal objects).
Practical sidenote: eclipse (and probably other IDEs as well) can auto generate a pair of equals() and hashcode() implementation for your class, just based on the class members.
Edit
For your additional question: yes, exactly. Look at the source code for HashMap.get(Object key); it calls key.hashcode to calculate the position (bin) in the internal hashtable and returns the value at that position (if there is one).
But be careful with 'handmade' hashcode/equals methods - if you use an object as a key, make sure that the hashcode doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashcode should be final (or 'unchangeable' after creation of the object).
Assume, we have a contact with
String name
andString phonenumber
and we use both fields to calculate equals() and hashcode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashcode() is used to calculate the index (bin) in the hash table and that's where the donut shop is stored.Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hashcode. And this hashcode resolves to a new hash table index - which usually isn't the position where John Does' favorite Donut shop was stored.
The problem is clear: In this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". So, we have to be careful with autogenerated equals/hashcode to make sure they're what we really want, because they might use unwanted fields, introducing trouble with HashMaps and HashSets.
Edit 2
If you add an object to a HashSet, the Object is the key for the internal hash table, the value is set but unused (just a static instance of Object). Here's the implementation from the openjdk 6 (b17):
像
HashMap
和HashSet
这样的散列容器通过将内容分成“存储桶”来提供对存储在其中的元素的快速访问。例如,存储在
List
中的数字列表:1, 2, 3, 4, 5, 6, 7, 8
在内存中看起来(概念上)类似于:[1,2,3,4,5,6,7,8]
。将同一组数字存储在
Set
中看起来更像是这样:[1, 2] [3, 4] [5, 6] [7, 8]
。在此示例中,列表已分为 4 个存储桶。现在假设您想要从
List
和Set
中查找值6
。对于列表,您必须从列表的开头开始检查每个值,直到达到 6,这将需要 6 个步骤。通过一组,您可以找到正确的存储桶,检查该存储桶中的每个项目(在我们的示例中只有 2 个项目),从而使此过程分为 3 个步骤。拥有的数据越多,这种方法的价值就会显着增加。但是等等,我们怎么知道要查看哪个桶呢?这就是
hashCode
方法的用武之地。为了确定在其中查找项目的存储桶,Java 哈希容器调用hashCode
,然后对结果应用一些函数。此函数尝试平衡存储桶的数量和项目的数量,以尽可能实现最快的查找。在查找过程中,一旦找到正确的存储桶,就会像在列表中一样一次比较该存储桶中的每个项目。这就是为什么当您覆盖
hashCode
时,您还必须覆盖equals
。因此,如果任何类型的对象同时具有equals
和hashCode
方法,则它可以用作Map
中的键或一个集合
。为了正确实现这些方法,必须遵循一项合同,该合同的规范文本来自 Josh Bloch 的伟大著作《Effective Java》:第 8 项:重写 equals 时始终重写 hashCodeHashing containers like
HashMap
andHashSet
provide fast access to elements stored in them by splitting their contents into "buckets".For example the list of numbers:
1, 2, 3, 4, 5, 6, 7, 8
stored in aList
would look (conceptually) in memory something like:[1, 2, 3, 4, 5, 6, 7, 8]
.Storing the same set of numbers in a
Set
would look more like this:[1, 2] [3, 4] [5, 6] [7, 8]
. In this example the list has been split into 4 buckets.Now imagine you want to find the value
6
out of both theList
and theSet
. With a list you would have to start at the beginning of the list and check each value until you get to 6, this will take 6 steps. With a set you find the correct bucket, the check each of the items in that bucket (only 2 in our example) making this a 3 step process. The value of this approach increases dramatically the more data you have.But wait how did we know which bucket to look in? That is where the
hashCode
method comes in. To determine the bucket in which to look for an item Java hashing containers callhashCode
then apply some function to the result. This function tries to balance the numbers of buckets and the number of items for the fastest lookup possible.During lookup once the correct bucket has been found each item in that bucket is compared one at a time as in a list. That is why when you override
hashCode
you must also overrideequals
. So if an object of any type has both anequals
and ahashCode
method it can be used as a key in aMap
or an entry in aSet
. There is a contract that must be followed to implement these methods correctly the canonical text on this is from Josh Bloch's great book Effective Java: Item 8: Always override hashCode when you override equals这允许映射的实例根据映射的内容生成有用的哈希代码。具有相同内容的两个映射将产生相同的哈希码。如果内容不同,哈希码就会不同。
绝不。此代码的存在只是为了让您可以将一个地图用作另一个地图中的键。
是的,但是
someObject
必须是一个类,而不是一个对象(你的名字表明你想要传入对象;它应该是SomeObject
以明确你所指的是类型)。该类必须实现
hashCode()
和equals()
。[编辑]
是的。
This allows the instance of the map to produce a useful hash code depending on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will be different.
Never. This code only exists so you can use a map as a key in another map.
Yes but
someObject
must be a class, not an object (your name suggests that you want to pass in object; it should beSomeObject
to make it clear you're referring to the type).The class must implement
hashCode()
andequals()
.[EDIT]
Yes.
是的。您可以使用任何对象作为 HashMap 中的键。为此,您必须遵循以下步骤。
覆盖等于。
覆盖哈希码。
java.lang.Object 的文档中非常清楚地提到了这两种方法的约定。 http://java.sun.com/javase/6 /docs/api/java/lang/Object.html
是的,hashCode() 方法由 HashMap 内部使用,因此返回正确的值对于性能非常重要。
这是 HashMap 中的 hashCode() 方法 从
上面的代码可以清楚地看出,每个键的 hashCode 不仅仅用于映射的 hashCode(),而且还用于查找放置键、值对的桶。这就是为什么 hashCode() 与 HashMap 的性能有关
Yes. You can use any object as the key in a HashMap. In order to do so following are the steps you have to follow.
Override equals.
Override hashCode.
The contracts for both the methods are very clearly mentioned in documentation of java.lang.Object. http://java.sun.com/javase/6/docs/api/java/lang/Object.html
And yes hashCode() method is used internally by HashMap and hence returning proper value is important for performance.
Here is the hashCode() method from HashMap
It is clear from the above code that hashCode of each key is not just used for hashCode() of the map, but also for finding the bucket to place the key,value pair. That is why hashCode() is related to performance of the HashMap
Object
都必须有一个hashCode()
方法;HashMap
和HashSet
也不例外。如果将哈希映射/集合插入另一个哈希映射/集合,则使用此哈希代码。HashMap
/HashSet
中的键。这要求hashCode()
方法为相等的对象返回相等的值,并且equals()
方法根据契约(自反、传递、对称)实现。Object
的默认实现已经遵守这些约定,但如果您想要值相等而不是引用相等,您可能需要覆盖它们。Object
in Java must have ahashCode()
method;HashMap
andHashSet
are no execeptions. This hash code is used if you insert the hash map/set into another hash map/set.HashMap
/HashSet
. This requires that thehashCode()
method returns equal values for equal objects, and that theequals()
method is implemented according to contract (reflexive, transitive, symmetric). The default implementations fromObject
already obey these contracts, but you may want to override them if you want value equality instead of reference equality.Java 中的 equals()、
hashcode()
和哈希表之间通常存在着复杂的关系(就此而言,.NET 也是如此)。引用文档:该行
仅表明
hashCode()
方法是被覆盖。这通常表明在HashMap
中使用该类型作为键是安全的。是的,您可以轻松地使用任何遵守
HashMap
中的equals()
和hashCode()
约定的对象作为键。There is a intricate relationship between equals(),
hashcode()
and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:The line
just tells that the
hashCode()
method is overridden. This ia usually a sign that it's safe to use the type as key in aHashMap
.And yes, you can aesily use any object which obeys the contract for
equals()
andhashCode()
in aHashMap
as key.在回答问题 2 时,虽然您可以使用任何可用作 HashMap 中的键的类,但最佳实践是使用不可变类作为 HashMap 的键。或者至少如果您的“hashCode”和“equals”实现依赖于类的某些属性,那么您应该注意不要提供更改这些属性的方法。
In answer to question 2, though you can have any class that can be used to as the key in Hashmap, the best practice is to use immutable classes as keys for the HashMap. Or at the least if your "hashCode", and "equals" implementation are dependent on some of the attributes of your class then you should take care that you don't provide methods to alter these attributes.
亚伦·迪古拉(Aaron Digulla)是绝对正确的。人们似乎没有意识到的一个有趣的附加说明是,关键对象的 hashCode() 方法没有逐字使用。事实上,它是由 HashMap 重新哈希的,即它调用
hash(someKey.hashCode))
,其中hash()
是一种内部哈希方法。要了解这一点,请查看源代码: http://kickjava.com /src/java/util/HashMap.java.htm
原因是有些人对 hashCode() 的实现很差,而 hash() 函数给出了更好的哈希分布。这基本上是出于性能原因而完成的。
Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the key object's hashCode() method is not used verbatim. It is, in fact, rehashed by the HashMap i.e. it calls
hash(someKey.hashCode))
, wherehash()
is an internal hashing method.To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm
The reason for this is that some people implement hashCode() poorly and the hash() function gives a better hash distribution. It's basically done for performance reasons.
HashCode 方法适用于 HashSet、HashTable、HashMap 等集合类 - 哈希代码返回支持哈希目的的对象的整数。它是通过将对象的内部地址转换为整数来实现的。在每个重写 equals 方法的类中都应该重写哈希码方法。
HashCode 方法的三个一般联系
对于两个相等的对象按照。到 equal 方法,然后为两个对象调用 HashCode,它应该生成相同的整数值。
如果为单个对象多次调用它,那么它应该返回常量整数值。
如果为
对于两个不相等的对象,根据。到 equal 方法,然后为两个对象调用 HashCode 方法,不强制它应该产生不同的值。
HashCode method for collection classes like HashSet, HashTable, HashMap etc – Hash code returns integer number for the object that is being supported for the purpose of hashing. It is implemented by converting internal address of the object into an integer. Hash code method should be overridden in every class that overrides equals method.
Three general contact for HashCode method
For two equal objects acc. to equal method, then calling HashCode for both object it should produce same integer value.
If it is being called several times for a single object, then it should return constant integer value.
For two unequal objects acc. to equal method, then calling HashCode method for both object, it is not mandatory that it should produce distinct value.