如何保证hashCode()和equals()一致?

发布于 2024-07-11 21:19:41 字数 895 浏览 8 评论 0原文

当重写 java.lang.Object 的 equals() 函数时,javadocs 建议:

每当重写 hashCode 方法时,通常都需要重写该方法,以维护 hashCode 方法的通用约定,即相等的对象必须具有相等的哈希码。

hashCode() 方法必须为每个对象返回一个唯一整数(这在基于内存位置比较对象时很容易做到,只需返回该对象的唯一整数地址即可)

应该如何重写 hashCode() 方法,以便它仅根据每个对象的属性返回一个唯一整数


public class People{
   public String name;
   public int age;

   public int hashCode(){
      // How to get a unique integer based on name and age?
   }
}
/*******************************/
public class App{
   public static void main( String args[] ){
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hasCode() );  // output?
       System.out.println( melissa.hashCode(); // output?
   }
}

When overriding the equals() function of java.lang.Object, the javadocs suggest that,

it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)

How should a hashCode() method be overriden so that it returns a unique integer for each object based only on that object's properities?


public class People{
   public String name;
   public int age;

   public int hashCode(){
      // How to get a unique integer based on name and age?
   }
}
/*******************************/
public class App{
   public static void main( String args[] ){
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hasCode() );  // output?
       System.out.println( melissa.hashCode(); // output?
   }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

北笙凉宸 2024-07-18 21:19:41

它并不是说对象的哈希码必须完全唯一,只是两个相等对象的哈希码返回相同的哈希码。 让两个不相等的对象返回相同的哈希码是完全合法的。 但是,一组对象上的哈希码分布越独特,HashMap 和使用该哈希码的其他操作的性能就越好。

IntelliJ Idea 等 IDE 具有内置的 equals 和 hashCode 生成器,通常可以很好地为大多数对象提供“足够好”的代码(并且可能比一些手工制作的过于聪明的哈希函数更好)。

例如,下面是 Idea 为 People 类生成的 hashCode 函数:

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}

It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.

IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).

For example, here's a hashCode function that Idea generates for your People class:

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}
无可置疑 2024-07-18 21:19:41

我不会详细讨论 hashCode 唯一性,因为 Marc 已经解决了这个问题。 对于您的 People 类,您首先需要确定人的平等意味着什么。 也许平等仅仅基于他们的名字,也许基于名字和年龄。 它将是特定于领域的。 假设平等是基于姓名和年龄的。 您重写的equals 看起来像

public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!(getClass().equals(obj.getClass())) return false;
    Person other = (Person)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}

任何时候您重写equals 时都必须重写hashCode。 此外,hashCode 在计算中不能使用比 equals 更多的字段。 大多数时候,您必须添加或独占或各个字段的哈希码(hashCode 应该能够快速计算)。 因此,有效的 hashCode 方法可能如下所示:

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}

请注意,以下内容无效,因为它使用 equals 没有使用的字段(高度) 。 在这种情况下,两个“等于”对象可能具有不同的哈希码。

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

此外,两个不等于的对象具有相同的哈希码是完全有效的:

public int hashCode() {    
    return age;    
}

在这种情况下,Jane 年龄 30 不等于 Bob 年龄 30,但它们的哈希码都是 30。虽然有效,但这对于哈希性能来说是不受欢迎的 -为基础的集合。

I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like

public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!(getClass().equals(obj.getClass())) return false;
    Person other = (Person)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}

Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}

Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

Also, it's perfectly valid for two non-equals objects to have the same hash code:

public int hashCode() {    
    return age;    
}

In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.

百善笑为先 2024-07-18 21:19:41

另一个问题是,是否有一些所有程序员都应该知道的基本低级知识,我认为哈希查找就是其中之一。 所以就这样吧。

哈希表(请注意,我没有使用实际的类名)基本上是一个链表数组。 要在表中查找某些内容,首先计算该内容的哈希码,然后根据表的大小对其进行修改。 这是数组的索引,您可以在该索引处获得一个链接列表。 然后你遍历列表直到找到你的对象。

由于数组检索的复杂度为 O(1),链表遍历的复杂度为 O(n),因此您需要一个哈希函数来创建尽可能随机的分布,以便对象将被哈希到不同的列表中。 每个对象都可以返回值 0 作为其哈希码,并且哈希表仍然可以工作,但它本质上是数组元素 0 处的长链表。

您通常还希望数组很大,这会增加对象位于长度为 1 的列表中的机会。例如,当映射中的条目数 > 1 时,Java HashMap 会增加数组的大小。 数组大小的 75%。 这里有一个权衡:您可以拥有一个条目很少且浪费内存的巨大数组,也可以拥有一个较小的数组,其中数组中的每个元素都是一个包含 > 的列表。 1个条目,浪费时间遍历。 完美的哈希会将每个对象分配到数组中的唯一位置,不会浪费空间。

术语“完美哈希”是一个真实的术语,在某些情况下,您可以创建一个哈希函数,为每个对象提供唯一的编号。 只有当您知道所有可能值的集合时,这才有可能。 一般情况下,你无法实现这一点,并且会有一些值返回相同的哈希码。 这是简单的数学:如果您的字符串长度超过 4 个字节,则无法创建唯一的 4 字节哈希码。

一个有趣的花絮:哈希数组的大小通常基于素数,以便在修改结果时提供最佳的随机分配机会,无论哈希码的实际随机程度如何。

根据评论进行编辑:

1)链表并不是表示具有相同哈希码的对象的唯一方法,尽管这是 JDK 1.5 HashMap 使用的方法。 尽管内存效率比简单数组低,但它在重新散列时确实可以减少流失(因为条目可以从一个存储桶取消链接并重新链接到另一个存储桶)。

2) 从 JDK 1.4 开始,HashMap 类使用大小为 2 的幂的数组; 在此之前它使用 2^N+1,我相信对于 N <= 32 来说这是素数。这本身并不会加速数组索引,但确实允许使用按位 AND 而不是除法来计算数组索引正如尼尔·科菲所指出的。 就我个人而言,我会质疑这是不成熟的优化,但考虑到 HashMap 上的作者列表,我会假设有一些真正的好处。

Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.

A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.

Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.

You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.

The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.

One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.

Edit based on comments:

1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).

2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.

蝶…霜飞 2024-07-18 21:19:41

一般来说,哈希码不能是唯一的,因为值比可能的哈希码(整数)多。
好的哈希码可以将值很好地分布在整数上。
一个坏的哈希表总是可以给出相同的值并且在逻辑上仍然是正确的,这只会导致哈希表效率低得令人无法接受。

相等的值必须具有相同的哈希值,哈希表才能正常工作。
否则,您可以将一个键添加到哈希表中,然后尝试通过具有不同哈希码的相等值来查找它,但找不到它。
或者,您可以使用不同的哈希码放置一个相等的值,并在哈希表的不同位置放置两个相等的值。

在实践中,您通常选择要在 hashCode() 和 equals() 方法中考虑的字段子集。

In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.

Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.

In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.

初见你 2024-07-18 21:19:41

我认为你误解了它。 哈希码不必对于每个对象都是唯一的(毕竟,它是哈希码),尽管您显然不希望它对于所有对象都相同。 但是,您确实需要它与所有相等的对象相同,否则像标准集合之类的东西将无法工作(例如,您会在哈希集中查找某些内容但找不到它)。

对于简单的属性,一些 IDE 具有哈希码函数生成器。

如果您不使用 IDE,请考虑使用 Apahce Commons 和 HashCodeBuilder 类

I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).

For straightforward attributes, some IDEs have hashcode function builders.

If you don't use IDEs, consider using Apahce Commons and the class HashCodeBuilder

凶凌 2024-07-18 21:19:41

hashCode 的唯一合同义务是它一致。 创建 hashCode 值时使用的字段必须与 equals 方法中使用的字段相同或者是其子集。 这意味着对所有值返回 0 是有效的,但效率不高。

可以通过单元测试来检查 hashCode 是否一致。 我编写了一个名为 EqualityTestCase 的抽象类,它执行一些 hashCode 检查。 只需扩展测试用例并实现两到三个工厂方法即可。 该测试对 hashCode 是否有效进行了非常粗略的测试。

The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.

One can check if hashCode is consistent via a unit test. I written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing if the hashCode is efficient.

南渊 2024-07-18 21:19:41

这就是文档告诉我们的哈希码方法

@ javadoc

每当它被调用时
期间多次使用同一个对象
Java 应用程序的执行,
hashCode方法必须一致
返回相同的整数,前提是没有
等于比较中使用的信息
对象上的内容被修改。 这
整数不必保持一致
从应用程序的一次执行
到另一个相同的执行
应用程序。

This is what documentation tells us as for hash code method

@ javadoc

Whenever it is invoked on
the same object more than once during
an execution of a Java application,
the hashCode method must consistently
return the same integer, provided no
information used in equals comparisons
on the object is modified. This
integer need not remain consistent
from one execution of an application
to another execution of the same
application.

有一个业务密钥的概念,它确定同一类型的单独实例的唯一性。 对来自目标域的单独实体(例如车队系统中的车辆)进行建模的每个特定类型(类)都应该具有一个业务密钥,该业务密钥由一个或多个类字段表示。 equals() 和 hasCode() 方法都应该使用构成业务键的字段来实现。 这确保了两种方法彼此一致。

There is a notion of business key, which determines uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. vehicle in a fleet system) should have a business key, which is represented by one or more class fields. Methods equals() and hasCode() should both be implemented using the fields, which make up a business key. This ensures that both methods consistent with each other.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文