如何保证hashCode()和equals()一致？

发布于 2024-07-11 21:19:41 字数 895 浏览 8 评论 0原文

当重写 java.lang.Object 的 equals() 函数时，javadocs 建议：

每当重写 hashCode 方法时，通常都需要重写该方法，以维护 hashCode 方法的通用约定，即相等的对象必须具有相等的哈希码。

hashCode() 方法必须为每个对象返回一个唯一整数（这在基于内存位置比较对象时很容易做到，只需返回该对象的唯一整数地址即可)

应该如何重写 hashCode() 方法，以便它仅根据每个对象的属性返回一个唯一整数？


public class People{
   public String name;
   public int age;

   public int hashCode(){
      // How to get a unique integer based on name and age?
   }
}
/*******************************/
public class App{
   public static void main( String args[] ){
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hasCode() );  // output?
       System.out.println( melissa.hashCode(); // output?
   }
}

原文

When overriding the equals() function of java.lang.Object, the javadocs suggest that,

it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)

How should a hashCode() method be overriden so that it returns a unique integer for each object based only on that object's properities?


public class People{
   public String name;
   public int age;

   public int hashCode(){
      // How to get a unique integer based on name and age?
   }
}
/*******************************/
public class App{
   public static void main( String args[] ){
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hasCode() );  // output?
       System.out.println( melissa.hashCode(); // output?
   }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

北笙凉宸 2024-07-18 21:19:41

它并不是说对象的哈希码必须完全唯一，只是两个相等对象的哈希码返回相同的哈希码。让两个不相等的对象返回相同的哈希码是完全合法的。但是，一组对象上的哈希码分布越独特，HashMap 和使用该哈希码的其他操作的性能就越好。

IntelliJ Idea 等 IDE 具有内置的 equals 和 hashCode 生成器，通常可以很好地为大多数对象提供“足够好”的代码（并且可能比一些手工制作的过于聪明的哈希函数更好）。

例如，下面是 Idea 为 People 类生成的 hashCode 函数：

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}

It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.

IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).

For example, here's a hashCode function that Idea generates for your People class:

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}

回复收藏 0 原文

无可置疑 2024-07-18 21:19:41

我不会详细讨论 hashCode 唯一性，因为 Marc 已经解决了这个问题。对于您的 People 类，您首先需要确定人的平等意味着什么。也许平等仅仅基于他们的名字，也许基于名字和年龄。它将是特定于领域的。假设平等是基于姓名和年龄的。您重写的equals 看起来像

public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!(getClass().equals(obj.getClass())) return false;
    Person other = (Person)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}

任何时候您重写equals 时都必须重写hashCode。此外，hashCode 在计算中不能使用比 equals 更多的字段。大多数时候，您必须添加或独占或各个字段的哈希码（hashCode 应该能够快速计算）。因此，有效的 hashCode 方法可能如下所示：

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}

请注意，以下内容无效，因为它使用 equals 没有使用的字段（高度）。在这种情况下，两个“等于”对象可能具有不同的哈希码。

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

此外，两个不等于的对象具有相同的哈希码是完全有效的：

public int hashCode() {    
    return age;    
}

在这种情况下，Jane 年龄 30 不等于 Bob 年龄 30，但它们的哈希码都是 30。虽然有效，但这对于哈希性能来说是不受欢迎的 -为基础的集合。

I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like

public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!(getClass().equals(obj.getClass())) return false;
    Person other = (Person)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}

Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}

Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

Also, it's perfectly valid for two non-equals objects to have the same hash code:

public int hashCode() {    
    return age;    
}

In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.

回复收藏 0 原文

百善笑为先 2024-07-18 21:19:41

另一个问题是，是否有一些所有程序员都应该知道的基本低级知识，我认为哈希查找就是其中之一。所以就这样吧。

哈希表（请注意，我没有使用实际的类名）基本上是一个链表数组。要在表中查找某些内容，首先计算该内容的哈希码，然后根据表的大小对其进行修改。这是数组的索引，您可以在该索引处获得一个链接列表。然后你遍历列表直到找到你的对象。

由于数组检索的复杂度为 O(1)，链表遍历的复杂度为 O(n)，因此您需要一个哈希函数来创建尽可能随机的分布，以便对象将被哈希到不同的列表中。每个对象都可以返回值 0 作为其哈希码，并且哈希表仍然可以工作，但它本质上是数组元素 0 处的长链表。

您通常还希望数组很大，这会增加对象位于长度为 1 的列表中的机会。例如，当映射中的条目数 > 1 时，Java HashMap 会增加数组的大小。数组大小的 75%。这里有一个权衡：您可以拥有一个条目很少且浪费内存的巨大数组，也可以拥有一个较小的数组，其中数组中的每个元素都是一个包含 > 的列表。 1个条目，浪费时间遍历。完美的哈希会将每个对象分配到数组中的唯一位置，不会浪费空间。

术语“完美哈希”是一个真实的术语，在某些情况下，您可以创建一个哈希函数，为每个对象提供唯一的编号。只有当您知道所有可能值的集合时，这才有可能。一般情况下，你无法实现这一点，并且会有一些值返回相同的哈希码。这是简单的数学：如果您的字符串长度超过 4 个字节，则无法创建唯一的 4 字节哈希码。

一个有趣的花絮：哈希数组的大小通常基于素数，以便在修改结果时提供最佳的随机分配机会，无论哈希码的实际随机程度如何。

根据评论进行编辑：

1）链表并不是表示具有相同哈希码的对象的唯一方法，尽管这是 JDK 1.5 HashMap 使用的方法。尽管内存效率比简单数组低，但它在重新散列时确实可以减少流失（因为条目可以从一个存储桶取消链接并重新链接到另一个存储桶）。

2) 从 JDK 1.4 开始，HashMap 类使用大小为 2 的幂的数组；在此之前它使用 2^N+1，我相信对于 N <= 32 来说这是素数。这本身并不会加速数组索引，但确实允许使用按位 AND 而不是除法来计算数组索引正如尼尔·科菲所指出的。就我个人而言，我会质疑这是不成熟的优化，但考虑到 HashMap 上的作者列表，我会假设有一些真正的好处。

Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.

A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.

Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.

You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.

The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.

One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.

Edit based on comments:

1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).

2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.

回复收藏 0 原文