将对象的哈希码定义为所有类变量哈希码的和、乘法等是否不正确?

发布于 2024-08-30 21:37:26 字数 567 浏览 2 评论 0 原文

假设我有以下类:

class ABC {
    private int myInt = 1;
    private double myDouble = 2;
    private String myString = "123";
    private SomeRandomClass1 myRandomClass1 = new ...
    private SomeRandomClass2 myRandomClass2 = new ...

    //pseudo code
    public int myHashCode() {
        return 37 *
               myInt.hashcode() *
               myDouble.hashCode() *
               ... *
               myRandomClass.hashcode()
    }
}

这是 hashCode 的正确实现吗?这不是我通常这样做的方式(我倾向于遵循Effective Java 的指导方针),但我总是忍不住做一些类似上面代码的事情。

谢谢

Let's say I have the following class:

class ABC {
    private int myInt = 1;
    private double myDouble = 2;
    private String myString = "123";
    private SomeRandomClass1 myRandomClass1 = new ...
    private SomeRandomClass2 myRandomClass2 = new ...

    //pseudo code
    public int myHashCode() {
        return 37 *
               myInt.hashcode() *
               myDouble.hashCode() *
               ... *
               myRandomClass.hashcode()
    }
}

Would this be a correct implementation of hashCode? This is not how I usually do it(I tend to follow Effective Java's guide-lines) but I always have the temptation to just do something like the above code.

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

那些过往 2024-09-06 21:37:26

这取决于你所说的“正确”是什么意思。假设您使用所有相关 equals() 定义字段的 hashCode(),那么是的,它是“正确的”。然而,这样的公式可能不会有良好的分布,因此可能会比其他公式引起更多的冲突,这将对性能产生不利影响。

这是来自 Effective Java 2nd Edition 第 9 条的引用:当您覆盖 equals 时,始终覆盖 hashCode

虽然本项中的配方产生了相当好的散列函数,但它并没有产生最先进的散列函数,并且 Java 平台库自版本 1.6 起也不提供此类散列函数。编写这样的哈希函数是一个研究课题,最好留给数学家和计算机科学家。 [...尽管如此,]本项中描述的技术对于大多数应用程序来说应该足够了。

评估您提出的哈希函数的好坏可能不需要大量的数学能力,但为什么还要麻烦呢?为什么不遵循那些已经被实践证明足够的东西呢?

Josh Bloch 的秘诀

  • 将一些常量非零值(例如 17)存储在名为 resultint 变量中。
  • 计算每个字段的 int 哈希码 c
    • 如果该字段是布尔值,则计算(f ? 1 : 0)
    • 如果字段是byte、char、short、int,则计算(int) f
    • 如果字段为long,则计算(int) (f ^ (f >>> 32))
    • 如果字段是float,则计算Float.floatToIntBits(f)
    • 如果字段是 double,则计算 Double.doubleToLongBits(f),然后对结果 long 进行哈希处理,如上所示。李>
    • 如果该字段是对象引用,并且此类的 equals 方法通过递归调用 equals 来比较该字段,则对该字段递归调用 hashCode 。如果该字段的值为null,则返回0。
    • 如果该字段是一个数组,则将其视为每个元素都是一个单独的字段。如果数组字段中的每个元素都很重要,您可以使用版本 1.5 中添加的 Arrays.hashCode 方法之一。
  • 将哈希码 c 组合到 result 中,如下所示: result = 31 * result + c;

现在,这个配方当然相当复杂,但幸运的是,您不必每次都重新实现它,这要归功于 java.util.Arrays.hashCode(Object[]) (和 com.google.common.base.Objects< /code> 提供了一个方便的 vararg 变体)。

@Override public int hashCode() {
    return Arrays.hashCode(new Object[] {
           myInt,    //auto-boxed
           myDouble, //auto-boxed
           myRandomClass,
    });
}

另请参见

  • 对象。 hashCode()

    <块引用>

    不要求如果两个对象根据equals(java.lang.Object)方法不相等,则调用hashCode 两个对象中的每个对象上的方法必须产生不同的整数结果。 但是,程序员应该意识到,为不相等的对象生成不同的整数结果可能会提高哈希表的性能。

It depends what you mean by "correct". Assuming that you're using the hashCode() of all the relevant equals()-defining fields, then yes, it's "correct". However, such formulas probably will not have a good distribution, and therefore would likely cause more collisions than otherwise, which will have a detrimental effect on performance.

Here's a quote from Effective Java 2nd Edition, Item 9: Always override hashCode when you override equals

While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and computer scientists. [...Nonetheless,] the techniques described in this item should be adequate for most applications.

It may not require a lot of mathematical power to evaluate how good your proposed hash function is, but why even bother? Why not just follow something that has been anecdotally proven to be adequate in practice?

Josh Bloch's recipe

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;

Now, of course that recipe is rather complicated, but luckily, you don't have to reimplement it every time, thanks to java.util.Arrays.hashCode(Object[]) (and com.google.common.base.Objects provides a convenient vararg variant).

@Override public int hashCode() {
    return Arrays.hashCode(new Object[] {
           myInt,    //auto-boxed
           myDouble, //auto-boxed
           myRandomClass,
    });
}

See also

  • Object.hashCode()

    It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

何止钟意 2024-09-06 21:37:26

做这种事情是合同允许的。但返回 1 也是如此。 HotSpot 中有一个编译时标志,始终为身份哈希值返回 1。然而,这样的选择会产生较差的性能。

乘法有一个特殊的问题。组件中的 0 哈希值不仅会消除该值,而且 2 的幂也会逐渐将较低位归零。

交换运算符存在值重新排列会导致冲突的问题。

如果组件的哈希值之间存在特定的关系,那么加法就会特别糟糕。例如,(4, 6)(2, 8) 发生冲突。

Doing this sort of things is allowable by the contract. But so is always returning 1. There is a compile time flag in HotSpot to always return 1 for the identity hash vale. Such choices will, however, produce poor performance.

There is a particular problem with multiplication. Not only will a 0 hash value from a component annihilate the value, but powers of two will progressively zero the lower bits.

Commutative operators have the problem that rearrangements of values will cause a clash.

If there is a particular relationship between hash values of components, then addition will be particularly bad. (4, 6) and (2, 8) clashing, for instance.

┊风居住的梦幻卍 2024-09-06 21:37:26

不,但实际上这几乎肯定不是一个好主意。最重要的是,您不允许修改哈希码中使用的任何字段。它们都必须保持不变。

如果您修改一个对象,可能会发生这种情况:将对象插入到 HashSet 中,更改字段,然后测试该对象是否在 HashSet 中。虽然它在那里,但由于哈希码改变了,HashSet将找不到它!

No, but in practice it is almost certainly not a good idea. Most importantly you are not allowed to modify any of the fields that you use in the hash code. They all have to be constant.

If you modify one, this can happen: You insert the objecy in a HashSet, you change a fields, and then test if the object is in the HashSet. Although it is in there, due to the fact that the hash code changed, HashSet will not find it!

一桥轻雨一伞开 2024-09-06 21:37:26

我可能来得太晚了,但是……

你的散列函数的问题可以用乘法交换律来描述。 - 可汗学院链接

即 - a * b * c == c * b * a。

因此,仅通过相乘就消除了顺序的唯一性。这意味着您更有可能发生碰撞。

I'm probably much too late to this party, but.....

The problem with your hash function can be described by the commutative law of multiplication. - khan academy link

That is - a * b * c == c * b * a.

So by only multiplying you remove the uniqueness of the order. This means you are much more likely to get collisions.

完美的未来在梦里 2024-09-06 21:37:26

在我看来,除非你能保证乘积是素数,否则你可能会在对象的结果哈希码之间发生冲突(尽管可能很少见)

Seems to me that unless you can guarantee that the product is a prime number you might get collision (though probably rare) between resulting hashcodes for an object

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文