将对象的哈希码定义为所有类变量哈希码的和、乘法等是否不正确?
假设我有以下类:
class ABC {
private int myInt = 1;
private double myDouble = 2;
private String myString = "123";
private SomeRandomClass1 myRandomClass1 = new ...
private SomeRandomClass2 myRandomClass2 = new ...
//pseudo code
public int myHashCode() {
return 37 *
myInt.hashcode() *
myDouble.hashCode() *
... *
myRandomClass.hashcode()
}
}
这是 hashCode 的正确实现吗?这不是我通常这样做的方式(我倾向于遵循Effective Java 的指导方针),但我总是忍不住做一些类似上面代码的事情。
谢谢
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这取决于你所说的“正确”是什么意思。假设您使用所有相关
equals()
定义字段的hashCode()
,那么是的,它是“正确的”。然而,这样的公式可能不会有良好的分布,因此可能会比其他公式引起更多的冲突,这将对性能产生不利影响。这是来自 Effective Java 2nd Edition 第 9 条的引用:当您覆盖
equals
时,始终覆盖hashCode
评估您提出的哈希函数的好坏可能不需要大量的数学能力,但为什么还要麻烦呢?为什么不遵循那些已经被实践证明足够的东西呢?
Josh Bloch 的秘诀
result
的int
变量中。int
哈希码c
:布尔值
,则计算(f ? 1 : 0)
byte、char、short、int
,则计算(int) f
long
,则计算(int) (f ^ (f >>> 32))
float
,则计算Float.floatToIntBits(f)
double
,则计算Double.doubleToLongBits(f)
,然后对结果long
进行哈希处理,如上所示。李>equals
方法通过递归调用equals
来比较该字段,则对该字段递归调用hashCode
。如果该字段的值为null
,则返回0。c
组合到result
中,如下所示:result = 31 * result + c;
现在,这个配方当然相当复杂,但幸运的是,您不必每次都重新实现它,这要归功于
java.util.Arrays.hashCode(Object[])
(和com.google.common.base.Objects< /code>
提供了一个方便的 vararg 变体)。
另请参见
对象。 hashCode()
<块引用>
不要求如果两个对象根据
equals(java.lang.Object)
方法不相等,则调用hashCode
两个对象中的每个对象上的方法必须产生不同的整数结果。 但是,程序员应该意识到,为不相等的对象生成不同的整数结果可能会提高哈希表的性能。It depends what you mean by "correct". Assuming that you're using the
hashCode()
of all the relevantequals()
-defining fields, then yes, it's "correct". However, such formulas probably will not have a good distribution, and therefore would likely cause more collisions than otherwise, which will have a detrimental effect on performance.Here's a quote from Effective Java 2nd Edition, Item 9: Always override
hashCode
when you overrideequals
It may not require a lot of mathematical power to evaluate how good your proposed hash function is, but why even bother? Why not just follow something that has been anecdotally proven to be adequate in practice?
Josh Bloch's recipe
int
variable calledresult
.int
hashcodec
for each field:boolean
, compute(f ? 1 : 0)
byte, char, short, int
, compute(int) f
long
, compute(int) (f ^ (f >>> 32))
float
, computeFloat.floatToIntBits(f)
double
, computeDouble.doubleToLongBits(f)
, then hash the resultinglong
as in above.equals
method compares the field by recursively invokingequals
, recursively invokehashCode
on the field. If the value of the field isnull
, return 0.Arrays.hashCode
methods added in release 1.5.c
intoresult
as follows:result = 31 * result + c;
Now, of course that recipe is rather complicated, but luckily, you don't have to reimplement it every time, thanks to
java.util.Arrays.hashCode(Object[])
(andcom.google.common.base.Objects
provides a convenient vararg variant).See also
Object.hashCode()
做这种事情是合同允许的。但返回
1
也是如此。 HotSpot 中有一个编译时标志,始终为身份哈希值返回1
。然而,这样的选择会产生较差的性能。乘法有一个特殊的问题。组件中的 0 哈希值不仅会消除该值,而且 2 的幂也会逐渐将较低位归零。
交换运算符存在值重新排列会导致冲突的问题。
如果组件的哈希值之间存在特定的关系,那么加法就会特别糟糕。例如,
(4, 6)
和(2, 8)
发生冲突。Doing this sort of things is allowable by the contract. But so is always returning
1
. There is a compile time flag in HotSpot to always return1
for the identity hash vale. Such choices will, however, produce poor performance.There is a particular problem with multiplication. Not only will a 0 hash value from a component annihilate the value, but powers of two will progressively zero the lower bits.
Commutative operators have the problem that rearrangements of values will cause a clash.
If there is a particular relationship between hash values of components, then addition will be particularly bad.
(4, 6)
and(2, 8)
clashing, for instance.不,但实际上这几乎肯定不是一个好主意。最重要的是,您不允许修改哈希码中使用的任何字段。它们都必须保持不变。
如果您修改一个对象,可能会发生这种情况:将对象插入到 HashSet 中,更改字段,然后测试该对象是否在 HashSet 中。虽然它在那里,但由于哈希码改变了,HashSet将找不到它!
No, but in practice it is almost certainly not a good idea. Most importantly you are not allowed to modify any of the fields that you use in the hash code. They all have to be constant.
If you modify one, this can happen: You insert the objecy in a HashSet, you change a fields, and then test if the object is in the HashSet. Although it is in there, due to the fact that the hash code changed, HashSet will not find it!
我可能来得太晚了,但是……
你的散列函数的问题可以用乘法交换律来描述。 - 可汗学院链接
即 - a * b * c == c * b * a。
因此,仅通过相乘就消除了顺序的唯一性。这意味着您更有可能发生碰撞。
I'm probably much too late to this party, but.....
The problem with your hash function can be described by the commutative law of multiplication. - khan academy link
That is - a * b * c == c * b * a.
So by only multiplying you remove the uniqueness of the order. This means you are much more likely to get collisions.
在我看来,除非你能保证乘积是素数,否则你可能会在对象的结果哈希码之间发生冲突(尽管可能很少见)
Seems to me that unless you can guarantee that the product is a prime number you might get collision (though probably rare) between resulting hashcodes for an object