Hashcode comparison problem

Posted 2024-08-22 17:37:40

I have a list of objects, called rules in our case. Each rule is itself a list of fields, and I have to compare rules by hash code because we can't have duplicate rules in the system.

For example, say I have two rules, R1 and R2, each with fields A and B.

In R1 the values of A and B are 7 and 2 respectively, and in R2 they are 3 and 4. The hash code comparison I use to check for duplicate rules in the system then fails.

The method I have used is:

    for (Rule rule : rules) {
        int changeableAttrCode = 0;
        int fieldCounter = 1;

        // Weight each attribute value's hash code by its 1-based position.
        for (RuleField ruleField : rule.getAttributes()) {
            changeableAttrCode += fieldCounter * ruleField.getValue().hashCode();
            fieldCounter++;
        }

        // Parameters continue with the same position counter.
        for (RuleField ruleField : rule.getParameters()) {
            changeableAttrCode += fieldCounter * ruleField.getValue().hashCode();
            fieldCounter++;
        }

        changeableAttrCodes.add(changeableAttrCode);
    }

Here changeableAttrCodes is where we store the hash codes of all the rules.

Can you please suggest a better method, so that this kind of problem does not arise in the future and duplicate rules in the system can be detected?

Thanks in advance

欢烬 2024-08-29 17:37:40

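hashCode() is not meant to be used to check for equality. return 42; is a perfectly valid implementation of hashCode(). Why don't you override equals() (and hashCode(), for that matter) in the rule objects and use that to check whether two rules are equal? You could still use the hash code to decide which objects you need to investigate, since two objects that are equal according to equals() must always have the same hash code, but that is a performance improvement that you may or may not need, depending on your system.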
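To illustrate the point about return 42, here is a minimal sketch. ConstantHashRule and ConstantHashDemo are hypothetical names, and modelling the field values as Integer is an assumption, since the question does not show the real Rule class. Every instance has the same hash code, yet a HashSet still tells the rules apart because equals() makes the final decision.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the question's Rule; values are assumed to be Integers.
final class ConstantHashRule {
    private final List<Integer> values;

    ConstantHashRule(Integer... values) {
        this.values = new ArrayList<>(Arrays.asList(values));
    }

    // Equality is decided by comparing the field values, in order.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ConstantHashRule)) return false;
        return values.equals(((ConstantHashRule) other).values);
    }

    // Legal but deliberately useless: every rule collides with every other rule.
    @Override
    public int hashCode() {
        return 42;
    }
}

final class ConstantHashDemo {
    // Counts distinct rules; equals(), not hashCode(), decides what is a duplicate.
    static int countDistinct(ConstantHashRule... rules) {
        Set<ConstantHashRule> distinct = new HashSet<>(Arrays.asList(rules));
        return distinct.size();
    }
}
```

With such a hash the set degrades to comparing a new rule against every stored rule, so it is correct but slow. A better hashCode() only improves speed, never correctness.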

在梵高的星空下 2024-08-29 17:37:40

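Implement hashCode and equals in the Rule class.
The equals implementation has to compare all of the rule's values.

Then use a HashSet and ask if(mySet.contains(newRule))

HashSet plus an equals implementation solves the problem of hash values not being unique. The set uses the hash for bucketing and speed, but it calls equals at the end to decide whether two rules with the same hash are really the same rule.

More on hashing: if you want to do it by hand, follow the prime number suggestion and look at the JDK code for String.hashCode. If you want a clean implementation, retrieve the hash codes of the elements, put them into an array of ints and use Arrays.hashCode(int[]) to get a hash code for their combination.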
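A minimal sketch of this suggestion follows. ValueRule and RuleRegistry are hypothetical names, and representing a rule as a flat list of String values is an assumption made only for illustration. hashCode collects the element hash codes into an int[] and delegates to Arrays.hashCode(int[]); the registry uses HashSet.contains to reject duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Rule: a flat list of String field values (an assumption).
final class ValueRule {
    private final List<String> values;

    ValueRule(String... values) {
        this.values = new ArrayList<>(Arrays.asList(values));
    }

    // Compares all values, in order.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ValueRule)) return false;
        return values.equals(((ValueRule) other).values);
    }

    // Combines the element hash codes with Arrays.hashCode(int[]).
    @Override
    public int hashCode() {
        int[] elementHashes = new int[values.size()];
        for (int i = 0; i < elementHashes.length; i++) {
            elementHashes[i] = Objects.hashCode(values.get(i)); // 0 for null
        }
        return Arrays.hashCode(elementHashes);
    }
}

// Hypothetical holder for the rules already present in the system.
final class RuleRegistry {
    private final Set<ValueRule> rules = new HashSet<>();

    // Returns true if the rule was stored, false if an equal rule already exists.
    boolean register(ValueRule rule) {
        if (rules.contains(rule)) {
            return false;
        }
        rules.add(rule);
        return true;
    }

    int size() {
        return rules.size();
    }
}
```

Set.add already returns false for an element that is present, so the explicit contains call is only there to mirror the wording of the answer.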

奢望 2024-08-29 17:37:40

Updated: Your hashing algorithm does not produce a good spread of hash values. It gives the same value for (7, 2) and (3, 4):

1 * 7 + 2 * 2 = 11
1 * 3 + 2 * 4 = 11

It would also give the same value for (11, 0), (-1, 6), ... and one can trivially make up an endless number of similar equivalence classes based on your current algorithm.
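The collisions listed above can be reproduced with a small sketch of the question's formula. WeightedSumHash is a hypothetical name, and the field values are assumed to be ints, whose hash code is the value itself.

```java
// Hypothetical reproduction of the question's position-weighted sum,
// assuming the field values are ints.
final class WeightedSumHash {
    static int hash(int... fieldValues) {
        int code = 0;
        int fieldCounter = 1;
        for (int value : fieldValues) {
            // Same formula as in the question: position times value hash code.
            code += fieldCounter * Integer.hashCode(value);
            fieldCounter++;
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(hash(7, 2));   // 1 * 7 + 2 * 2
        System.out.println(hash(3, 4));   // 1 * 3 + 2 * 4
        System.out.println(hash(11, 0));  // 1 * 11 + 2 * 0
        System.out.println(hash(-1, 6));  // 1 * -1 + 2 * 6
    }
}
```

All four calls return 11, so these four different rules are indistinguishable by this hash.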

Of course you can not avoid collisions - if you have enough instances, hash collision is inevitable. However, you should aim to minimize the chance for collisions. Good hashing algorithms strive to spread hash values equally over a wide range of values. A typical way to achieve this is to generate the hash value for an object containing n independent fields as an n-digit number with a base big enough to hold the different hash values for the individual fields.

In your case, instead of multiplying by fieldCounter you should multiply by a prime constant, e.g. 31 (that would be the base of your number), and add another prime constant to the result, e.g. 17. This gives you a better spread of hash values. (Of course the concrete base depends on what values your fields can take - I have no info about that.)
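One common reading of this advice, sketched under assumptions: start from 17, and for each field multiply the running result by 31 before adding the field's hash code. PolynomialHash is a hypothetical name and the fields are again assumed to be ints.

```java
// Hypothetical sketch: base-31 combination seeded with 17, for int field values.
final class PolynomialHash {
    static int hash(int... fieldValues) {
        int result = 17;
        for (int value : fieldValues) {
            // Earlier fields are shifted to "higher digits" in base 31.
            result = 31 * result + Integer.hashCode(value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hash(7, 2));   // 31 * (31 * 17 + 7) + 2  = 16556
        System.out.println(hash(3, 4));   // 31 * (31 * 17 + 3) + 4  = 16434
        System.out.println(hash(6, 33));  // 31 * (31 * 17 + 6) + 33 = 16556
    }
}
```

(7, 2) and (3, 4) now hash differently, but (6, 33) still collides with (7, 2), because 33 does not fit into a single base-31 digit. This is why the hash can only narrow down candidates and equals has to make the final decision.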

Also if you implement hashCode, you are strongly advised to implement equals as well - and in fact, you should use the latter to test for equality.

Here is an article about implementing hashCode.

哑 2024-08-29 17:37:40

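I don't understand what you are trying to do here. In most hash function scenarios collisions are inevitable, because there are far more objects to hash than there are possible hash values (this is the pigeonhole principle).

It is generally the case that two different objects may have the same hash value. You cannot rely on hash functions alone to eliminate duplicates.

Some hash functions are better than others at minimizing collisions, but collisions remain inevitable.

That said, there are some simple guidelines that usually give a good enough hash function. Joshua Bloch gives the following in his book Effective Java, 2nd Edition:

Store some constant nonzero value, say 17, in an int variable called result.
Compute an int hash code c for each field:
- If the field is a boolean, compute (f ? 1 : 0)
- If the field is a byte, char, short, or int, compute (int) f
- If the field is a long, compute (int) (f ^ (f >>> 32))
- If the field is a float, compute Float.floatToIntBits(f)
- If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as above.
- If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
- If the field is an array, treat it as if each element were a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
Combine the hash code c into result as follows: result = 31 * result + c;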
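The recipe above can be sketched as follows. SampleRecord and its field names are hypothetical, chosen so that each kind of field from the list appears once; the codes array is assumed to be non-null.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class with one field of each kind named in the recipe.
final class SampleRecord {
    private final boolean enabled;
    private final int priority;
    private final long timestamp;
    private final float weight;
    private final double limit;
    private final String name;  // object reference, may be null
    private final int[] codes;  // array in which every element is significant

    SampleRecord(boolean enabled, int priority, long timestamp,
                 float weight, double limit, String name, int[] codes) {
        this.enabled = enabled;
        this.priority = priority;
        this.timestamp = timestamp;
        this.weight = weight;
        this.limit = limit;
        this.name = name;
        this.codes = codes.clone(); // assumes codes is not null
    }

    // equals compares exactly the fields that hashCode uses.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SampleRecord)) return false;
        SampleRecord that = (SampleRecord) other;
        return enabled == that.enabled
                && priority == that.priority
                && timestamp == that.timestamp
                && Float.floatToIntBits(weight) == Float.floatToIntBits(that.weight)
                && Double.doubleToLongBits(limit) == Double.doubleToLongBits(that.limit)
                && Objects.equals(name, that.name)
                && Arrays.equals(codes, that.codes);
    }

    @Override
    public int hashCode() {
        int result = 17;                                                // nonzero seed
        result = 31 * result + (enabled ? 1 : 0);                       // boolean
        result = 31 * result + priority;                                // int
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32));  // long
        result = 31 * result + Float.floatToIntBits(weight);            // float
        long limitBits = Double.doubleToLongBits(limit);                // double, hashed as a long
        result = 31 * result + (int) (limitBits ^ (limitBits >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode());    // object reference
        result = 31 * result + Arrays.hashCode(codes);                  // array
        return result;
    }
}
```

Two SampleRecord instances built from the same values are equal and share a hash code, which is the property HashSet and HashMap rely on.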