哈希码比较问题

发布于 2024-08-22 17:37:40 字数 856 浏览 5 评论 0原文

我有一个对象的列表,在我们的例子中被称为规则,这个对象本身是一个字段列表,我必须对其进行哈希码比较,因为我们不能在 系统。

即假设我有两条规则 R1 和 R2,其中字段 A 和 R2 为 A 和 R2。 B.

现在如果 A 和 A 的值等于 A 和 A 的值。 R1中的B分别为7和2。

在 R2 中,分别是 3 和 4,然后是我用来检查口是心非的过程 系统中的规则的哈希码比较失败,

我使用的方法是

for(Rule rule : rules){
changeableAttrCode=0;

fieldCounter=1;

attributes = rule.getAttributes();

for(RuleField ruleField : attributes){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}
parameters = rule.getParameters();

for(RuleField ruleField : parameters){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}

changeableAttrCodes.add(changeableAttrCode);

在这里的changeableAttrCodes,我们存储所有规则的哈希码。

所以请建议我更好的方法,以便将来不会出现此类问题以及系统中规则的重复性。

提前致谢

I have list of a an object which is termed as rule in our case, this object itself is a list of field for which I have to do hashcode comparison as we can't duplicate rule in the
system.

i.e Let say I have two Rules R1 and R2 with fields A & B.

Now if values of A & B in R1 are 7 and 2 respectively.

And in R2 it's 3 and 4 respectively then the process I have used to check the duplicity
of Rules in the system that is hashcode comparison fails

the method which I have used is

for(Rule rule : rules){
changeableAttrCode=0;

fieldCounter=1;

attributes = rule.getAttributes();

for(RuleField ruleField : attributes){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}
parameters = rule.getParameters();

for(RuleField ruleField : parameters){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}

changeableAttrCodes.add(changeableAttrCode);

here changeableAttrCodes where we store the hashcode of all the rules.

so can please suggest me better method so that this kind of problem does not arise in future as well as duplicity of rules in system can be seen.

Thanks in advance

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

欢烬 2024-08-29 17:37:40

hashcode() 并不意味着用于检查相等性。 return 42;hashcode() 的完全有效的实现。为什么不在规则对象中覆盖 equals() (以及 hashcode())并使用它来检查两个规则是否相等?您仍然可以使用哈希码来检查需要调查哪些对象,因为两个 equal() 对象应始终具有相同的哈希码,但这是您可能需要也可能不需要的性能改进,具体取决于在您的系统上。

hashcode() is not meant to be used to check for equality. return 42; is a perfectly valid implementation of hashcode(). Why don't you overwrite equals() (and hashcode() for that matter) in the rules objects and use that to check whether two rules are equal? You could still use the hashcode to check which objects you need to investigate, since two equal() objects should always have the same hashcode, but that is a performance improvement that you may or may not need, depending on your system.

在梵高的星空下 2024-08-29 17:37:40
  • 在 Rule 类中实现 hashCodeequals
  • equals 的实现必须比较其值。

然后使用HashSet并询问if(mySet.contains(newRule))

HashSet + equals实现解决了非-的问题哈希值的唯一性。它使用哈希进行分类和速度,但在末尾使用 equals 来确保具有相同哈希的两个规则是否相同。

有关哈希的更多信息:如果您想手动执行此操作,请使用素数估计,并查看字符串哈希码的 JDK 代码。如果您想进行干净的实现,请尝试检索元素的哈希码,请创建某种整数数组并使用 Arrays.hashCode(int[]) 来获取它们组合的哈希码。

  • Implement hashCode and equals in class Rule.
  • Implementation of equals has to compare its values.

Then use a HashSet<Rule> and ask if(mySet.contains(newRule))

HashSet + equals implementation solves the problem of the non-uniqueness of the hash. It uses hash for classifying and speed but it uses equals at the end to ensure that two Rules with same hash are the same Rule or not.

More on hash: if you want to do it by hand, use the prime number sudggestion, and review the JDK code for string hashcodes. If you want to make a clean implementation try to retrieve the hashcode of the elements, make some kind of array of ints and use Arrays.hashCode(int[]) to get a hashcode for the combination of them.

奢望 2024-08-29 17:37:40

已更新 您的散列算法没有产生良好的散列值 - 它为 (7, 2) 和 (3, 4) 提供了相同的值:

1 * 7 + 2 * 2 = 11
1 * 3 + 2 * 4 = 11

它也为 (11, 0), (-1, 6), ...并且可以根据您当前的算法轻松组成无数个相似的等价类。

当然你无法避免冲突——如果你有足够多的实例,哈希冲突是不可避免的。但是,您应该尽量减少碰撞的可能性。好的散列算法致力于将散列值均匀地分布在广泛的值上。实现此目的的典型方法是为包含 n 个独立字段的对象生成哈希值,作为 n 位数字,其基数足以容纳不同的哈希值对于各个领域。

在您的情况下,您应该乘以素数常数,例如 31 (这将是您的数字的基数),而不是与 fieldCounter 相乘。并在结果中添加另一个质数常数,例如 17。这可以让您更好地散布哈希值。 (当然,具体基础取决于您的字段可以采用什么值 - 我没有这方面的信息。)

此外,如果您实现 hashCode,强烈建议您实现 equals同样 - 事实上,您应该使用后者来测试相等性。

这是一篇关于实现hashCode的文章。

Updated Your hashing algorithm is not producing a good spread of hash values - it gives the same value for (7, 2) and (3, 4):

1 * 7 + 2 * 2 = 11
1 * 3 + 2 * 4 = 11

It would also give the same value for (11, 0), (-1, 6), ... and one can trivially make up an endless number of similar equivalence classes based on your current algorithm.

Of course you can not avoid collisions - if you have enough instances, hash collision is inevitable. However, you should aim to minimize the chance for collisions. Good hashing algorithms strive to spread hash values equally over a wide range of values. A typical way to achieve this is to generate the hash value for an object containing n independent fields as an n-digit number with a base big enough to hold the different hash values for the individual fields.

In your case, instead of multiplying with fieldCounter you should multiply with a prime constant, e.g. 31 (that would be the base of your number). And add another prime constant to the result, e.g. 17. This gives you a better spread of hash values. (Of course the concrete base depends on what values can your fields take - I have no info about that.)

Also if you implement hashCode, you are strongly advised to implement equals as well - and in fact, you should use the latter to test for equality.

Here is an article about implementing hashCode.

2024-08-29 17:37:40

我不明白你想在这里做什么。对于大多数哈希函数场景,冲突是不可避免的,因为要哈希的对象比可能的哈希值多得多(这是鸽子原理)。

通常情况下,两个不同的对象可能具有相同的哈希值。您不能仅依靠哈希函数来消除重复项。

一些哈希函数在最小化冲突方面比其他函数更好,但这仍然是不可避免的。


也就是说,有一些简单的指导原则通常可以提供足够好的哈希函数。 Joshua Bloch 在他的《Effective Java 2nd Edition》一书中给出了以下内容:

  • 将一些常量非零值(例如 17)存储在名为 resultint 变量中。
  • 计算每个字段的 int 哈希码 c
    • 如果该字段是布尔值,则计算(f ? 1 : 0)
    • 如果字段是byte、char、short、int,则计算(int) f
    • 如果字段为long,则计算(int) (f ^ (f >>> 32))
    • 如果字段是float,则计算Float.floatToIntBits(f)
    • 如果字段是 double,则计算 Double.doubleToLongBits(f),然后对结果 long 进行哈希处理,如上所示。李>
    • 如果该字段是对象引用,并且此类的 equals 方法通过递归调用 equals 来比较该字段,则对该字段递归调用 hashCode 。如果该字段的值为null,则返回0。
    • 如果该字段是一个数组,则将其视为每个元素都是一个单独的字段。如果数组字段中的每个元素都很重要,您可以使用版本 1.5 中添加的 Arrays.hashCode 方法之一。
  • 将哈希码 c 合并为 result,如下所示:result = 31 * result + c;

I don't understand what you are trying to do here. With most hash function scenarios, collision is inevitable, because there are way more objects to hash than there are possible hash values (it's a pigeonhole principle).

It is generally the case that two different objects may have the same hash value. You cannot rely on hash functions alone to eliminate duplicates.

Some hash functions are better than others in minimizing collisions, but it's still an inevitability.


That said, there are some simple guidelines that usually gives a good enough hash function. Joshua Bloch gives the following in his book Effective Java 2nd Edition:

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;
心在旅行 2024-08-29 17:37:40

我开始写道,实现您想要的目标的唯一方法是使用完美哈希

但后来我想到了你说你不能在系统中复制对象的事实。

根据来自helios的发人深省的评论进行编辑:

您的解决方案取决于您在写下“不能重复规则”时的意思。

如果您的意思是字面意义上的您不能,那么可以保证如果只是具有一组特定值的规则的一个实例,那么您的问题就很简单:您可以进行身份​​比较,在这种情况下您可以使用 == 进行身份比较。

另一方面,您的意思是由于某种原因(性能)您不应该,那么您的问题也很简单:只需进行值比较即可。

鉴于您定义问题的方式,在任何情况下您都不应该考虑使用哈希码来代替相等。正如其他人所指出的,哈希码本质上会产生冲突(错误的相等),除非您使用完美哈希解决方案,但在这种情况下您为什么要这样做呢?

I started to write that the only way you can achieve what you want is with Perfect Hashing.

But then I thought about the fact that you said you can't duplicate objects in your system.

Edit based on thought-provoking comment from helios:

Your solution depends on what you meant when you wrote that you "can't duplicate rules".

If you meant that literally you cannot, that there is guaranteed to be only one instance of a rule with a particular set of values, then your problem is trivial: you can do identity comparison, in which case you can do identity comparison using ==.

On the other hand, you meant that you shouldn't for some reason (performance), then your problem is also trivial: just do value comparisons.

Given the way you've defined your problem, under no circumstances should you be considering the use of hashcodes as a substitute for equality. As others have noted, hashcodes by their nature yield collisions (false equality), unless you go to a Perfect Hashing solution, but why would you in this case?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文