Algorithm for getting an object hash code that is unique and stays the same across multiple runs of the application

Posted 2024-10-02 05:37:12

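I am using Java. I would like to know whether there is any algorithm that gives me a hash code that is unique and also the same every time I run the application, so that hash code collisions are avoided.

I know that for similar objects the JVM returns the same hash code, and that for different objects it may return the same or different hash codes. But I want some logic that helps generate a unique hash code for every object.

Unique means that the hash code of one object should not collide with any other object's hash code. Same means that when I run the application multiple times, it should return the same hash code it returned previously.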

榆西 2024-10-09 05:37:13

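The default hashCode function in Java may return a different hash code on each JVM run, because it is allowed to take the object's memory address, transform it, and return it.

However, relying on that is not good coding practice, because equal objects should always return the same hash code! Read the hashCode contract to learn more. Most classes in Java already implement a hashCode function that returns the same value on every JVM run.

To keep it simple: every data-holding object that might be stored in a collection should have equals and hashCode implementations. If you code in Eclipse or any other reasonable IDE, you can use a wizard that generates these functions automatically.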
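A minimal sketch of what such an equals/hashCode pair could look like. The class name `Endpoint` and its fields are made up for illustration and are not from the question. Because both fields are of types whose hash codes are specified by value (`String`, `int`), the result should also be the same on every run; that would not hold for fields that fall back to the default `Object.hashCode()`.

```java
import java.util.Objects;

// Hypothetical value class; name and fields are illustrative assumptions.
final class Endpoint {
    private final String host;
    private final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Endpoint)) return false;
        Endpoint other = (Endpoint) o;
        return port == other.port && Objects.equals(host, other.host);
    }

    @Override
    public int hashCode() {
        // Derived only from the field values, so equal objects share a hash code.
        return Objects.hash(host, port);
    }
}
```

Two `Endpoint` instances built from the same host and port are then equal, hash identically, and find each other in a `HashSet` or `HashMap`. Distinct endpoints can still collide, which is allowed by the contract.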

While we are at it: IMHO, implement the Comparable interface as well, so that you can also use your objects in SortedSets and TreeMaps.
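As a rough illustration of the Comparable suggestion, here is a sketch with a made-up `Version` class (not from the thread). It keeps `compareTo` consistent with `equals`, which sorted collections generally expect.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical class for illustration: ordered by major, then minor.
final class Version implements Comparable<Version> {
    private static final Comparator<Version> ORDER =
            Comparator.comparingInt((Version v) -> v.major)
                      .thenComparingInt(v -> v.minor);

    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        return ORDER.compare(this, other);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Version)) return false;
        Version other = (Version) o;
        return major == other.major && minor == other.minor;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor);
    }
}
```

With this in place, a `TreeSet<Version>` keeps its elements sorted and treats two versions with the same major and minor numbers as one element.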

While we are at it: if other people are supposed to use your objects, do not forget Serializable and Cloneable.

久隐师 2024-10-09 05:37:13

Unique means that hashcode of one object should not collide with any other object's hashcode. Same means when I run the application multiple times, it should return me the same hash code whatever it returned me previously.

It is impossible to meet these requirements for a number of reasons:

1. It is not possible to guarantee that hash codes are unique. Whatever you do in your class's hashCode method, some other class's hashCode method may return, for some instance, the same value as the hash code of one of your instances.
2. It is impossible to guarantee that hash codes are unique across application runs, even just for instances of your class.

The second requires justification. The way to create a unique hashcode is to do something like this:

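    // Requires: import java.util.HashSet; import java.util.IdentityHashMap;
    // Codes already handed out, and the code assigned to each instance.
    static HashSet<Integer> usedCodes = new HashSet<>();
    static IdentityHashMap<YourClass, Integer> codeMap = new IdentityHashMap<>();

    @Override
    public int hashCode() {
        Integer code = codeMap.get(this);   // identity lookup; does not call hashCode()
        if (code == null) {
            code = valueHash();             // hypothetical helper: value-based hash code for 'this'
            while (usedCodes.contains(code)) {
                code = rehash(code);        // placeholder: derive another candidate code
            }
            usedCodes.add(code);
            codeMap.put(this, code);
        }
        return code;
    }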

This gives the hashcodes with the desired uniqueness property, but the sameness property is not guaranteed ... unless the application always generates / accesses the hashcodes for all objects in the same order.
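The order dependence can be made concrete with a small sketch. It is a simplification of the idea above, not the answer's code: it keys on strings instead of object identity, and it uses "add one" as a stand-in for rehash. The class name `UniqueCodeRegistry` is an assumption for illustration. The strings "Aa" and "BB" are used because both have the value-based hash code 2112.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: hands out collision-free codes for strings.
final class UniqueCodeRegistry {
    private final Set<Integer> usedCodes = new HashSet<>();
    private final Map<String, Integer> codeMap = new HashMap<>();

    int codeFor(String key) {
        Integer code = codeMap.get(key);
        if (code == null) {
            code = key.hashCode();              // value-based starting point
            while (usedCodes.contains(code)) {
                code = code + 1;                // simplistic stand-in for rehash()
            }
            usedCodes.add(code);
            codeMap.put(key, code);
        }
        return code;
    }
}
```

If one run asks for "Aa" first and then "BB", "Aa" gets 2112 and "BB" gets 2113. If another run asks in the opposite order, the two codes are swapped. The codes are unique within each run, but not the same across runs.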

The only way to get this to work would be to persist the usedCodes and codeMap data structures in a suitable form. Even (just) storing the unique hashcodes as part of the persisted objects is not sufficient, because there is a risk that the application may reissue a hashcode to a newly created object before reading the existing object that has the hashcode.

Finally, it should be noted that you have to be careful with using identity hashcodes anywhere in the solution. Identity hashcodes are not unique across different runs of an application. Indeed, if there are differences in any inputs, or if there is any non-determinism, it is highly likely that a given object will have a different identity hashcode value each time you run the application.
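To see where identity hash codes come into play, the following sketch checks whether an object's `hashCode()` is simply the inherited identity hash. The names `IdentityHashDemo` and `Plain` are made up for illustration.

```java
final class IdentityHashDemo {
    // Hypothetical class that deliberately does not override hashCode().
    static final class Plain { }

    // True when the object's hashCode() equals its identity hash code,
    // as it does for any class that inherits Object.hashCode().
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }
}
```

`IdentityHashDemo.usesIdentityHash(new IdentityHashDemo.Plain())` returns true, and the underlying number should not be expected to repeat on the next run. By contrast, `String.hashCode()` is specified as a formula over the characters, so `"hello".hashCode()` is 99162322 on every run.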

FOLLOW UP

Suppose you are storing millions of urls in database. While retrieving these urls, I want to generate unique hashcode that will make searching faster.

You need to store the hashcodes in a separate column of the table. But given the constraints discussed above, I don't see how this is going to make search faster. Basically you have to search the database for the URL in order to work out its unique hashcode.

I think you are better off using hash codes that are not guaranteed to be unique but collide only with a very small probability. If you use a good enough "cryptographic" hash function and a large enough hash size, you can (in theory) make the probability of collision arbitrarily small ... but not zero.
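One way this could look for the URL case is sketched below, assuming SHA-256 is an acceptable choice and that the hex digest would be stored in an indexed column. The class and method names are illustrative and not from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: a stable, value-based 256-bit digest of a URL string.
final class UrlDigest {
    static String sha256Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));   // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The digest depends only on the bytes of the URL, so it is the same on every run and on every machine. Collisions remain possible in principle, so a lookup would still compare the stored URL itself after matching on the digest.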
