Java 字符串上 hashCode() 的一致性

发布于 2024-07-18 02:51:44 字数 630 浏览 3 评论 0原文

Java 字符串的 hashCode 值计算如下 (String.hashCode())：

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

是否存在任何情况（例如 JVM 版本、供应商等）以下表达式的计算结果为 false？

boolean expression = "This is a Java string".hashCode() == 586653468

更新＃1：如果您声称答案是“是的，存在这种情况” - 那么请给出“这是一个Java字符串”.hashCode() != 586653468的具体示例。尝试尽可能具体/具体。

更新#2：我们都知道，依赖 hashCode() 的实现细节通常是不好的。然而，我正在具体讨论 String.hashCode() - 所以请将答案集中在 String.hashCode() 上。 Object.hashCode() 与这个问题的上下文完全无关。

原文

The hashCode value of a Java String is computed as (String.hashCode()):

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Are there any circumstances (say JVM version, vendor, etc.) under which the following expression will evaluate to false?

boolean expression = "This is a Java string".hashCode() == 586653468

Update #1: If you claim that the answer is "yes, there are such circumstances" - then please give a concrete example of when "This is a Java string".hashCode() != 586653468. Try to be as specific/concrete as possible.

Update #2: We all know that relying on the implementation details of hashCode() is bad in general. However, I'm talking specifically about String.hashCode() - so please keep the answer focused to String.hashCode(). Object.hashCode() is totally irrelevant in the context of this question.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

迷鸟归林 2024-07-25 02:51:44

我可以看到早至 Java 1.2 的文档。

虽然确实一般您不应该依赖保持不变的哈希代码实现，但它现在已记录为 java.lang.String 的行为，因此更改它会很重要违反现有合同。

只要有可能，您就不应该依赖在不同版本等之间保持相同的哈希码 - 但在我看来，java.lang.String 是一种特殊情况，仅仅是因为算法具有已指定...当然，只要您愿意在指定算法之前放弃与版本的兼容性。

回复收藏 0 原文

一百个冬季 2024-07-25 02:51:44

我发现了一些关于 JDK 1.0 和 1.1 以及 >= 1.2 的内容：

在 JDK 1.0.x 和 1.1.x 中，hashCode
用于长字符串的函数
对每第 n 个字符进行采样。这
很好地保证你会拥有
许多字符串散列到相同的
值，从而减慢 Hashtable 的速度
抬头。在 JDK 1.2 中该函数有
得到改进以乘以结果
到目前为止 31，然后添加下一个
按顺序排列的字符。这是一个
慢一点，但更擅长
避免碰撞。
来源：http://mindprod.com/jgloss/hashcode.html

有些不同，因为你看起来需要一个数字：使用 CRC32 或 MD5 代替哈希码怎么样，你就可以开始了 - 无需讨论，也无需担心......

回复收藏 0 原文

丘比特射中我 2024-07-25 02:51:44

您不应依赖哈希码等于特定值。只是它将在同一次执行中返回一致的结果。
API 文档说明如下：

hashCode的通用约定是：
每当在 Java 应用程序执行期间对同一对象多次调用 hashCode 方法时，如果对象的 equals 比较中使用的信息没有被修改，则 hashCode 方法必须始终返回相同的整数。从一个应用程序的一次执行到同一应用程序的另一次执行，该整数不需要保持一致。

编辑
由于 String.hashCode() 的 javadoc 指定了如何计算 String 的哈希码，因此任何违反此规定的行为都将违反公共 API 规范。

回复收藏 0 原文

花想c 2024-07-25 02:51:44

如上所述，一般来说，您不应依赖类的哈希码保持不变。请注意，即使随后在同一虚拟机上运行同一应用程序也可能会产生不同的哈希值。 AFAIK Sun JVM 的哈希函数在每次运行时计算相同的哈希值，但这并不能保证。

请注意，这不是理论上的。 java.lang.String 已更改在 JDK1.2 中（旧的哈希在处理 URL 或文件名等分层字符串时存在问题，因为它倾向于为字符串生成相同的哈希，仅在末尾有所不同）。

java.lang.String 是一种特殊情况，因为它的 hashCode() 算法（现已）已记录，因此您可以依赖它。我仍然认为这是不好的做法。如果您需要一种具有特殊、记录属性的哈希算法，只需编写一个:-)。

回复收藏 0 原文

永言不败 2024-07-25 02:51:44

另一个需要担心的（！）问题是 Java 的早期/晚期版本之间的实现可能会发生变化。我不相信实现细节是一成不变的，因此升级到未来 Java 版本可能会导致问题。

最重要的是，我不会依赖 hashCode() 的实现。

也许您可以突出显示您实际上试图通过使用此机制解决什么问题，这将突出显示更合适的方法。

回复收藏 0 原文

浅蓝的眸勾画不出的柔情 2024-07-25 02:51:44

只是回答你的问题，不继续任何讨论。 Apache Harmony JDK 实现似乎使用了不同的算法，至少看起来完全不同：

Sun JDK

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Apache Harmony

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

随意检查一下...

Just to answer your question and not to continue any discussions. The Apache Harmony JDK implementation seems to use a different algorithm, at least it looks totally different:

Sun JDK

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Apache Harmony

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

Feel free to check it yourself...

回复收藏 0 原文

水中月 2024-07-25 02:51:44

如果您担心更改和可能不兼容的虚拟机，只需将现有的哈希码实现复制到您自己的实用程序类中，然后使用它来生成您的哈希码。

回复收藏 0 原文

伊面 2024-07-25 02:51:44

哈希码将根据字符串中字符的 ASCII 值计算。

String 类中的实现如下

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

哈希码中的冲突是不可避免的。例如，字符串“Ea”和“FB”给出与 2236 相同的哈希码

The hashcode will be calculated based on the ASCII values of the characters in the String.

This is the implementation in the String Class is as follows

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

Collisions in hashcode are unavoidable. For example, the strings "Ea" and "FB" give the same hashcode as 2236

回复收藏 0 原文

~没有更多了~