Java 字符串上 hashCode() 的一致性

发布于 2024-07-18 02:51:44 字数 630 浏览 3 评论 0原文

Java 字符串的 hashCode 值计算如下 (String.hashCode()):

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

是否存在任何情况(例如 JVM 版本、供应商等)以下表达式的计算结果为 false?

boolean expression = "This is a Java string".hashCode() == 586653468

更新#1:如果您声称答案是“是的,存在这种情况” - 那么请给出“这是一个Java字符串”.hashCode() != 586653468的具体示例。尝试尽可能具体/具体。

更新#2:我们都知道,依赖 hashCode() 的实现细节通常是不好的。 然而,我正在具体讨论 String.hashCode() - 所以请将答案集中在 String.hashCode() 上。 Object.hashCode() 与这个问题的上下文完全无关。

The hashCode value of a Java String is computed as (String.hashCode()):

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Are there any circumstances (say JVM version, vendor, etc.) under which the following expression will evaluate to false?

boolean expression = "This is a Java string".hashCode() == 586653468

Update #1: If you claim that the answer is "yes, there are such circumstances" - then please give a concrete example of when "This is a Java string".hashCode() != 586653468. Try to be as specific/concrete as possible.

Update #2: We all know that relying on the implementation details of hashCode() is bad in general. However, I'm talking specifically about String.hashCode() - so please keep the answer focused to String.hashCode(). Object.hashCode() is totally irrelevant in the context of this question.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

迷鸟归林 2024-07-25 02:51:44

我可以看到早至 Java 1.2 的文档。

虽然确实一般您不应该依赖保持不变的哈希代码实现,但它现在已记录为 java.lang.String 的行为,因此更改它会很重要违反现有合同。

只要有可能,您就不应该依赖在不同版本等之间保持相同的哈希码 - 但在我看来,java.lang.String 是一种特殊情况,仅仅是因为算法具有已指定...当然,只要您愿意在指定算法之前放弃与版本的兼容性。

I can see that documentation as far back as Java 1.2.

While it's true that in general you shouldn't rely on a hash code implementation remaining the same, it's now documented behaviour for java.lang.String, so changing it would count as breaking existing contracts.

Wherever possible, you shouldn't rely on hash codes staying the same across versions etc - but in my mind java.lang.String is a special case simply because the algorithm has been specified... so long as you're willing to abandon compatibility with releases before the algorithm was specified, of course.

一百个冬季 2024-07-25 02:51:44

我发现了一些关于 JDK 1.0 和 1.1 以及 >= 1.2 的内容:

在 JDK 1.0.x 和 1.1.x 中,hashCode
用于长字符串的函数
对每第 n 个字符进行采样。 这
很好地保证你会拥有
许多字符串散列到相同的
值,从而减慢 Hashtable 的速度
抬头。 在 JDK 1.2 中该函数有
得到改进以乘以结果
到目前为止 31,然后添加下一个
按顺序排列的字符。 这是一个
慢一点,但更擅长
避免碰撞。
来源:http://mindprod.com/jgloss/hashcode.html

有些不同,因为你看起来需要一个数字:使用 CRC32 或 MD5 代替哈希码怎么样,你就可以开始了 - 无需讨论,也无需担心......

I found something about JDK 1.0 and 1.1 and >= 1.2:

In JDK 1.0.x and 1.1.x the hashCode
function for long Strings worked by
sampling every nth character. This
pretty well guaranteed you would have
many Strings hashing to the same
value, thus slowing down Hashtable
lookup. In JDK 1.2 the function has
been improved to multiply the result
so far by 31 then add the next
character in sequence. This is a
little slower, but is much better at
avoiding collisions.
Source: http://mindprod.com/jgloss/hashcode.html

Something different, because you seem to need a number: How about using CRC32 or MD5 instead of hashcode and you are good to go - no discussions and no worries at all...

丘比特射中我 2024-07-25 02:51:44

您不应依赖哈希码等于特定值。 只是它将在同一次执行中返回一致的结果。
API 文档说明如下:

hashCode的通用约定是:

  • 每当在 Java 应用程序执行期间对同一对象多次调用 hashCode 方法时,如果对象的 equals 比较中使用的信息没有被修改,则 hashCode 方法必须始终返回相同的整数。 从一个应用程序的一次执行到同一应用程序的另一次执行,该整数不需要保持一致。

编辑
由于 String.hashCode() 的 javadoc 指定了如何计算 String 的哈希码,因此任何违反此规定的行为都将违反公共 API 规范。

You should not rely on a hash code being equal to a specific value. Just that it will return consistent results within the same execution.
The API docs say the following :

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

EDIT
Since the javadoc for String.hashCode() specifies how a String's hash code is computed, any violation of this would violate the public API specification.

花想c 2024-07-25 02:51:44

如上所述,一般来说,您不应依赖类的哈希码保持不变。 请注意,即使随后在同一虚拟机上运行同一应用程序也可能会产生不同的哈希值。 AFAIK Sun JVM 的哈希函数在每次运行时计算相同的哈希值,但这并不能保证。

请注意,这不是理论上的。 java.lang.String 已更改 在 JDK1.2 中(旧的哈希在处理 URL 或文件名等分层字符串时存在问题,因为它倾向于为字符串生成相同的哈希,仅在末尾有所不同)。

java.lang.String 是一种特殊情况,因为它的 hashCode() 算法(现已)已记录,因此您可以依赖它。 我仍然认为这是不好的做法。 如果您需要一种具有特殊、记录属性的哈希算法,只需编写一个:-)。

As said above, in general you should not rely on the hash code of a class remaining the same. Note that even subsequent runs of the same application on the same VM may produce different hash values. AFAIK the Sun JVM's hash function calculates the same hash on every run, but that's not guaranteed.

Note that this is not theoretical. The hash function for java.lang.String was changed in JDK1.2 (the old hash had problems with hierarchical strings like URLs or file names, as it tended to produce the same hash for strings which only differed at the end).

java.lang.String is a special case, as the algorithm of its hashCode() is (now) documented, so you can probably rely on that. I'd still consider it bad practice. If you need a hash algorithm with special, documented properties, just write one :-).

永言不败 2024-07-25 02:51:44

另一个需要担心的(!)问题是 Java 的早期/晚期版本之间的实现可能会发生变化。 我不相信实现细节是一成不变的,因此升级到未来 Java 版本可能会导致问题。

最重要的是,我不会依赖 hashCode() 的实现。

也许您可以突出显示您实际上试图通过使用此机制解决什么问题,这将突出显示更合适的方法。

Another (!) issue to worry about is the possible change of implementation between early/late versions of Java. I don't believe the implementation details are set in stone, and so potentially an upgrade to a future Java version could cause problems.

Bottom line is, I wouldn't rely on the implementation of hashCode().

Perhaps you can highlight what problem you're actually trying to solve by using this mechanism, and that will highlight a more suitable approach.

只是回答你的问题,不继续任何讨论。 Apache Harmony JDK 实现似乎使用了不同的算法,至少看起来完全不同:

Sun JDK

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Apache Harmony

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

随意检查一下...

Just to answer your question and not to continue any discussions. The Apache Harmony JDK implementation seems to use a different algorithm, at least it looks totally different:

Sun JDK

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Apache Harmony

public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

Feel free to check it yourself...

水中月 2024-07-25 02:51:44

如果您担心更改和可能不兼容的虚拟机,只需将现有的哈希码实现复制到您自己的实用程序类中,然后使用它来生成您的哈希码。

If you're worried about changes and possibly incompatibly VMs, just copy the existing hashcode implementation into your own utility class, and use that to generate your hashcodes .

伊面 2024-07-25 02:51:44

哈希码将根据字符串中字符的 ASCII 值计算。

String 类中的实现如下

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

哈希码中的冲突是不可避免的。 例如,字符串“Ea”和“FB”给出与 2236 相同的哈希码

The hashcode will be calculated based on the ASCII values of the characters in the String.

This is the implementation in the String Class is as follows

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

Collisions in hashcode are unavoidable. For example, the strings "Ea" and "FB" give the same hashcode as 2236

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文