String.equalsIgnoreCase - uppercase vs. lowercase

Posted on 2025-02-11 07:13:09

I was browsing through the OpenJDK source and noticed an odd code path in String.equalsIgnoreCase, specifically in the method regionMatches:

if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}

I understand the comment about making one last lowercase check for a specific alphabet, but I was wondering why the uppercase check is there at all. Why not just compare everything in lowercase?

Comments (1)

有深☉意 2025-02-18 07:13:09

Now that the question has been reopened, I am moving my answer here.

The short answer to "Why do they not just compare only lowercase instead of both upper and lower case, if it matches more cases than uppercase?": it does not match more character pairs; it merely matches different pairs.

Comparing only uppercase is not enough. For example, the ASCII letter "I" and the capital I with dot "İ" ((char)304, used in the Turkish alphabet) have different uppercase forms (both are already uppercase), but they share the same lowercase letter "i". (Note that Turkish treats i with dot and i without dot as different letters, not as one letter with and without an accent, much as German distinguishes its umlauts ä/ö/ü from a/o/u.)

Comparing only lowercase is not enough either. For example, take the ASCII letter "i" and the small dotless i "ı" ((char)305). They have different lowercase forms (both are already lowercase), but they share the same uppercase letter "I".
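The two cases above can be checked with a small sketch. The class and method names (SingleCaseComparison, equalUpper, equalLower) are made up for illustration and are not part of the JDK; only Character.toUpperCase and Character.toLowerCase are real API calls.

```java
class SingleCaseComparison {
    // Hypothetical helper: compares two chars after converting both to uppercase only.
    static boolean equalUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: compares two chars after converting both to lowercase only.
    static boolean equalLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char capitalDottedI = '\u0130'; // capital I with dot, (char)304
        char smallDotlessI = '\u0131';  // small dotless i, (char)305

        // "I" vs. capital I with dot: uppercase forms differ, lowercase forms agree.
        System.out.println(equalUpper('I', capitalDottedI)); // false
        System.out.println(equalLower('I', capitalDottedI)); // true

        // "i" vs. small dotless i: lowercase forms differ, uppercase forms agree.
        System.out.println(equalLower('i', smallDotlessI));  // false
        System.out.println(equalUpper('i', smallDotlessI));  // true
    }
}
```

Each single-direction comparison misses a pair that the other one catches, which is why neither can simply replace the other.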

And finally, compare the capital I with dot "İ" with the small dotless i "ı". Neither their uppercase forms ("İ" vs. "I") nor their lowercase forms ("i" vs. "ı") match, but the lowercase of their uppercase is the same ("i"). I found another case of this phenomenon in the Greek letters "ϴ" and "ϑ" ((char)1012 and (char)977).
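A sketch of this third situation, again with illustrative names (LowerOfUpperComparison and equalLowerOfUpper are not JDK identifiers). It checks both the Turkish pair and the Greek pair mentioned above against all three ways of comparing.

```java
class LowerOfUpperComparison {
    // Hypothetical helper: lowercase of the uppercase form, compared for both chars.
    static boolean equalLowerOfUpper(char a, char b) {
        return Character.toLowerCase(Character.toUpperCase(a))
            == Character.toLowerCase(Character.toUpperCase(b));
    }

    // Prints the result of each comparison strategy for one pair of chars.
    static void report(String label, char a, char b) {
        boolean upper = Character.toUpperCase(a) == Character.toUpperCase(b);
        boolean lower = Character.toLowerCase(a) == Character.toLowerCase(b);
        System.out.println(label + ": upper=" + upper + ", lower=" + lower
            + ", lowerOfUpper=" + equalLowerOfUpper(a, b));
    }

    public static void main(String[] args) {
        // Capital I with dot (304) vs. small dotless i (305).
        report("Turkish pair", '\u0130', '\u0131');
        // Greek capital theta symbol (1012) vs. Greek theta symbol (977).
        report("Greek pair", '\u03F4', '\u03D1');
    }
}
```

For both pairs the plain uppercase and plain lowercase comparisons report a mismatch, and only the lowercase-of-uppercase comparison reports a match.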

So a true case-insensitive comparison cannot rely on checking just the uppercase and lowercase forms of the original characters; it must check the lowercase of the uppercase.
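As a rough illustration, the order of checks in the quoted snippet can be written as a standalone per-character helper and compared with what String.equalsIgnoreCase returns for one-character strings. The helper name charsMatchIgnoringCase is hypothetical, and this is a sketch of the idea rather than the JDK's actual implementation. Note that if two uppercase forms are equal, their lowercase forms are necessarily equal too, so the uppercase test only acts as an early exit before the decisive lowercase-of-uppercase test.

```java
class EqualsIgnoreCaseCheck {
    // Hypothetical helper that follows the order of checks in the quoted snippet.
    static boolean charsMatchIgnoringCase(char c1, char c2) {
        if (c1 == c2) {
            return true;                 // identical characters
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;                 // early exit: uppercase forms agree
        }
        // Decisive check: lowercase of the uppercase forms.
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char[][] pairs = {
            {'I', '\u0130'},      // ASCII I vs. capital I with dot
            {'i', '\u0131'},      // ASCII i vs. small dotless i
            {'\u0130', '\u0131'}, // capital I with dot vs. small dotless i
            {'a', 'b'}            // unrelated letters, for contrast
        };
        for (char[] p : pairs) {
            boolean helper = charsMatchIgnoringCase(p[0], p[1]);
            boolean library = String.valueOf(p[0]).equalsIgnoreCase(String.valueOf(p[1]));
            System.out.println((int) p[0] + " vs. " + (int) p[1]
                + ": helper=" + helper + ", equalsIgnoreCase=" + library);
        }
    }
}
```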
