String.equalsIgnoreCase - uppercase vs. lowercase

Posted on 2025-02-11 07:13:09

I was browsing through the OpenJDK source and noticed a weird code path in String.equalsIgnoreCase, specifically in the method regionMatches:

if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}

I understand the comment about adjusting for a specific alphabet to check the lower case equality, but was wondering why even have the upper case check? Why not just do all lower case?

有深☉意 2025-02-18 07:13:09

Now that the question has been reopened, I am moving my answer here.

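The short answer to "Why don't they just compare lowercase, if it matches more cases than uppercase?": it does not match more pairs of characters, it just matches different pairs.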

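Comparing only uppercase is not sufficient. For example, the ASCII letter "I" and the capital I with dot "İ" ((char) 304, used in the Turkish alphabet) have different uppercase forms (they are already uppercase), but they share the same lowercase letter "i". (Note that Turkish treats i with dot and i without dot as different letters, not as one letter with an accent, similar to German with its umlauts ä/ö/ü vs. a/o/u.)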

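Comparing only lowercase is not sufficient either. For example, take the ASCII letter "i" and the small dotless i "ı" ((char) 305). They have different lowercase forms (they are already lowercase), but they share the same uppercase letter "I".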

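Finally, compare the capital I with dot "İ" with the small dotless i "ı". Neither their uppercase forms ("İ" vs. "I") nor their lowercase forms ("i" vs. "ı") match, but the lowercase of their uppercase is the same ("i"). I found another instance of this phenomenon in the Greek letters "ϴ" and "ϑ" (char 1012 and 977).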
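The pairs mentioned above can be checked directly with `Character.toUpperCase` and `Character.toLowerCase`. The following is only a small sketch: the class `CasePairs` and its helper methods are hypothetical names made up for illustration, and the characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
// Hypothetical demo class; the names are illustrative and not part of the JDK.
class CasePairs {
    // True if both chars have the same simple uppercase mapping.
    static boolean upperEqual(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // True if both chars have the same simple lowercase mapping.
    static boolean lowerEqual(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // True if the lowercase of the uppercase is the same for both chars.
    static boolean lowerOfUpperEqual(char a, char b) {
        return Character.toLowerCase(Character.toUpperCase(a))
            == Character.toLowerCase(Character.toUpperCase(b));
    }

    // Prints the outcome of all three comparisons for one pair.
    static void report(String label, char a, char b) {
        System.out.printf("%-18s upper=%b lower=%b lowerOfUpper=%b%n",
                label, upperEqual(a, b), lowerEqual(a, b), lowerOfUpperEqual(a, b));
    }

    public static void main(String[] args) {
        report("I vs. U+0130", 'I', '\u0130');           // ASCII I vs. capital I with dot
        report("i vs. U+0131", 'i', '\u0131');           // ASCII i vs. small dotless i
        report("U+0130 vs. U+0131", '\u0130', '\u0131'); // capital I with dot vs. dotless i
        report("U+03F4 vs. U+03D1", '\u03F4', '\u03D1'); // the two Greek theta symbols
    }
}
```

Only the third column matches for all four pairs. The first pair fails the uppercase comparison, the second fails the lowercase comparison, and the last two fail both.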

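So a true case-insensitive comparison cannot just check the uppercase and lowercase of the original characters; it must check the lowercase of the uppercase.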
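To make that conclusion concrete, here is a rough sketch of a whole-string comparison built on the same per-character steps as the snippet quoted in the question. It is not the JDK implementation: the class `IgnoreCaseSketch` and the method `equalsIgnoreCaseSketch` are hypothetical names, and the sketch works on plain `char` values without any special handling of surrogate pairs.

```java
// Hypothetical helper; mirrors the per-char steps quoted in the question,
// but is not the actual JDK code.
class IgnoreCaseSketch {
    static boolean equalsIgnoreCaseSketch(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c1 = s.charAt(i);
            char c2 = t.charAt(i);
            if (c1 == c2) {
                continue; // identical chars need no case mapping
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                // Early exit: equal uppercase forms necessarily have
                // equal lowercase-of-uppercase forms as well.
                continue;
            }
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue; // the check that catches U+0130 vs. U+0131
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "\u0130stanbul"; // starts with capital I with dot
        String b = "\u0131STANBUL"; // starts with small dotless i
        System.out.println(equalsIgnoreCaseSketch(a, b));
        System.out.println(a.equalsIgnoreCase(b)); // JDK result, for comparison
    }
}
```

Since equal uppercase forms always lead to equal lowercase-of-uppercase forms, the uppercase comparison in this sketch never changes the result. It only lets the loop skip the second mapping.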