String.equalsIgnoreCase - Uppercase vs. Lowercase
I was browsing through the OpenJDK source and noticed a weird code path in String.equalsIgnoreCase, specifically in the method regionMatches:
```java
if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion. So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}
```
I understand the comment about adjusting for a specific alphabet by checking lowercase equality, but I was wondering why there is an uppercase check at all. Why not just do everything in lowercase?
Now that the question has been reopened, I am moving my answer here.
The short answer to "Why do they not just compare only lowercase instead of both uppercase and lowercase, if lowercase matches more cases than uppercase?" is: lowercase does not match more character pairs; it merely matches different pairs.
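To see that the two single-case comparisons match different pairs rather than more pairs, here is a minimal Java sketch using the Turkish dotted and dotless I letters. `UpperVsLower`, `upperOnly` and `lowerOnly` are hypothetical names made up for this illustration; only `Character.toUpperCase(char)` and `Character.toLowerCase(char)` are JDK methods.

```java
public class UpperVsLower {
    // Hypothetical helper: treat two chars as equal if their uppercase forms match.
    static boolean upperOnly(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: treat two chars as equal if their lowercase forms match.
    static boolean lowerOnly(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        // 'I' (U+0049) vs 'İ' (U+0130): both lowercase to 'i', but their uppercase forms differ.
        System.out.println(upperOnly('I', '\u0130') + " " + lowerOnly('I', '\u0130')); // false true
        // 'i' (U+0069) vs 'ı' (U+0131): both uppercase to 'I', but their lowercase forms differ.
        System.out.println(upperOnly('i', '\u0131') + " " + lowerOnly('i', '\u0131')); // true false
    }
}
```

Each single-case comparison accepts one pair that the other rejects, so neither is a superset of the other.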
Comparing only uppercase is not enough: for example, the ASCII letter "I" and the capital I with dot "İ" ((char)304, used in the Turkish alphabet) have different uppercase forms (they are already uppercase), but they have the same lowercase letter "i". (Note that the Turkish language considers i with dot and i without dot to be different letters, not just accented variants, similar to German with its umlauts ä/ö/ü vs. a/o/u.)

Comparing only lowercase is not enough either: for example, the ASCII letter "i" and the small dotless i "ı" ((char)305) have different lowercase forms (they are already lowercase), but they have the same uppercase letter "I".

And finally, compare the capital I with dot "İ" with the small dotless i "ı". Neither their uppercase forms ("İ" vs. "I") nor their lowercase forms ("i" vs. "ı") match, but the lowercase of their uppercase is the same ("i"). I found another case of this phenomenon in the Greek letters "ϴ" and "ϑ" (char 1012 and 977).
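A minimal sketch of that third comparison, assuming the fold is "lowercase of the uppercase" exactly as in the second check of the quoted regionMatches snippet. `LowerOfUpper`, `fold` and `describe` are hypothetical names; the program reports which of the three comparisons treat each pair as equal.

```java
public class LowerOfUpper {
    // Hypothetical helper: lowercase of the uppercase form, as in the
    // snippet's second check (Character.toLowerCase(u1) == Character.toLowerCase(u2)).
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Reports which comparisons consider a and b equal.
    static String describe(char a, char b) {
        boolean upper = Character.toUpperCase(a) == Character.toUpperCase(b);
        boolean lower = Character.toLowerCase(a) == Character.toLowerCase(b);
        boolean folded = fold(a) == fold(b);
        // Cast to int: %X does not accept a Character argument.
        return String.format("U+%04X vs U+%04X: upper=%b lower=%b lowerOfUpper=%b",
                (int) a, (int) b, upper, lower, folded);
    }

    public static void main(String[] args) {
        // İ vs ı: uppercase İ vs I, lowercase i vs ı, lowercase-of-uppercase i vs i
        System.out.println(describe('\u0130', '\u0131'));
        // ϴ vs ϑ: uppercase ϴ vs Θ, lowercase θ vs ϑ, lowercase-of-uppercase θ vs θ
        System.out.println(describe('\u03F4', '\u03D1'));
    }
}
```

Both lines should report `upper=false lower=false lowerOfUpper=true`.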
So a true case-insensitive comparison cannot get away with checking even both the uppercase and the lowercase forms of the original characters; it must check the lowercase forms of the uppercase forms.
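A minimal sketch of that per-character rule applied to whole strings, checked against the JDK's `String.equalsIgnoreCase`. The names `FoldCompare`, `charsEqualIgnoreCase` and `equalsIgnoreCaseSketch` are hypothetical, and the sketch compares UTF-16 code units one by one without any special handling of surrogate pairs. It also makes the original question's point visible: the uppercase check is only a shortcut, because whenever `u1 == u2` the lowercase-of-uppercase check would succeed as well.

```java
public class FoldCompare {
    // Hypothetical per-char check mirroring the quoted snippet:
    // identical chars, else equal uppercase, else equal lowercase-of-uppercase.
    static boolean charsEqualIgnoreCase(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true; // shortcut: the lowercase-of-uppercase check below would also pass
        }
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    // Hypothetical whole-string comparison built on the per-char check.
    static boolean equalsIgnoreCaseSketch(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (!charsEqualIgnoreCase(a.charAt(i), b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"I", "\u0130"},      // I vs İ
            {"i", "\u0131"},      // i vs ı
            {"\u0130", "\u0131"}, // İ vs ı
            {"\u03F4", "\u03D1"}, // ϴ vs ϑ
            {"Hello", "hELLO"},
            {"a", "b"},
        };
        for (String[] p : pairs) {
            // Prints the sketch's result next to the JDK's result.
            System.out.println(equalsIgnoreCaseSketch(p[0], p[1]) + " " + p[0].equalsIgnoreCase(p[1]));
        }
    }
}
```

For these inputs the sketch and `String.equalsIgnoreCase` should agree on every line; the sketch only illustrates the rule and is not the JDK's implementation.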