Java: ignoring accents when comparing strings

Posted 2024-08-24 11:15:40

The problem is simple. Is there any function in Java that compares two strings and returns true while ignoring accented characters?

For example:

String x = "Joao";
String y = "João";

These should be reported as equal.

Thanks

Comments (6)

榆西 2024-08-31 11:15:40

I think you should be using the Collator class. It allows you to set a strength and locale and it will compare characters appropriately.

From the Java 1.6 API:

You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.

I think the important point here (which people are trying to make) is that "Joao" and "João" should never be considered equal. But if you are sorting, you don't want them compared by their raw character code values, because then you would get an order like Joao, John, João, which is not good. Using the Collator class handles this correctly.
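The sorting point can be illustrated with a minimal sketch. The class and method names here (NameSorting, sortByCodeUnits, sortWithCollator) are hypothetical, and the choice of Locale.ENGLISH in the usage note is an assumption rather than anything stated in the answer.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class NameSorting {

    /** Sorts by raw UTF-16 code unit values, which is what String.compareTo does. */
    static List<String> sortByCodeUnits(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts with a locale-aware Collator at its default strength. */
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        Collator collator = Collator.getInstance(locale);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }
}
```

For the list "João", "John", "Joao", sortByCodeUnits gives Joao, John, João, while sortWithCollator with Locale.ENGLISH should give Joao, João, John, since the accent only counts as a secondary difference.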

抱猫软卧 2024-08-31 11:15:40

You didn't hear this from me (because I disagree with the premise of the question), but, you can use java.text.Normalizer, and normalize with NFD: this splits off the accent from the letter it's attached to. You can then filter off the accent characters and compare.
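A minimal sketch of that approach might look like the following. The class and method names (AccentStripper, stripAccents, equalsIgnoringAccents) are hypothetical, and using the regex class \p{M} to match the split-off combining marks is one possible way to do the filtering, not something the answer specifies.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
final class AccentStripper {

    /** Decomposes to NFD, then removes combining marks (Unicode category M). */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Compares two strings after stripping accents from both. */
    static boolean equalsIgnoringAccents(String a, String b) {
        return stripAccents(a).equals(stripAccents(b));
    }
}
```

With this sketch, equalsIgnoringAccents("Joao", "João") returns true. Letters that have no canonical decomposition, such as "ø", pass through unchanged, so this is not a general transliteration.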

伏妖词 2024-08-31 11:15:40

Or use stripAccents from the StringUtils class in Apache Commons Lang if you want to compare or sort while ignoring accents:

// Requires Apache Commons Lang: import org.apache.commons.lang3.StringUtils;
public int compareStripAccent(String a, String b) {
    return StringUtils.stripAccents(a).compareTo(StringUtils.stripAccents(b));
}

南城旧梦 2024-08-31 11:15:40

Java's Collator returns 0 when comparing "a" and "á" if you configure it to ignore diacritics:

// import java.text.Collator;
public boolean isSame(String a, String b) {
    // Uses the default locale; PRIMARY strength compares base letters only
    Collator insensitiveStringComparator = Collator.getInstance();
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}

isSame("a", "á") yields true.
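One thing worth checking before relying on PRIMARY strength is that it also ignores case, not only accents. The sketch below compares the strengths side by side. The class and method names (CollatorStrengthDemo, equalAt) are hypothetical, and Locale.ENGLISH is an assumption made so the result does not depend on the machine's default locale.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class CollatorStrengthDemo {

    /** Returns true when the two strings compare as equal at the given strength. */
    static boolean equalAt(int strength, String a, String b) {
        // Assumption: English collation rules; other locales may rank differences differently.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }
}
```

Under these assumptions, equalAt(Collator.PRIMARY, "joao", "JOÃO") is true, while Collator.SECONDARY treats "Joao" and "João" as different but still ignores case. If case must matter and accents must not, a Collator strength alone may not be enough, and stripping accents before an ordinary equals could be the simpler route.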

臻嫒无言 2024-08-31 11:15:40

public boolean insensitiveStringComparator(String a, String b) {
    java.text.Collator collate = java.text.Collator.getInstance();
    // PRIMARY strength compares base letters only
    collate.setStrength(java.text.Collator.PRIMARY);
    // Canonical decomposition also covers accents stored as separate combining characters
    collate.setDecomposition(java.text.Collator.CANONICAL_DECOMPOSITION);
    return collate.equals(a, b);
}

殊姿 2024-08-31 11:15:40

The problem with this sort of conversion is that there isn't always a clear-cut mapping from accented to non-accented characters. It depends on code pages, localization, etc. For example, is an "a" with an accent equivalent to a plain "a"? Not a problem for a human, but trickier for a computer.

AFAIK Java does not have a built-in conversion that looks up the current localization options and makes this sort of conversion. You may need an external library that handles Unicode better, like ICU (http://site.icu-project.org/).
