Java: comparing strings while ignoring accents
The problem is simple. Is there any function in Java that compares two Strings and returns true while ignoring accented characters?
For example:
String x = "Joao";
String y = "João";
should be reported as equal.
Thanks
I think you should be using the Collator class. It allows you to set a strength and locale and it will compare characters appropriately.
From the Java 1.6 API:
I think the important point here (which people are trying to make) is that "Joao" and "João" should never be considered equal, but if you are sorting, you don't want them compared by their ASCII values, because then you would get an order like Joao, John, João, which is not good. The Collator class handles this correctly.
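A minimal sketch of the sorting point above, assuming an English-locale Collator is acceptable for these names (the class and method names are hypothetical). String's natural order compares UTF-16 code units, so "João" lands after "John"; a Collator treats the accent as a minor difference and keeps "João" next to "Joao".

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // Hypothetical helper: sorts with a locale-aware Collator.
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Let decomposed input (base letter + combining mark) match precomposed characters.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    // Hypothetical helper: sorts with String.compareTo (UTF-16 code unit order).
    static List<String> sortByCodeUnits(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(null); // null means natural ordering
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jo\u00e3o", "John", "Joao"); // "João", "John", "Joao"
        System.out.println(sortByCodeUnits(names));                  // [Joao, John, João]
        System.out.println(sortWithCollator(names, Locale.ENGLISH)); // [Joao, João, John]
    }
}
```

The Collator still ranks "Joao" before "João" (the accent breaks the tie), so the two names are adjacent but not equal, which matches the comment's point.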
You didn't hear this from me (because I disagree with the premise of the question), but you can use java.text.Normalizer and normalize with NFD: this splits the accent off from the letter it's attached to. You can then filter out the accent characters and compare.
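A minimal sketch of the Normalizer approach just described. Removing everything in the Unicode "Mark" category (regex \p{M}) is one way to "filter out the accent characters"; the helper names are hypothetical.

```java
import java.text.Normalizer;

public class NormalizerCompare {

    // Hypothetical helper: NFD turns "ã" (U+00E3) into "a" + U+0303 (combining tilde),
    // then every combining mark (general category M) is deleted.
    static String removeDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Hypothetical helper: equality after stripping accents from both sides.
    static boolean equalsIgnoreAccents(String a, String b) {
        return removeDiacritics(a).equals(removeDiacritics(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreAccents("Joao", "Jo\u00e3o")); // true
        System.out.println(equalsIgnoreAccents("Joao", "John"));      // false
    }
}
```

This only removes marks that NFD actually separates; letters with no canonical decomposition, such as "ø" or "ł", pass through unchanged.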
Or, if you want to compare or sort ignoring accents, use stripAccents from the Apache Commons Lang StringUtils class.
Java's Collator returns 0 when comparing "a" and "á" if you configure it to ignore diacritics:
isSame("a", "á") yields true
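The isSame method itself is not shown above; one possible sketch, assuming "ignore diacritics" means PRIMARY strength and an English locale:

```java
import java.text.Collator;
import java.util.Locale;

public class IsSameDemo {

    // Sketch of isSame: true when a and b differ at most in accents or case.
    static boolean isSame(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength: only base letters are significant; accents
        // (secondary differences) and case (tertiary differences) are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Let decomposed input (base letter + combining mark) match precomposed characters.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isSame("a", "\u00e1"));       // true  ("a" vs "á")
        System.out.println(isSame("Joao", "Jo\u00e3o")); // true  ("Joao" vs "João")
        System.out.println(isSame("Joao", "John"));      // false
    }
}
```

PRIMARY strength also ignores case, so isSame("joao", "JOÃO") is true as well. Raising the strength to SECONDARY makes accents significant again, so it does not give "accent-insensitive but case-sensitive" on its own.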
The problem with these sorts of conversions is that there isn't always a clear-cut mapping from accented to non-accented characters. It depends on code pages, localization, etc. For example, is a given "a" with an accent equivalent to a plain "a"? That's not a problem for a human, but it's trickier for a computer.
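To illustrate the localization point: in English, "å" is usually treated as an "a" with a ring accent, while in Swedish it is a separate letter sorted after "z". A sketch using Java's locale-sensitive Collator, assuming the JDK's full locale data (the jdk.localedata module) is available; sameBaseLetters is a hypothetical helper. On a standard JDK the English comparison should report true and the Swedish one false.

```java
import java.text.Collator;
import java.util.Locale;

public class LocaleAccentDemo {

    // Hypothetical helper: PRIMARY-strength comparison using the given locale's rules.
    static boolean sameBaseLetters(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Locale swedish = Locale.forLanguageTag("sv");
        // Compare a plain "a" with "å" (U+00E5) under each locale's rules.
        System.out.println(sameBaseLetters(Locale.ENGLISH, "a", "\u00e5"));
        System.out.println(sameBaseLetters(swedish, "a", "\u00e5"));
    }
}
```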
AFAIK, Java does not have a built-in conversion that looks up the current localization options and makes these sorts of conversions. You may need an external library with better Unicode handling, such as ICU (http://site.icu-project.org/).