Java string search ignoring accents
I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents.
The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao, I would expect Joáo to be included in the result set. I already use the Collator class in my application to sort by name, and it works well for ordering: with the UK locale, á comes after a but before b. But obviously it doesn't return 0 when comparing a and á, because they are not equal.
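The sketch below makes both observations concrete. The class name DefaultCollatorDemo is made up for illustration. String.contains compares chars exactly. A Collator left at its default strength (TERTIARY) sorts á before b but still reports a and á as different:

```java
import java.text.Collator;
import java.util.Locale;

public class DefaultCollatorDemo {
    // Collator as used for sorting; strength is left at its default (TERTIARY).
    static final Collator UK = Collator.getInstance(Locale.UK);

    public static void main(String[] args) {
        String accented = "Jo\u00E1o"; // "Joáo"
        // String.contains compares chars exactly, so 'á' (U+00E1) never matches 'a'.
        System.out.println(accented.contains("Joao"));      // false
        // Sorting works: á orders before b ...
        System.out.println(UK.compare("\u00E1", "b") < 0);  // true
        // ... but a and á still differ (accent difference), so compare() is not 0.
        System.out.println(UK.compare("a", "\u00E1") == 0); // false
    }
}
```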
So does anyone have any idea how I might be able to do this?
Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics, then compare the stripped strings.
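The code from this answer did not survive. Below is a minimal sketch of the same approach. The class name AccentFilter, the helper names stripAccents and filterNames, and the regex \p{InCombiningDiacriticalMarks}+ are assumptions for illustration. NFD normalization splits a character like á into a plus a combining acute accent (U+0301), and the regex then deletes those combining marks:

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AccentFilter {
    // NFD splits "á" into "a" + U+0301 (combining acute accent); the regex then
    // deletes every character from the Combining Diacritical Marks block.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical filter: keep names that contain the query, ignoring accents and case.
    static List<String> filterNames(List<String> names, String query) {
        String q = stripAccents(query).toLowerCase(Locale.ROOT);
        return names.stream()
                .filter(n -> stripAccents(n).toLowerCase(Locale.ROOT).contains(q))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Joáo, Maria, João Silva
        List<String> names = Arrays.asList("Jo\u00E1o", "Maria", "Jo\u00E3o Silva");
        System.out.println(filterNames(names, "Joao")); // keeps Joáo and João Silva
    }
}
```

Lower-casing with Locale.ROOT also makes the match case-insensitive; drop it if case should matter. This only handles letters that decompose under NFD. Letters such as ø or ł have no decomposition and pass through unchanged.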
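Collator does return 0 for a and á if you configure it to ignore diacritics, i.e. set its strength to Collator.PRIMARY.
With that setting, isSame("a", "á") (a check that the collator's compare() returns 0) now yields true.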
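Below is a minimal sketch of that configuration, using the UK locale from the question. The class name CollatorMatch is made up. The isSame body is a reconstruction, since the answer's code was lost. containsIgnoringAccents is a hypothetical addition for the question's contains-style filter:

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorMatch {
    // PRIMARY strength compares base letters only: accent (secondary) and
    // case (tertiary) differences are ignored.
    private static final Collator COLLATOR = Collator.getInstance(Locale.UK);
    static {
        COLLATOR.setStrength(Collator.PRIMARY);
    }

    // Reconstruction of the isSame helper: equal at PRIMARY strength.
    static boolean isSame(String a, String b) {
        return COLLATOR.compare(a, b) == 0;
    }

    // Hypothetical helper: slide a window of query.length() chars over text
    // and test each window with the accent-insensitive collator.
    static boolean containsIgnoringAccents(String text, String query) {
        int m = query.length();
        for (int i = 0; i + m <= text.length(); i++) {
            if (COLLATOR.compare(text.substring(i, i + m), query) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSame("a", "\u00E1"));                              // true (a vs á)
        System.out.println(isSame("a", "b"));                                   // false
        System.out.println(containsIgnoringAccents("Jo\u00E3o Silva", "joao")); // true (João Silva)
    }
}
```

PRIMARY strength also ignores case. The window-based check assumes an accented letter and its plain form have the same length in chars. That holds for precomposed input like "Joáo" but not for decomposed text.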
I have written a class for searching through Arabic texts while ignoring diacritics (NOT removing them). Maybe you can get the idea from it or use it in some way.
DiacriticInsensitiveSearch.java
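The linked class is not reproduced here. Below is a rough sketch of the idea only, not the author's code. Search a copy of the text with the marks skipped, remember where each kept character came from, and map each match back to the original. The diacritics are never removed from the text itself. The class name DiacriticSkippingSearch is hypothetical. The sketch assumes diacritics are stored as separate combining code points (general category Mn), as Arabic harakat usually are:

```java
import java.util.ArrayList;
import java.util.List;

public class DiacriticSkippingSearch {
    // True for combining marks such as Arabic fatha (U+064E) or U+0301.
    private static boolean isMark(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Returns {start, end} (end exclusive) in the ORIGINAL text for each match
    // of query, ignoring marks in both strings; text is left untouched.
    static List<int[]> find(String text, String query) {
        StringBuilder base = new StringBuilder();       // text without marks
        List<Integer> origIndex = new ArrayList<>();    // base position -> text position
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isMark(c)) {
                base.append(c);
                origIndex.add(i);
            }
        }
        StringBuilder q = new StringBuilder();
        for (char c : query.toCharArray()) {
            if (!isMark(c)) q.append(c);
        }
        List<int[]> matches = new ArrayList<>();
        String qs = q.toString();
        if (qs.isEmpty()) return matches;
        int pos;
        int from = 0;
        while ((pos = base.indexOf(qs, from)) >= 0) {
            int start = origIndex.get(pos);
            int end = origIndex.get(pos + qs.length() - 1) + 1;
            // Include any marks attached to the last matched letter.
            while (end < text.length() && isMark(text.charAt(end))) end++;
            matches.add(new int[] {start, end});
            from = pos + 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "\u0643\u064E\u062A\u064E\u0628\u064E"; // kataba, with fatha marks
        String query = "\u0643\u062A\u0628";                  // ktb, no marks
        for (int[] m : find(text, query)) {
            System.out.println(m[0] + ".." + m[1]);           // prints 0..6
        }
    }
}
```

For Latin text with precomposed letters like á, normalize to NFD first. The returned indices then refer to the normalized string.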