Java string search ignoring accents

Posted on 2024-08-24 04:29:06

I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents.

The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the Collator class in my application to sort by name, and it works well because it can do comparisons, i.e. using the UK Locale á comes before b but after a. But obviously it doesn't return 0 if you compare a and á, because they are not equal.

So does anyone have any idea how I might be able to do this?

Comments (3)

初见 2024-08-31 04:29:06

Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics.

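import java.text.Normalizer;
import java.text.Normalizer.Form;

// NFD splits each accented letter into a base letter plus combining marks; the regex then drops the marks.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}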

Which you can use as follows:

String value = "Joáo";
String comparisonMaterial = removeDiacriticalMarks(value); // Joao
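
As a rough sketch of how this could plug into the filter described in the question: the class name AccentFilter and its methods are hypothetical, and the lower-casing step is an added assumption (the question only asks about accents). Both the stored name and the search term are folded the same way before calling contains. The literal "Jo\u00e1o" is just Joáo written with a Unicode escape.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
final class AccentFilter {

    // Decompose to NFD, drop combining diacritical marks, then lower-case
    // (the lower-casing is an assumption; remove it for case-sensitive search).
    static String fold(String s) {
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    // True if the folded name contains the folded query.
    static boolean containsIgnoringAccents(String name, String query) {
        return fold(name).contains(fold(query));
    }

    // Keeps only the names that match the query, in their original spelling.
    static List<String> filter(List<String> names, String query) {
        return names.stream()
                .filter(n -> containsIgnoringAccents(n, query))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jo\u00e1o Silva", "Maria", "Joao Pereira");
        System.out.println(filter(names, "joao"));
    }
}
```

For a large collection it would probably be worth folding each name once and caching the result, rather than normalizing on every search.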

≈。彩虹 2024-08-31 04:29:06

Collator does return 0 for a and á, if you configure it to ignore diacritics:

import java.text.Collator;

public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    // PRIMARY strength ignores accent differences, and case differences as well
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}

isSame("a", "á") yields true now
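
To make the effect of the strength setting more concrete, here is a small sketch comparing the three common strengths. The class CollatorStrengthDemo and the method sameAt are made up for illustration, and Locale.UK is chosen only because the question mentions it. Note that Collator compares whole strings, so on its own it answers "are these two names equal?" rather than "does this name contain the query?". "\u00e1" is á written with a Unicode escape.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the original answer.
final class CollatorStrengthDemo {

    // True if the two strings compare as equal at the given collation strength.
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.UK);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // PRIMARY: base letters only, so accents and case are both ignored
        System.out.println(sameAt(Collator.PRIMARY, "a", "\u00e1"));
        System.out.println(sameAt(Collator.PRIMARY, "a", "A"));
        // SECONDARY: accents count, case still does not
        System.out.println(sameAt(Collator.SECONDARY, "a", "\u00e1"));
        System.out.println(sameAt(Collator.SECONDARY, "a", "A"));
        // TERTIARY: case counts as well
        System.out.println(sameAt(Collator.TERTIARY, "a", "A"));
    }
}
```

With the standard rules for this locale, the PRIMARY comparisons should report equality, SECONDARY should separate a from á while still treating a and A alike, and TERTIARY should separate a from A.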

回忆那么伤 2024-08-31 04:29:06

I have written a class for searching through Arabic texts by ignoring diacritics (NOT removing them). Maybe you can get the idea or use it in some way.

DiacriticInsensitiveSearch.java
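
The contents of the linked class are not shown here, so the following is not that implementation. It is only one possible sketch of the "ignore, don't remove" idea, with hypothetical names (MarkTolerantSearch, compile, find): a regex is built from the query that tolerates any run of combining marks (\p{M}) after each base character, so the matched text is returned with its diacritics intact. The sketch is case-sensitive and works on NFD-normalized text.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the DiacriticInsensitiveSearch class linked above.
final class MarkTolerantSearch {

    // Builds a pattern that allows optional combining marks after every
    // base character of the query (marks in the query itself are dropped).
    static Pattern compile(String query) {
        String base = Normalizer.normalize(query, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        StringBuilder regex = new StringBuilder();
        base.codePoints().forEach(cp -> regex
                .append(Pattern.quote(new String(Character.toChars(cp))))
                .append("\\p{M}*"));
        return Pattern.compile(regex.toString());
    }

    // Returns the first matching stretch of text, diacritics included,
    // recomposed to NFC, or null when nothing matches.
    static String find(String text, String query) {
        Matcher m = compile(query)
                .matcher(Normalizer.normalize(text, Normalizer.Form.NFD));
        return m.find() ? Normalizer.normalize(m.group(), Normalizer.Form.NFC) : null;
    }

    public static void main(String[] args) {
        // "Jo\u00e1o" is Joáo written with a Unicode escape
        System.out.println(find("Bom dia, Jo\u00e1o!", "Joao"));
    }
}
```

Because the text is only normalized and never stripped, the caller gets back what was actually written, which matters when the match has to be highlighted or displayed.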
