Matching text with or without accented characters in JavaScript

Posted on 2024-11-01 20:20:14

I am using an AJAX-based lookup for names that a user searches in a text box.

I am making the assumption that all names in the database will be transliterated to European alphabets (i.e. no Cyrillic, Japanese, Chinese). However, the names will still contain accented characters, such as ç, ê and even č and ć.

A simple search like "Micic" will not match "Mičić", though, and the user expects that it will.

The AJAX lookup uses regular expressions to determine a match. I have modified the regular expression comparison using the function below in an attempt to match more accented characters. However, it is clumsy because it does not cover every accented character (č and ć, for example, are missing).

function makeComp (input)
{
    input = input.toLowerCase ();
    var output = '';
    for (var i = 0; i < input.length; i ++)
    {
        if (input.charAt (i) == 'a')
            output = output + '[aàáâãäåæ]'
        else if (input.charAt (i) == 'c')
            output = output + '[cç]';
        else if (input.charAt (i) == 'e')
            output = output + '[eèéêëæ]';
        else if (input.charAt (i) == 'i')
            output = output + '[iìíîï]';
        else if (input.charAt (i) == 'n')
            output = output + '[nñ]';
        else if (input.charAt (i) == 'o')
            output = output + '[oòóôõöø]';
        else if (input.charAt (i) == 's')
            output = output + '[sß]';
        else if (input.charAt (i) == 'u')
            output = output + '[uùúûü]';
        else if (input.charAt (i) == 'y')
            output = output + '[yÿ]'
        else
            output = output + input.charAt (i);
    }
    return output;
}

Apart from a substitution function like this, is there a better way? Perhaps to "deaccent" the string being compared?

Comments (6)

寻找我们的幸福 2024-11-08 20:20:14

There is a way to "deaccent" the string being compared without a substitution function that lists every accent you want to remove.

Here is the easiest solution I can think of for removing accents (and other diacritics) from a string.

See it in action:

var string = 'Ça été Mičić. ÀÉÏÓÛ';
console.log(string);

var string_norm = string.normalize('NFD').replace(/\p{Diacritic}/gu, ''); // Old method: .replace(/[\u0300-\u036f]/g, "");
console.log(string_norm);

  • .normalize(…) decomposes each accented letter into a base letter followed by combining diacritical marks.
  • .replace(…) removes all the diacritical marks, leaving only the base letters.
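
Applied to the original question, the same normalization can be run on both the stored name and the search term before comparing them. The following is only a sketch under that assumption: foldAccents and matchesName are hypothetical names, and the older character-range form is used so that it also runs in engines without \p{…} support.

```javascript
// Hypothetical helper: strip combining marks and lowercase, so that
// accented and unaccented spellings end up identical.
function foldAccents(text) {
  return text
    .normalize('NFD')                 // split "č" into "c" + U+030C
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

// Hypothetical helper: true when `query` occurs in `name`,
// ignoring accents and letter case.
function matchesName(name, query) {
  return foldAccents(name).includes(foldAccents(query));
}

console.log(matchesName('Mičić', 'Micic'));   // true
console.log(matchesName('Ça été', 'ca ete')); // true
console.log(foldAccents('Straße'));           // "straße"
```

Letters that have no canonical decomposition, such as ß, æ or ø, pass through unchanged, so they would still need an explicit mapping if they should match s, ae or o.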

清眉祭 2024-11-08 20:20:14

I came upon this old thread and thought I'd try my hand at writing a fast function. It relies on the order of the pipe-separated alternatives: each capture group is passed as its own argument to the callback that replace() invokes, so the callback can tell which alternative matched. My goal was to lean as much as possible on the regex engine behind JavaScript's replace(), so that the heavy processing happens in low-level, browser-optimized code instead of in expensive character-by-character comparisons in JavaScript.

This is not scientific at all, but my old Huawei IDEOS Android phone is sluggish when I plug the other functions from this thread into my autocomplete, while this function zips along:

function accentFold(inStr) {
  return inStr.replace(
    /([àáâãäå])|([çčć])|([èéêë])|([ìíîï])|([ñ])|([òóôõöø])|([ß])|([ùúûü])|([ÿ])|([æ])/g, 
    function (str, a, c, e, i, n, o, s, u, y, ae) {
      if (a) return 'a';
      if (c) return 'c';
      if (e) return 'e';
      if (i) return 'i';
      if (n) return 'n';
      if (o) return 'o';
      if (s) return 's';
      if (u) return 'u';
      if (y) return 'y';
      if (ae) return 'ae';
    }
  );
}
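
To make the mechanism explicit, here is a reduced sketch with only two alternatives. The sample string and the parameter names c and n are illustrative; the point is that replace() passes one argument per capture group and leaves the groups that did not take part in the match undefined.

```javascript
// Two alternatives, each wrapped in its own capture group.
var folded = 'čaj ñu'.replace(/([čć])|([ñ])/g, function (match, c, n) {
  // For every match exactly one of `c` and `n` holds the matched character;
  // the other is undefined because its alternative did not participate.
  if (c) return 'c';
  if (n) return 'n';
  return match; // defensive fallback, not reachable with this pattern
});

console.log(folded); // "caj nu"
```

Note that character classes like these only match precomposed characters, so the input is assumed to be in NFC form.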

If you're a jQuery dev, here's a handy example of using this function; you could use :icontains the same way you'd use :contains in a selector:

jQuery.expr[':'].icontains = function (obj, index, meta, stack) {
  // Lowercase and accent-fold both the element text and the selector argument.
  var text = (obj.textContent || obj.innerText || jQuery(obj).text() || '').toLowerCase();
  return accentFold(text).indexOf(accentFold(meta[3].toLowerCase())) >= 0;
};

一萌ing 2024-11-08 20:20:14

I found and upvoted herostwist's answer, but I kept searching, and there is in fact a modern solution built into JavaScript itself: the String.prototype.localeCompare function.

var a = 'réservé'; // with accents, lowercase
var b = 'RESERVE'; // no accents, uppercase

console.log(a.localeCompare(b));
// expected output: 1 (any positive number; the exact value is implementation-defined)
console.log(a.localeCompare(b, 'en', {sensitivity: 'base'}));
// expected output: 0

Note, however, that some mobile browsers still lack full support for it.

Until then, check for support across all the platforms and environments you target.

Is that all?

No. We can go further and use the String.prototype.toLocaleLowerCase function, which lowercases a string according to the rules of a given locale:

var dotted = 'İstanbul';

console.log('EN-US: ' + dotted.toLocaleLowerCase('en-US'));
// expected output: "i\u0307stanbul" (an "i" followed by U+0307 COMBINING DOT ABOVE)

console.log('TR: ' + dotted.toLocaleLowerCase('tr'));
// expected output: "istanbul"

Thank You !

信愁 2024-11-08 20:20:14

There is no easier way to "deaccent" that I can think of, but your substitution could be streamlined a little more:

var makeComp = (function(){

    var accents = {
            a: 'àáâãäåæ',
            c: 'ç',
            e: 'èéêëæ',
            i: 'ìíîï',
            n: 'ñ',
            o: 'òóôõöø',
            s: 'ß',
            u: 'ùúûü',
            y: 'ÿ'
        },
        chars = /[aceinosuy]/g;

    return function makeComp(input) {
        return input.replace(chars, function(c){
            return '[' + c + accents[c] + ']';
        });
    };

}());
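
As a usage sketch, the generated pattern can be handed to a RegExp. The version below is not the code above but a variation with three additions that are my own assumptions: čć is added to the c class (without it "Micic" still would not match "Mičić"), regex metacharacters in the input are escaped, and the RegExp is case-insensitive. The names ACCENTS and makeAccentRegExp are hypothetical.

```javascript
// Accent classes keyed by base letter; "čć" is an addition to the original map.
var ACCENTS = {
  a: 'àáâãäå', c: 'çčć', e: 'èéêë', i: 'ìíîï', n: 'ñ',
  o: 'òóôõöø', s: 'ß', u: 'ùúûü', y: 'ÿ'
};

function makeAccentRegExp(input) {
  var pattern = input
    .toLowerCase()
    // Escape regex metacharacters first, so user input is matched literally.
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    // Then widen each base letter into a class of its accented variants.
    .replace(/[aceinosuy]/g, function (ch) {
      return '[' + ch + ACCENTS[ch] + ']';
    });
  return new RegExp(pattern, 'i');
}

console.log(makeAccentRegExp('Micic').test('Mičić'));   // true
console.log(makeAccentRegExp('Micic').test('Michael')); // false
console.log(makeAccentRegExp('a.b').test('axb'));       // false, "." is literal
```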

漆黑的白昼 2024-11-08 20:20:14

I think this is the neatest solution:

var nIC = new Intl.Collator(undefined , {sensitivity: 'base'})
var cmp = nIC.compare.bind(nIC)

It will return 0 if the two strings are the same, ignoring accents.

Alternatively, you can try localeCompare:

'être'.localeCompare('etre',undefined,{sensitivity: 'base'})
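
A short sketch of how such a collator might be used to pick matching names out of a list. The sample names are hypothetical, and the locale is pinned to 'en' here as an assumption: with undefined the user's locale applies, and some locales (Croatian, as far as I know) treat letters such as č and ć as distinct base letters, which would change the result.

```javascript
// Base sensitivity: strings that differ only by accents or case compare as equal.
var collator = new Intl.Collator('en', { sensitivity: 'base' });

// Hypothetical sample data.
var names = ['Mičić', 'Micic', 'Mihić', 'MICIC'];

var hits = names.filter(function (name) {
  return collator.compare(name, 'micic') === 0;
});

console.log(hits); // ["Mičić", "Micic", "MICIC"]

// The comparison covers the whole string, so a prefix is not a match.
console.log(collator.compare('Mičić', 'Mic') === 0); // false
```

Because the collator compares whole strings, it suits exact lookups; a substring or prefix search as in the question would still need one of the folding approaches.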

任谁 2024-11-08 20:20:14

I made a version of this as a String.prototype method:

String.prototype.strip = function() {
  var translate_re = /[öäüÖÄÜß ]/g;
  var translate = {
    "ä":"a", "ö":"o", "ü":"u",
    "Ä":"A", "Ö":"O", "Ü":"U",
    " ":"_", "ß":"ss"   // probably more to come
  };
  return this.replace(translate_re, function(match) {
    return translate[match];
  });
};

Use it like this:

var teststring = 'ä ö ü Ä Ö Ü ß';
teststring.strip();

This returns the string a_o_u_A_O_U_ss. The original string is left unchanged, since JavaScript strings are immutable.
