Remove accents/diacritics from a string in JavaScript
How do I remove accented characters from a string?
In IE6 in particular, I had something like this:
accentsTidy = function(s){
var r=s.toLowerCase();
r = r.replace(new RegExp(/\s/g),"");
r = r.replace(new RegExp(/[àáâãäå]/g),"a");
r = r.replace(new RegExp(/æ/g),"ae");
r = r.replace(new RegExp(/ç/g),"c");
r = r.replace(new RegExp(/[èéêë]/g),"e");
r = r.replace(new RegExp(/[ìíîï]/g),"i");
r = r.replace(new RegExp(/ñ/g),"n");
r = r.replace(new RegExp(/[òóôõö]/g),"o");
r = r.replace(new RegExp(/œ/g),"oe");
r = r.replace(new RegExp(/[ùúûü]/g),"u");
r = r.replace(new RegExp(/[ýÿ]/g),"y");
r = r.replace(new RegExp(/\W/g),"");
return r;
};
But IE6 bugs me; it seems it doesn't like my regular expression.
This did it for me. JavaScript, Google Apps Script (GAS)
If you're also interested in converting Greek characters to Latin, I've included the Greek alphabet along with some digraphs; it can also be used for a form of combined orthographic and phonetic Greeklish.
Also, this version accepts the following optional options:
override: Object {replace: string -> search: regex, ...}. Overrides the default replacements.
ignore: Array<string | regex> | string. Ignores the listed items, which can be strings or regular expressions.
Removing macrons from te reo Māori
For anyone who came here to remove macrons from te reo Māori.
Note: Stackblitz removed the single quotes from the map keys when I formatted the code, so they must be confident it is OK. This is a subset of this code, thank you.
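The code for this answer did not survive on this page, so here is a minimal sketch of the idea rather than the author's original. It assumes only the five Māori long vowels (upper and lower case) need handling; MACRON_MAP and removeMacrons are hypothetical names.

```javascript
// Hypothetical lookup table: each macron vowel maps to its plain vowel.
// Unquoted keys are fine here because these letters are valid identifiers.
const MACRON_MAP = {
  ā: "a", ē: "e", ī: "i", ō: "o", ū: "u",
  Ā: "A", Ē: "E", Ī: "I", Ō: "O", Ū: "U",
};

function removeMacrons(text) {
  // Normalize to NFC first so that "a" + combining macron becomes the
  // single precomposed character that the map and regex expect.
  return text
    .normalize("NFC")
    .replace(/[āēīōūĀĒĪŌŪ]/g, (ch) => MACRON_MAP[ch]);
}

console.log(removeMacrons("Māori")); // Maori
console.log(removeMacrons("Whānau")); // Whanau
```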
Use localeCompare: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare
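As a rough illustration of this suggestion: when the goal is comparing strings rather than rewriting them, localeCompare with the sensitivity option set to "base" treats letters that differ only by accent or case as equal. The helper name sameBaseLetters and the fixed "en" locale are assumptions for the sketch; other locales may treat some accented letters as distinct base letters.

```javascript
// Hypothetical helper: true when two strings differ only by accents or case.
// The "en" locale is pinned so the result does not depend on the host's default.
function sameBaseLetters(a, b) {
  return a.localeCompare(b, "en", { sensitivity: "base" }) === 0;
}

console.log(sameBaseLetters("resume", "résumé")); // true
console.log(sameBaseLetters("resume", "resumes")); // false
```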
You can create regexes in multiple ways. Using the RegExp constructor:
Or using the regex literal notation:
You have mixed the two.
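The two snippets this answer referred to are missing from this page. A small sketch of the two notations, using one of the character classes from the question, might look like the following; the variable names are illustrative only.

```javascript
// Constructor form: the pattern and the flags are both plain strings.
// Backslashes must be doubled inside the string, e.g. "\\s" for \s.
const fromConstructor = new RegExp("[àáâãäå]", "g");
const whitespaceFromConstructor = new RegExp("\\s", "g");

// Literal form: the pattern sits between slashes, flags follow the last slash.
const fromLiteral = /[àáâãäå]/g;
const whitespaceFromLiteral = /\s/g;

const input = "à la carte";
console.log(input.replace(fromConstructor, "a")); // a la carte
console.log(input.replace(fromLiteral, "a")); // a la carte
```

Either form can be passed straight to replace(); there is no need to wrap a literal in the constructor.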
Pass a user-defined function to the Array.sort() method, and in that function use String.localeCompare().
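A short sketch of that suggestion, assuming an English collation order; the sample names are made up for illustration.

```javascript
const names = ["Zoë", "Émile", "eve", "Adam", "Ángel"];

// The default sort compares UTF-16 code units, so accented capitals land last.
const byCodeUnit = [...names].sort();

// The comparator delegates to localeCompare, which orders by letter first
// and only then by accent and case.
const byLocale = [...names].sort((a, b) => a.localeCompare(b, "en"));

console.log(byCodeUnit); // [ 'Adam', 'Zoë', 'eve', 'Ángel', 'Émile' ]
console.log(byLocale); // [ 'Adam', 'Ángel', 'Émile', 'eve', 'Zoë' ]
```

This sorts the original strings without altering them, which avoids stripping accents at all when ordering is the only goal.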
I used string.js's latinise() method, which lets you do it like this:
My take, for removing all accents and special characters. I've mixed Lewis Diamond's elegant accent removal method with the community wiki's fast character replacement for the rest of the characters. The result is an ASCII string.
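The original code is not included here. As a sketch of the combination described, and not the author's implementation: decompose with normalize("NFD"), drop the combining marks, then run a lookup over whatever non-ASCII characters remain. EXTRA_MAP is a deliberately tiny, hypothetical table, and dropping unmapped characters is an assumption made so the output is guaranteed to be ASCII.

```javascript
// Hypothetical table for letters that do not decompose into base + accent.
const EXTRA_MAP = {
  "ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
  "ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
};

function toAscii(text) {
  return text
    .normalize("NFD") // split "é" into "e" + combining acute
    .replace(/[\u0300-\u036f]/g, "") // remove the combining marks
    // Assumption: anything still outside ASCII is mapped, or dropped if unknown.
    .replace(/[^\x00-\x7F]/g, (ch) => EXTRA_MAP[ch] || "");
}

console.log(toAscii("Straße Ærø déjà vu")); // Strasse AEro deja vu
```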
One solution that seems to be much faster by the given test: http://jsperf.com/diacritics/9
Working example: http://jsbin.com/sovorute/1/edit
Reasoning: One reason this is much faster is that we only iterate over the special characters picked out by the negated regex pattern. The fastest of the other tests (String Iteration without in) iterates 1001 times on the given text, that is, once per character. This one iterates only 35 times and outputs the same result. Keep in mind that this only replaces what is listed in the map.
Classic article on the subject: http://alistapart.com/article/accent-folding-for-auto-complete
Credit: http://semplicewebsites.com/removing-accents-javascript, which also provides a nice character map.
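The reasoning above can be sketched as follows. This is an illustration with a tiny, hypothetical ACCENT_MAP rather than the full character map from the linked source; the counter exists only to show how rarely the callback runs.

```javascript
// Hypothetical, deliberately incomplete map ("î" is left out on purpose).
const ACCENT_MAP = { "è": "e", "é": "e", "û": "u", "ñ": "n", "ç": "c" };

let callbackCalls = 0;

function foldAccents(text) {
  // The negated class matches only characters outside U+0000..U+007E,
  // so the callback is skipped for every plain ASCII character.
  return text.replace(/[^\u0000-\u007E]/g, (ch) => {
    callbackCalls += 1;
    return ACCENT_MAP[ch] || ch; // characters missing from the map pass through
  });
}

const folded = foldAccents("Crème brûlée, s'il vous plaît");
console.log(folded); // Creme brulee, s'il vous plaît
console.log(callbackCalls); // 4
```

The callback runs 4 times for a 29-character string, and the unmapped "î" is left alone, which matches the caveat that only mapped characters are replaced.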
A simpler way to replace the diacritics.
I use Jakub Dundalek's GitHub repository Latinize.js.
I have forked billy's code
http://jsfiddle.net/billybraga/UHmnf/ (from his post) into this: http://jsfiddle.net/infralabs/dJX58/
I corrected the transcription of the ſ and ß characters and also added conversion of these: Þþ, Ðð, Ŋŋ, Ĳĳ, Œœ.
The modified snippet is below:
The fastest way I have tested to remove diacritics from a string, covering almost all characters. I found this to be the best-performing method among all the others.
Diacritics Map
Use
Here's my modified version of lehelk.com's version, which also removes HTML entities that are accents:
http://jsfiddle.net/billybraga/UHmnf/
I still don't know about performance, though...
Here's a very simple solution without too much code, using a very simple map of diacritics that includes some or all of those that map to ASCII equivalents containing more than one character, e.g. Æ => AE, ﬃ => ffi, etc. It also includes some very basic functional tests.
normalize-diacritics is very useful
Thanks to all.
I use this version and explain why (I missed those explanations at the beginning, so I am trying to help the next reader if they are as dull as me...)
Remark: I wanted an efficient solution, so:
etc...
My version is:
(there is no new technical trick in it, only some selected ones plus explanations of why)
And I use it this way:
Comments:
Assuming you know what you're doing, I suspect IE6 is not interpreting the file's encoding correctly, and hence not recognising the non-ASCII characters in the file:
(It "smells" wrong though; I'd look into doing the sorting on the server, say, using something that's locale-aware... but anyway...)
I found all these a little clumsy and I'm not much of an expert on regular expressions, so here's a simpler version. It would be quite easy to translate it to your favourite server-side language, assuming that the string is already in Unicode:
A more complete version with case-sensitive support, ligatures and whatnot.
Original source at: http://lehelk.com/2011/05/06/script-to-remove-diacritics/
You can use the _.deburr() method from the Lodash library. It's available as a stand-alone NPM package, lodash.deburr, or as part of the lodash package.
The result will be:
"Mon cafe est plein de cafeine"
Here is a very fast script based on the Unicode standard, taken from here:
http://semplicewebsites.com/removing-accents-javascript
Some examples:
To ensure the above latin_map doesn't get corrupted by copy/pasting or other transformations, use this base64-encoded string, replacing the first line of the above:
The format for new RegExp is
So you would want
There is a package for this on NPM: latinize
It's a very good package for solving this issue.
Shortened code based on the excellent solution by Ian Elliott:
Edit: Corrected non-working code
There's a lot out there, but I think this one is simple and good enough:
If you also want to remove special characters and turn spaces and hyphens into underscores, do this:
With ES2015/ES6 String.prototype.normalize(),
Note: use NFKD if you want things like \uFB01 (ﬁ) normalized (to fi).
Two things are happening here:
normalize()ing to NFD Unicode normal form decomposes combined graphemes into the combination of simple ones. The è of Crème ends up expressed as e + ̀.
As of 2021, one can also use Unicode property escapes:
See comment for performance testing.
Alternatively, if you just want sorting
Intl.Collator has sufficient support (~95%) right now; a polyfill is also available here, but I haven't tested it.
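A minimal sketch of the normalize() approach, assuming a runtime with ES2015 normalize() and ES2018 Unicode property escapes; the function names are hypothetical. The step after decomposition is removing the code points in the Combining Diacritical Marks block (U+0300 to U+036F) that NFD has split off.

```javascript
// Step 1: NFD splits "è" into "e" + U+0300. Step 2: the regex drops the marks.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Same idea using a Unicode property escape (needs the "u" flag).
function stripDiacriticsByProperty(text) {
  return text.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

console.log(stripDiacritics("Crème Brûlée")); // Creme Brulee
console.log(stripDiacriticsByProperty("Crème Brûlée")); // Creme Brulee

// NFD keeps the "ﬁ" ligature (U+FB01); the compatibility form NFKD expands it.
console.log("\uFB01n".normalize("NFD") === "\uFB01n"); // true
console.log("\uFB01n".normalize("NFKD")); // fin

// For sorting only, a collator compares the original strings directly.
const collator = new Intl.Collator("en");
console.log(["zebra", "Émile", "apple"].sort(collator.compare)); // [ 'apple', 'Émile', 'zebra' ]
```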
I slightly modified khel's version for one reason: every regexp parse/replace costs O(n) operations, where n is the number of characters in the target text. But regexp is not exactly what we need. So:
To test my theory I wrote a test at http://jsperf.com/diacritics/12. Results:
Testing in Chrome 28.0.1500.95 32-bit on Windows 8 64-bit:
Using Regexp: 4,558 ops/sec ±4.16% (37% slower)
String Builder style: 7,308 ops/sec ±4.88% (fastest)
Update
Testing in Chrome 33.0.1750 on Windows 8 64-bit:
Using Regexp: 5,260 ops/sec ±1.25% (76% slower)
Using @skerit version: 22,138 ops/sec ±2.12% (fastest)
Update - 19/03/2014
Adding missing "OE" diacritics.
Update - 27/03/2014
Using a faster way to traverse a string using js - "What?" Version
Update - 14/05/2014
Community wiki
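The code behind these benchmarks is not reproduced on this page. As a hedged sketch of the "String Builder style" idea only: instead of running one regex replace per letter group (each an O(n) scan), walk the string once and look every character up in a map. DIACRITIC_MAP below is a tiny hypothetical sample, not the answer's full table, and no performance claim is made for this sketch.

```javascript
// Hypothetical sample map; a real table would cover far more characters.
const DIACRITIC_MAP = {
  "à": "a", "á": "a", "ç": "c", "é": "e", "è": "e",
  "ï": "i", "ñ": "n", "ö": "o", "ü": "u", "Œ": "OE", "œ": "oe",
};

function removeDiacriticsSinglePass(text) {
  let out = "";
  // One pass over the input: each character costs a single map lookup.
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    out += DIACRITIC_MAP[ch] || ch;
  }
  return out;
}

console.log(removeDiacriticsSinglePass("Œuvre naïve")); // OEuvre naive
```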