Replace diacritics in JavaScript
How can I replace diacritics (ă, ş, ţ, etc.) with their "normal" forms (a, s, t) in JavaScript?
Answers (11)
If you want to do it entirely on the client side, I think your only option is with some kind of lookup table. Here's a starting point, written by a chap called Olavi Ivask on his blog...
You can see this is simply an array of regexes for known diacritic chars, mapping them back onto a "plain" character.
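The original code from the blog is not reproduced here, so the following is only a minimal sketch of the idea described above: an array of regexes for known diacritic characters, each mapped to a plain character. The table contents and the names DIACRITIC_RULES and removeDiacritics are assumptions made for illustration.

```javascript
// Hypothetical lookup table: each entry pairs a regex of accented
// characters with the plain character that replaces them.
// The table is deliberately tiny; extend it for the characters you need.
const DIACRITIC_RULES = [
  { plain: 'a', pattern: /[ăâàáäã]/g },
  { plain: 'A', pattern: /[ĂÂÀÁÄÃ]/g },
  { plain: 's', pattern: /[şșś]/g },
  { plain: 'S', pattern: /[ŞȘŚ]/g },
  { plain: 't', pattern: /[ţț]/g },
  { plain: 'T', pattern: /[ŢȚ]/g },
];

// Apply every rule in turn to the input string.
function removeDiacritics(text) {
  return DIACRITIC_RULES.reduce(
    (result, rule) => result.replace(rule.pattern, rule.plain),
    text
  );
}

console.log(removeDiacritics('ăşţ ĂŞŢ'));
```

Characters that are not listed in the table pass through unchanged, which is the main limitation of any lookup-table approach.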
A simple modification of Paul's script: extend the String object.
Now you can do:
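The answer's code did not survive, so this is a sketch of what extending the String object could look like. The method name removeDiacritics and the small map PLAIN_BY_ACCENTED are assumptions, not the author's original names.

```javascript
// Hypothetical, deliberately small map from accented to plain characters.
const PLAIN_BY_ACCENTED = {
  'ă': 'a', 'â': 'a', 'î': 'i', 'ş': 's', 'ș': 's', 'ţ': 't', 'ț': 't',
};

// Define the method as non-enumerable so it does not leak into for...in loops.
Object.defineProperty(String.prototype, 'removeDiacritics', {
  value: function () {
    // Look at every non-ASCII character and swap it if the map knows it.
    return this.replace(/[^\x00-\x7F]/g, (ch) => PLAIN_BY_ACCENTED[ch] || ch);
  },
  writable: true,
  configurable: true,
});

console.log('ţară'.removeDiacritics());
```

Modifying built-in prototypes can clash with other libraries or future language features, so a standalone function is often the safer choice.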
I have ported the Apache Lucene ASCII Folding Filter to JavaScript. It replaces many Unicode characters (including diacritics) with their ASCII base forms.
You can find the port at https://github.com/mplatt/fold-to-ascii-js
After integrating the library, you can fold strings like this:
Edit
To get this to work, I followed the install instructions at that link and imported the library; after that I could use it.
Consider the following syntax, where each symbol in a value is replaced with its key symbol (case-sensitive):
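The code for this answer is missing, so here is one way the described key/value layout could work. The map contents and the names SYMBOLS_BY_REPLACEMENT and foldSymbols are assumptions for illustration.

```javascript
// Hypothetical map: each key is the replacement, each value lists the
// symbols that should be replaced by it. Matching is case-sensitive.
const SYMBOLS_BY_REPLACEMENT = {
  a: 'ăâàá',
  A: 'ĂÂÀÁ',
  s: 'şș',
  S: 'ŞȘ',
  t: 'ţț',
  T: 'ŢȚ',
};

// Invert the map once so each symbol can be looked up directly.
const replacementBySymbol = new Map();
for (const [replacement, symbols] of Object.entries(SYMBOLS_BY_REPLACEMENT)) {
  for (const symbol of symbols) {
    replacementBySymbol.set(symbol, replacement);
  }
}

// Walk the text character by character and substitute known symbols.
function foldSymbols(text) {
  return Array.from(text, (ch) =>
    replacementBySymbol.has(ch) ? replacementBySymbol.get(ch) : ch
  ).join('');
}

console.log(foldSymbols('Ţară şi ŞTIINŢĂ'));
```

Writing the map as replacement-to-symbols keeps it compact to maintain, while the inverted Map keeps each lookup cheap.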
You would need a conversion map, something like this:
Or, if you have access to iconv on your box, you could perhaps use some Ajax calls to remove the accents with iconv's //TRANSLIT parameter.
This method is a refactor of the first response.
Here is an ES2020 version of the accepted low-level solution. If you can list the letters that need to be replaced, create a mapping and use String.replace() to map the letters one by one:
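The answer's own code is missing, so this is a sketch of the mapping approach it describes. It assumes the ES2020 feature in question is the nullish coalescing operator (??); the names LETTER_MAP and replaceLetters and the map contents are hypothetical.

```javascript
// Hypothetical mapping; list only the letters you expect to meet.
const LETTER_MAP = {
  'ă': 'a', 'â': 'a', 'î': 'i', 'ş': 's', 'ţ': 't', 'ø': 'o', 'ł': 'l',
};

// Build one character class from the map keys.
const LETTER_PATTERN = new RegExp(`[${Object.keys(LETTER_MAP).join('')}]`, 'g');

// Replace each matched letter with its mapped value; the ?? fallback
// keeps the letter as-is if the map has no entry for it.
const replaceLetters = (text) =>
  text.replace(LETTER_PATTERN, (letter) => LETTER_MAP[letter] ?? letter);

console.log(replaceLetters('şţăø'));
```

Because the map is explicit, it can also cover letters such as ø and ł that have no Unicode decomposition.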
Try this.
In modern browsers and Node.js, you can use Unicode normalization to decompose those characters, followed by a filtering regex.
str.normalize('NFKD').replace(/[^\w]/g, '')
If you want to allow characters such as whitespace, dashes, etc., extend the regex to allow them. Put the hyphen last in the character class so that it is not read as a range:
str.normalize('NFKD').replace(/[^\w\s._\/-]/g, '')
Note: This method does not work for characters that have no Unicode decomposition, e.g. ø and ł.
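A narrower variant of the same normalization idea, shown here as a sketch rather than the answer's own code: decompose with NFD and strip only the combining marks in the range U+0300 to U+036F, so that spaces and punctuation are left untouched. The function name stripCombiningMarks is an assumption.

```javascript
// Decompose each character into base letter + combining marks (NFD),
// then remove only the marks in the Combining Diacritical Marks block.
function stripCombiningMarks(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripCombiningMarks('ăşţ'));
console.log(stripCombiningMarks('Crème Brûlée'));
// Characters without a decomposition, such as ø and ł, are returned unchanged.
console.log(stripCombiningMarks('øł'));
```

This variant uses NFD instead of NFKD, so compatibility characters such as ligatures are not expanded; pick the form that matches your data.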
A more complete version, with case-sensitive support, ligatures, and more.
Original source at: http://lehelk.com/2011/05/06/script-to-remove-diacritics/