Replace diacritics in JavaScript
How can I replace diacritics (ă, ş, ţ, etc.) with their "normal" forms (a, s, t) in JavaScript?
Comments (11)
If you want to do it entirely on the client side, I think your only option is some kind of lookup table. Here's a starting point, written by a chap called Olavi Ivask on his blog...
As you can see, this is simply an array of regexes for known diacritic characters, mapping each back onto a "plain" character.
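The original listing did not survive on this page, so the following is only a minimal sketch of what such a lookup table might look like. The names `diacriticsTable` and `removeDiacritics` are made up for illustration, and the table covers only a handful of characters.

```javascript
// Hypothetical lookup table: each entry pairs a plain ASCII character
// with a regex matching a few of its accented variants (not exhaustive).
const diacriticsTable = [
  { base: 'a', pattern: /[àáâãäåă]/g },
  { base: 'A', pattern: /[ÀÁÂÃÄÅĂ]/g },
  { base: 'e', pattern: /[èéêë]/g },
  { base: 'E', pattern: /[ÈÉÊË]/g },
  { base: 's', pattern: /[şș]/g },
  { base: 'S', pattern: /[ŞȘ]/g },
  { base: 't', pattern: /[ţț]/g },
  { base: 'T', pattern: /[ŢȚ]/g },
];

// Apply every pattern in turn, replacing matches with the plain character.
function removeDiacritics(str) {
  return diacriticsTable.reduce(
    (result, { base, pattern }) => result.replace(pattern, base),
    str
  );
}

console.log(removeDiacritics('ăşţ')); // "ast"
```

Characters missing from the table pass through unchanged, so the table has to be extended for every language you need to support.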
A simple modification of Paul's script: extend the String object.
Now you can do:
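The answer's code is missing here. As a rough sketch of the idea, assuming a hypothetical method name `removeDiacritics` and a deliberately tiny table, extending the String prototype could look like this:

```javascript
// Hypothetical table of [pattern, replacement] pairs; deliberately tiny.
const DIACRITIC_PATTERNS = [
  [/[ăâàáä]/g, 'a'],
  [/[şș]/g, 's'],
  [/[ţț]/g, 't'],
];

// Add the method as a non-enumerable property, like the built-in methods.
Object.defineProperty(String.prototype, 'removeDiacritics', {
  value: function () {
    let result = String(this);
    for (const [pattern, base] of DIACRITIC_PATTERNS) {
      result = result.replace(pattern, base);
    }
    return result;
  },
  writable: true,
  configurable: true,
});

console.log('ţară'.removeDiacritics()); // "tara"
```

Modifying built-in prototypes can clash with other libraries or future language features, so a standalone function may be the safer choice in shared code.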
I have ported the Apache Lucene ASCII Folding Filter to JavaScript. It replaces many Unicode characters (including diacritics) with their ASCII base forms.
You can find the port at https://github.com/mplatt/fold-to-ascii-js
After integrating the library, you can fold strings like this:
Edit
To get this to work, I followed the install instructions at that link, imported the library, and could then use it.
Consider the following syntax, where each symbol in a value is replaced with its key symbol (case-sensitive).
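The syntax itself is not preserved on this page. One possible reading, sketched under the assumption that the map is a plain object (the names `replacements`, `lookup`, and `replaceByMap` are illustrative only):

```javascript
// Hypothetical map: each key is the replacement; each value lists the
// characters that should be replaced by that key. Case-sensitive.
const replacements = {
  a: 'ăâàáä',
  A: 'ĂÂÀÁÄ',
  s: 'şș',
  S: 'ŞȘ',
  t: 'ţț',
  T: 'ŢȚ',
};

// Invert the map once into a per-character lookup.
const lookup = new Map();
for (const [key, chars] of Object.entries(replacements)) {
  for (const ch of chars) lookup.set(ch, key);
}

// Replace each character that has an entry; leave the rest untouched.
function replaceByMap(str) {
  return Array.from(str, (ch) => (lookup.has(ch) ? lookup.get(ch) : ch)).join('');
}

console.log(replaceByMap('Ţară şi')); // "Tara si"
```

Grouping the variants under their replacement keeps the table compact and easy to extend by hand.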
You would need a conversion map, something like this:
Or, if you have access to iconv on your box, you could perhaps use some Ajax calls to remove the accents with iconv's //TRANSLIT parameter.
This method is a refactor of the first response.
Here is an ES2020 version of the accepted low-level solution. If you know which letters need to be replaced, create a mapping and use String.replace() to map the letters one by one:
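The code for this answer is not shown on this page. A minimal sketch of the approach it describes might look like the following. The mapping covers only a few Romanian letters, and the names `letterMap` and `stripAccents` are assumptions.

```javascript
// Hypothetical mapping of the letters we expect to encounter.
const letterMap = {
  'ă': 'a', 'â': 'a', 'î': 'i',
  'ş': 's', 'ș': 's',
  'ţ': 't', 'ț': 't',
};

// Build one character class from the keys, e.g. /[ăâîşșţț]/g.
const letterPattern = new RegExp(`[${Object.keys(letterMap).join('')}]`, 'g');

// String.replace() calls the callback once per matched letter;
// the ?? fallback (ES2020) returns the letter itself if it is unmapped.
const stripAccents = (str) =>
  str.replace(letterPattern, (ch) => letterMap[ch] ?? ch);

console.log(stripAccents('ştiinţă')); // "stiinta"
```

A single regex pass with a callback avoids running one replace per letter, which keeps the cost proportional to the string length.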
Try this.
In modern browsers and Node.js you can use Unicode normalization to decompose those characters, followed by a filtering regex.
str.normalize('NFKD').replace(/[^\w]/g, '')
If you want to allow characters such as whitespace, dashes, etc., extend the regex to allow them.
str.normalize('NFKD').replace(/[^\w\s._\/-]/g, '')
Note: this method does not work for characters that have no Unicode decomposition, e.g. ø and ł.
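A variation on the same idea, offered as a sketch rather than the answer's own code: normalize with NFD and strip only the combining marks in the range U+0300 to U+036F, so punctuation and whitespace survive without being whitelisted. The fallback table for ø and ł is a hypothetical addition that covers letters with no decomposition.

```javascript
// Hypothetical fallback for letters that have no Unicode decomposition.
const noDecomposition = { 'ø': 'o', 'Ø': 'O', 'ł': 'l', 'Ł': 'L' };

function stripDiacritics(str) {
  return str
    .normalize('NFD')                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/[øØłŁ]/g, (ch) => noDecomposition[ch]); // handle the rest
}

console.log(stripDiacritics('ăşţ'));        // "ast"
console.log(stripDiacritics('Łódź'));       // "Lodz"
console.log(stripDiacritics('Smørrebrød')); // "Smorrebrod"
```

Unlike the NFKD variant, NFD leaves compatibility characters such as ligatures as they are, so choose the form that matches how aggressive the folding should be.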
A more complete version, with case-sensitive support, ligatures and whatnot.
Original source: http://lehelk.com/2011/05/06/script-to-remove-diacritics/