Remove accents/diacritics from a string while preserving other special characters (tried mb_chars.normalize and iconv)
There is a very similar question already. One of the solutions uses code like this one:
string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s
Which works wonders, until you notice it also removes spaces, dots, dashes, and who knows what else.
I'm not really sure how the first code works, but could it be made to strip only accents? Or at the very least be given a list of chars to preserve? My knowledge of regexps is small, but I tried (to no avail):
/[^\-x00-\x7F]/n # So it would leave the dash alone
I'm about to do something like this:
string.mb_chars.normalize(:kd).gsub('-', '__DASH__').gsub(/[^x00-\x7F]/n, '').gsub('__DASH__', '-').to_s
Atrocious? Yes...
I've also tried:
iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"
Help please?
Comments (3)
It's not as neat as Iconv, but does what I think you want:
http://snippets.dzone.com/posts/show/2384
I'd use the
transliterate
method. See http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-transliterate
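Outside Rails, a similar effect can be roughly sketched with Ruby's standard library alone. This is not ActiveSupport's implementation and its behaviour may differ from transliterate; `strip_accents` is a hypothetical helper name and the sample string is made up. It assumes Ruby 2.2 or later for `String#unicode_normalize`, and it removes only combining marks, so spaces, punctuation and symbols are left alone.

```ruby
# Hypothetical helper, not part of Rails or of the answers here.
# Decompose each accented letter into base letter + combining mark (NFD),
# drop the combining marks (Unicode category Mn), then recompose (NFC)
# whatever is left so unrelated scripts are put back together.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').unicode_normalize(:nfc)
end

puts strip_accents("Café - crème brûlée, naïve… 10€")
```

Characters that have no base-letter decomposition (for example "ø" or "ß") pass through unchanged with this approach, so it only covers letters that decompose into a base letter plus a combining mark.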
It shouldn't.
You've mistyped: there should be a backslash before the x00, to refer to the NUL character.
You've put the ‘-’ between the ‘\’ and the ‘x’, which will break the reference to the null character, and thus break the range.
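To see the effect of the missing backslash, here is a minimal sketch comparing the two character classes. It makes some assumptions: the sample string is made up, `String#unicode_normalize(:nfkd)` (Ruby 2.2+) stands in for `mb_chars.normalize(:kd)`, and the `/n` flag is left off to keep the example simple.

```ruby
# Hypothetical sample input, not taken from the question.
sample = "Café au lait - v1.0"

# Decompose "é" into "e" followed by U+0301 (combining acute accent).
decomposed = sample.unicode_normalize(:nfkd)

# Mistyped class: without the backslash, "x00" is just the characters
# "x", "0", "0", so the range really runs from "0" (0x30) to 0x7F.
# Everything below "0" (space, dot, dash, ...) is therefore removed too.
mistyped = decomposed.gsub(/[^x00-\x7F]/, '')

# Corrected class: \x00-\x7F covers all of ASCII, so only the
# non-ASCII combining accent is removed.
corrected = decomposed.gsub(/[^\x00-\x7F]/, '')

puts mistyped   # Cafeaulaitv10
puts corrected  # Cafe au lait - v1.0
```

With the range fixed there should be no need for the `__DASH__` placeholder workaround, since the dash (0x2D) already falls inside `\x00-\x7F`.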