Remove accents/diacritics from a string while preserving other special characters (tried mb_chars.normalize and iconv)

Posted on 2024-07-12 19:40:47

There is a very similar question already. One of the solutions uses code like this one:

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

Which works wonders, until you notice it also removes spaces, dots, dashes, and who knows what else.

I'm not really sure how the first code works, but could it be made to strip only accents? Or at the very least be given a list of chars to preserve? My knowledge of regexps is small, but I tried (to no avail):

/[^\-x00-\x7F]/n # So it would leave the dash alone

I'm about to do something like this:

string.mb_chars.normalize(:kd).gsub('-', '__DASH__').
  gsub(/[^x00-\x7F]/n, '').gsub('__DASH__', '-').to_s

Atrocious? Yes...

I've also tried:

iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"
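
A note on the error above: Iconv.new takes the target encoding first and the source encoding second, so the call as written converts from US-ASCII to UTF-8, and "é" is rejected as an illegal US-ASCII sequence. Iconv has also since been removed from Ruby's standard library. As a rough sketch of a similar effect in plain Ruby, assuming String#unicode_normalize and String#encode as stand-ins (ascii_fold is a made-up helper name, and unlike //TRANSLIT this simply drops anything ASCII cannot represent):

```ruby
# Sketch only: String#unicode_normalize and String#encode stand in for Iconv.
# ascii_fold is a hypothetical helper name, not part of any library.
def ascii_fold(string)
  string
    .unicode_normalize(:nfd)                          # 'é' becomes 'e' + U+0301
    .encode('US-ASCII', undef: :replace, replace: '') # drop what ASCII cannot represent
end

puts ascii_fold('Café')  # Cafe
```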

Help please?

Comments (3)

放肆 2024-07-19 19:41:00

It's not as neat as Iconv, but does what I think you want:

http://snippets.dzone.com/posts/show/2384

十雾 2024-07-19 19:40:56

"it also removes spaces, dots, dashes, and who knows what else."

It shouldn't.

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

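You've mistyped: there should be a backslash before the x00, so that it refers to the NUL character. Without it, the class contains a literal "x", a literal "0", and the range from "0" to \x7F, so every ASCII character below "0" (space, dot, dash and so on) gets stripped as well.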
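
To see the difference side by side, here is a minimal sketch in plain Ruby. It assumes String#unicode_normalize(:nfkd) as a stand-in for mb_chars.normalize(:kd) and leaves out the /n flag; the constant and method names are made up for illustration.

```ruby
# Assumption: String#unicode_normalize(:nfkd) stands in for
# ActiveSupport's mb_chars.normalize(:kd) used in the question.
MISTYPED  = /[^x00-\x7F]/   # keeps only "x", "0" and the range "0".."\x7F"
CORRECTED = /[^\x00-\x7F]/  # keeps the whole ASCII range

# Decompose accented characters, then delete whatever the pattern matches.
def strip_with(pattern, string)
  string.unicode_normalize(:nfkd).gsub(pattern, '')
end

sample = 'Café - a.b'
puts strip_with(MISTYPED, sample)   # Cafeab
puts strip_with(CORRECTED, sample)  # Cafe - a.b
```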

/[^\-x00-\x7F]/n # So it would leave the dash alone

You've put the ‘-’ between the ‘\’ and the ‘x’, which will break the reference to the null character, and thus break the range.
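
Written out, that class negates a literal dash, "x", "0" and the range "0" to \x7F, so dashes survive but spaces and dots still go. If the goal is to strip only the accents, one option (a suggestion, not something taken from the question) is to delete just the combining marks that decomposition produces, using the Unicode property \p{Mn}. The sketch below assumes plain Ruby's String#unicode_normalize rather than mb_chars, and strip_accents is a hypothetical helper name.

```ruby
# Hypothetical helper: decompose, then delete only nonspacing combining marks,
# leaving every other character (ASCII or not) untouched.
def strip_accents(string)
  string.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# The dash variant from the question: dashes are kept, spaces and dots are not.
puts 'Café - a.b'.unicode_normalize(:nfkd).gsub(/[^\-x00-\x7F]/, '')  # Cafe-ab

# Only the accents are removed; spaces, dash, dot and the euro sign stay.
puts strip_accents('Crème brûlée - 5 €.')  # Creme brulee - 5 €.
```

Letters that are not built from a base letter plus a combining mark (for example "ø" or "ß") have no such decomposition and pass through unchanged.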
