
Ruby regex Unicode encoding character-properties

How to specify a Regexp for Unicode Cyrillic characters in Ruby 1.9

Posted on 2024-08-30 14:59:08

#coding: utf-8
str2 = "asdfМикимаус"
p str2.encoding              # => #<Encoding:UTF-8>
p str2.scan(/\p{Cyrillic}/)  # finds all the Cyrillic characters
str2.gsub!(/\w/u, '')        # removes only the Latin characters
puts str2

The question is: why does \w ignore Cyrillic characters?

I have installed the latest Ruby package from http://rubyinstaller.org/. Here is the output of ruby -v:

ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]

As far as I know, the Oniguruma regular-expression library used by Ruby 1.9 fully supports Unicode characters.
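The engine's Unicode support is real, but it shows up through specific constructs such as property escapes. A minimal sketch, assuming a UTF-8 source file like the one in the question (the variable names are illustrative only), that runs the same string through a script property, another script property and a general-category property:

```ruby
# coding: utf-8
str = "asdfМикимаус"

# Unicode property escapes are where the regexp engine's Unicode support lives
cyrillic = str.scan(/\p{Cyrillic}/).join  # Script=Cyrillic
latin    = str.scan(/\p{Latin}/).join     # Script=Latin
letters  = str.scan(/\p{L}/).join         # General_Category=Letter, any script

puts cyrillic  # => Микимаус
puts latin     # => asdf
puts letters   # => asdfМикимаус
```

\p{Cyrillic} and \p{Latin} select by script, while \p{L} selects every letter regardless of script, so the last result contains both halves of the string.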



Comments (1)

怪我太投入 2024-09-06 14:59:08

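This is specified in the Ruby documentation: \w is equivalent to [a-zA-Z0-9_], so it only matches ASCII word characters and never matches non-ASCII letters such as Cyrillic.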
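A quick way to check this equivalence is to run \w and the explicit ASCII class over the same mixed string. This is a small sketch assuming a UTF-8 string; the sample text and variable names are made up for illustration:

```ruby
# coding: utf-8
sample = "asdf_42Микимаус"

# \w and the explicit ASCII class pick out exactly the same characters
via_w     = sample.scan(/\w/).join
via_ascii = sample.scan(/[a-zA-Z0-9_]/).join

puts via_w               # => asdf_42
puts via_w == via_ascii  # => true

# This is why the question's gsub! leaves every Cyrillic letter behind
rest = sample.gsub(/\w/, '')
puts rest                # => Микимаус
```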

You probably want to use [[:alnum:]] instead, which matches all Unicode letters and digits. Also take a look at [[:word:]] and [[:alpha:]].
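To see how the three suggested POSIX bracket expressions differ, here is a sketch applying each one to the question's kind of string. It assumes a UTF-8 string and a Ruby whose Regexp documentation describes POSIX brackets as Unicode-aware (1.9 onward); the sample text and variable names are illustrative:

```ruby
# coding: utf-8
str = "asdf_42Микимаус"

# POSIX bracket expressions are Unicode-aware, unlike the \w shorthand
no_alnum = str.gsub(/[[:alnum:]]/, '')  # drops letters and digits of any script
no_word  = str.gsub(/[[:word:]]/, '')   # also drops "_" (connector punctuation)
no_alpha = str.gsub(/[[:alpha:]]/, '')  # drops letters only, keeps digits and "_"

p no_alnum  # => "_"
p no_word   # => ""
p no_alpha  # => "_42"
```

[[:word:]] is the closest Unicode-aware counterpart of \w, because it keeps the underscore in the set. [[:alnum:]] and [[:alpha:]] are narrower.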


About the author: 一花一树开
