Is there a way to match characters with a tilde using a regular expression?
Look at this:
"nAo".match(/(nao)/i) # => #<MatchData "nAo" 1:"nAo">
"nÃo".match(/(não)/i) # => nil
Is there a way to fix that?
Edit: It seems that Ruby lacks support for Unicode characters in regexp comparisons with the i flag (ignore case). I'm using MRI 1.8.7p249.
I don't know about Ruby, but most regex engines don't understand uppercase/lowercase for non-ASCII characters. The best you can do is:
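A minimal sketch of one such workaround, assuming the intent is to spell out both cases of the accented letter by hand (the constant NAO and the helper nao? are hypothetical names, not from the question). It uses alternation rather than a character class so it does not depend on the engine treating ã as a single character:

```ruby
# Hypothetical pattern: list both cases of the accented letter explicitly
# instead of relying on the i flag to fold non-ASCII characters.
NAO = /n(?:ã|Ã)o/i

# Hypothetical helper: true when the text contains "não" in any letter case.
def nao?(text)
  !(text =~ NAO).nil?
end

puts nao?("não")  # lowercase accented letter
puts nao?("NÃO")  # uppercase accented letter, ASCII letters folded by the i flag
puts nao?("nao")  # no tilde, so this one is not expected to match
```

The i flag still takes care of the ASCII letters n and o; only the accented letter is listed explicitly.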
The problem with understanding the uppercase/lowercase relationship is that it is language dependent. Unicode encodes only the form of the character, not its meaning. Therefore an uppercase character in Unicode can have different lowercase characters depending on the language.
Take SS, for example. In English the lowercase would be ss, but in German it can be ß. Another example is the letter I, which in English has the lowercase i but in Turkish has the lowercase ı (without a dot). That's because i in Turkish has the uppercase İ (with a dot). Because of this, most regex implementations simply give up and refuse to understand uppercase/lowercase relationships for characters outside standard ASCII.
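To see the language dependence concretely, here is a small sketch assuming Ruby 2.4 or later, where String#upcase and String#downcase accept a :turkic option. That is much newer than the 1.8.7 in the question, so treat it as an illustration of the problem rather than a fix for it:

```ruby
# Assumes Ruby 2.4+; older versions only map ASCII letters in upcase/downcase.
default_lower = "I".downcase           # default mapping: "i"
turkic_lower  = "I".downcase(:turkic)  # Turkic mapping: dotless "ı"
turkic_upper  = "i".upcase(:turkic)    # Turkic mapping: dotted "İ"
sharp_s_upper = "ß".upcase             # one character becomes two: "SS"
round_trip    = "ß".upcase.downcase    # "ss", so the original "ß" is not recovered

puts default_lower, turkic_lower, turkic_upper, sharp_s_upper, round_trip
```

The same uppercase I lowercases to two different characters depending on the option, which is exactly the ambiguity a case-insensitive regex would have to resolve.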
Try to find a Unicode normalization module for Ruby.
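For illustration, newer Rubies (2.2 and later) ship String#unicode_normalize, so a separate module may not be needed there. A rough sketch, assuming it is acceptable to ignore accents entirely (strip_accents is a hypothetical helper name):

```ruby
# Assumes Ruby 2.2+ for String#unicode_normalize.
# Hypothetical helper: decompose accented letters, then drop the combining marks.
def strip_accents(text)
  # NFD splits "Ã" into "A" plus a combining tilde (U+0303);
  # \p{Mn} matches nonspacing marks such as that tilde.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

plain = strip_accents("nÃo")
puts plain                      # accent removed, case preserved
puts !(plain =~ /nao/i).nil?    # ASCII-only pattern, so the i flag is enough
```

Be aware that this also makes a plain "nao" match, which may or may not be what you want.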
Note that Ruby has had better character support since 1.9 (it seems you are running Ruby 1.8.7). The old regex engine was replaced with Oniguruma in Ruby 1.9.
http://www.geocities.jp/kosako3/oniguruma/
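As a quick check, assuming Ruby 1.9 or later with UTF-8 source and strings, the example from the question should match as written:

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+; the magic comment above matters on 1.9, later versions default to UTF-8.
match = "nÃo".match(/(não)/i)

# Print the captured group if the case-insensitive match succeeded.
puts match ? match[1] : "no match"
```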