Matching Unicode characters in a zsh regular expression
I want to make sure that a variable does not contain a specific character (in this case an 'α'), but the following code fails (returns 1):
FOO="test" && [[ $FOO =~ '^[^α]*$' ]]
Edit: Changed the pattern based on feedback from stema below to require matching only “non-'α'” characters from start to end.
Replacing 'α' with e.g. 'x' works as expected. Why does it fail with an 'α', and how can I make this work?
System info:
$ zsh --version
zsh 4.3.11 (i386-apple-darwin11.0)
$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL="en_GB.UTF-8"
Edit 2: I now tested on a Linux machine running Ubuntu 11.10 with zsh 4.3.11 with identical locale settings, and there it works – i.e. FOO="test" && [[ $FOO =~ '^[^α]*$' ]]
returns success. I'm running Mac OS X 10.7.2.
With this regex
.*[^α].*
you can't test that α is not in the string. What it tests is whether there is at least ONE character in the string that is not an α. If you want to check that this character does not occur in the string at all, anchor the pattern at both ends, as the question now does:
^[^α]*$
This checks whether the complete string, from start to end, consists only of non-'α' characters.
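The difference between the two patterns in this answer can be checked with a small sketch. The function names are hypothetical, and the patterns are stored in variables (rather than quoted inline as in the question) so that the snippet behaves the same in zsh and bash. The intent described in the comments assumes a platform whose regex library handles UTF-8 correctly, which the question suggests is not the case for zsh 4.3.11 on Mac OS X 10.7.2.

```shell
# Patterns kept in variables: bash treats a quoted right-hand side of =~
# as a literal string, while an unquoted variable is used as a regex in
# both bash and zsh.
any_non_alpha='.*[^α].*'   # intended: at least one character is not α
only_non_alpha='^[^α]*$'   # intended: no character is α

# Hypothetical helpers wrapping the two tests.
has_some_non_alpha() { [[ $1 =~ $any_non_alpha ]]; }
has_no_alpha()       { [[ $1 =~ $only_non_alpha ]]; }

for s in "test" "tαst" "α"; do
  has_some_non_alpha "$s" && echo "'$s': contains a non-α character"
  has_no_alpha "$s"       && echo "'$s': contains no α at all"
done
```

Only the second helper answers the question "is α absent?"; the first one also succeeds for strings such as "tαst" that do contain an α.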
The simplest way of expressing this is with a negative look-ahead anchored at the start:
^(?!.*α)
This says "when looking forward from the start, I shouldn't be able to see α anywhere". The advantage of look-aheads is that they are non-capturing, so you can combine them with other, capturing regexes. For example, to find groups of digits in quotes in input that doesn't contain an α, use this:
^(?!.*α)"(\d+)"
Note that look-ahead is not part of POSIX extended regular expressions, which zsh's =~ uses by default, so in zsh this needs setopt RE_MATCH_PCRE (and the zsh/pcre module).
I ran into a similar problem: I have ZSH 5.0.2 on my notebook, where Unicode works as expected, and ZSH 4.3.17 on my build system, where it does not. It seems to me that ZSH 5 does not have this problem with Unicode characters in regular expression patterns.
Specifically, parsing the key/value pair:
… ]] echo "$match[1]:$match[2]"
renders …
Also, I assume there are shortcomings in ZSH 4's Unicode support in general.
Update: after some investigation, I found that the dot in a regexp does not match the letter 'č' in ZSH 4. Once I updated the pattern to:
…
I get the same result in both ZSH versions. I do not know why exactly this letter is the problem here, but it may help somebody work around this shortcoming.
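Because the behaviour reported in this thread seems to vary with the zsh version and platform, one option for the original goal (checking that a variable does not contain a given character) is to avoid the regex engine altogether and use a shell glob pattern. The following is a minimal sketch, not something proposed by the posters; the helper name lacks_char is hypothetical, and it has not been checked on the zsh 4.3.11 / Mac OS X 10.7.2 setup from the question.

```shell
# lacks_char STRING CHAR  (hypothetical helper)
# Succeeds (exit status 0) when CHAR does not occur anywhere in STRING.
# Uses glob matching via `case`, so no regex library is involved.
lacks_char() {
  case $1 in
    *"$2"*) return 1 ;;  # CHAR found somewhere in STRING
    *)      return 0 ;;  # CHAR not found
  esac
}

FOO="test"
if lacks_char "$FOO" "α"; then
  echo "FOO contains no α"
fi
```

In zsh (and bash) the same test can also be written inline as [[ $FOO != *α* ]].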