(e)grep: accented characters not recognized as part of a word
I would like to use (e)grep to match a whole word using the -w switch. I've set the locale, but accented characters are being treated as word boundaries, as in this example:
$ locale
LANG=es_VE.utf8
LC_CTYPE="es_VE.utf8"
LC_NUMERIC="es_VE.utf8"
LC_TIME="es_VE.utf8"
LC_COLLATE="es_VE.utf8"
LC_MONETARY="es_VE.utf8"
LC_MESSAGES="es_VE.utf8"
LC_ALL=es_VE.utf8
$ echo -e "cáñamo\namo" | egrep -w amo
cáñamo
amo
How can I find amo while ignoring cáñamo?
Comments (1)
Which code points count as word characters is not locale-dependent in Unicode, and LATIN SMALL LETTER N WITH TILDE is always a word character. Here's an all-UTF8 workflow demonstrating searching for amo after a word boundary, and after a non-(word-boundary):
I cannot help but be amused by your choice of search strings. Thanks for the chuckle.
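As a rough illustration of the word-boundary point, here is a minimal sketch in Python rather than grep. It is not the workflow from the answer; it assumes only that Python 3's `re` module applies Unicode word classification to `str` patterns regardless of locale. The names `SAMPLE`, `whole_word_matches`, and `inside_word_matches` are made up for this example.

```python
import re
import unicodedata

# Hypothetical sample input mirroring the question. Normalizing to NFC keeps
# the accented letters as single precomposed code points (e.g. U+00F1).
SAMPLE = [unicodedata.normalize("NFC", s) for s in ("cáñamo", "amo")]

# \b matches at a word boundary, \B matches where there is no word boundary.
WHOLE_WORD = re.compile(r"\bamo\b")   # "amo" as a word of its own
INSIDE_WORD = re.compile(r"\Bamo\b")  # "amo" preceded by another word character


def whole_word_matches(lines):
    """Return the lines that contain 'amo' as a whole word."""
    return [line for line in lines if WHOLE_WORD.search(line)]


def inside_word_matches(lines):
    """Return the lines where 'amo' ends a longer word."""
    return [line for line in lines if INSIDE_WORD.search(line)]


# U+00F1 is a letter, so there is no boundary between it and the "a" after it.
N_TILDE_NAME = unicodedata.name("\u00f1")  # 'LATIN SMALL LETTER N WITH TILDE'
```

With this input, `whole_word_matches(SAMPLE)` returns only `amo`, and `inside_word_matches(SAMPLE)` returns only `cáñamo`, which is the behaviour the question expected from `-w`.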