Effect of LC_ALL=C on speeding up grep
I just discovered that if I prefix my grep commands with LC_ALL=C, it does wonders for speeding grep up.
But I am wondering about the implications.
Would a pattern using UTF-8 not match?
What happens if the grepped file is using UTF-8?
You don't necessarily need UTF-8 to run into trouble here. The locale is responsible for setting the character classes, i.e. determining which character is a space, a letter or a digit. Consider these two examples:
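A minimal sketch of the character-class effect, assuming GNU grep and an installed `C.UTF-8` locale (neither is specified in the answer). The word `ïs`, whose `ï` is the two UTF-8 bytes 0xC3 0xAF, counts as "letters followed by `s`" only when the locale knows those bytes form a letter. The helper name `is_alpha_word` is hypothetical.

```shell
#!/bin/sh
# Sketch (assumes GNU grep): the locale decides which bytes [[:alpha:]]
# treats as letters. The C.UTF-8 run also assumes that locale is installed;
# if it is not, grep stays in the C locale.

# "ïs": U+00EF is the two UTF-8 bytes 0xC3 0xAF, followed by "s".
word=$(printf '\303\257s')

# is_alpha_word LOCALE (hypothetical helper): succeed if $word is
# one or more letters followed by "s" under the given locale.
is_alpha_word() {
    printf '%s\n' "$word" | LC_ALL="$1" grep -Eq -- '^[[:alpha:]]+s$'
}

for loc in C C.UTF-8; do
    if is_alpha_word "$loc"; then
        echo "$loc: match"
    else
        echo "$loc: no match"
    fi
done
```

Under `LC_ALL=C`, `[[:alpha:]]` covers only `A-Z` and `a-z`, so the C run reports no match; a UTF-8 locale with Unicode character classes typically reports a match.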
However, when matching exact binary patterns against each other, the locale makes no difference:
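A sketch of the byte-for-byte case, again assuming GNU grep; `has_literal` is a hypothetical helper. With `-F` (fixed string) and `-x` (whole line), the pattern is just a sequence of bytes, so the UTF-8 word is found under either locale.

```shell
#!/bin/sh
# Sketch (assumes GNU grep): a fixed-string, whole-line pattern is compared
# byte for byte, so the UTF-8 word is found whatever LC_ALL says.

word=$(printf '\303\257s')   # "ïs" as UTF-8 bytes

# has_literal LOCALE (hypothetical helper): succeed if the input line is
# exactly the bytes of $word (-F: fixed string, -x: whole line).
has_literal() {
    printf '%s\n' "$word" | LC_ALL="$1" grep -qxF -- "$word"
}

for loc in C C.UTF-8; do
    if has_literal "$loc"; then
        echo "$loc: match"
    else
        echo "$loc: no match"
    fi
done
```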
I'm not sure how fully grep implements Unicode, or how well different code points are matched to each other, but matching any subset of ASCII, and matching single characters that have no alternate binary representation, should work fine regardless of locale.
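That caveat concerns literal characters. Patterns that count or enumerate characters behave differently, because under `LC_ALL=C` grep treats each byte as one character. A sketch assuming GNU grep, with hypothetical helpers `c_match` and `report`:

```shell
#!/bin/sh
# Sketch (assumes GNU grep): under LC_ALL=C each byte is one character,
# which changes patterns that count or enumerate characters.

i_umlaut=$(printf '\303\257')   # ï = bytes 0xC3 0xAF
e_acute=$(printf '\303\251')    # é = bytes 0xC3 0xA9 (same lead byte)

# c_match ERE TEXT (hypothetical helper): succeed if TEXT matches ERE
# when grep runs in the C locale.
c_match() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eq -- "$1"
}

# report LABEL ERE TEXT: print whether the C-locale match succeeded.
report() {
    if c_match "$2" "$3"; then echo "$1: match"; else echo "$1: no match"; fi
}

report 'one dot vs ï'  '^.$'          "$i_umlaut"  # ï is two bytes here
report 'two dots vs ï' '^..$'         "$i_umlaut"
report '[ï] vs é'      "^[$i_umlaut]" "$e_acute"   # 0xC3 is in the set
```

In a UTF-8 locale the first check would match and the last would not, since `.` and `[ï]` then refer to the whole character `ï` rather than to its individual bytes.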