Grep regex doesn't work with Cygwin on Windows
I'm trying to find all non-ASCII characters in a file using grep:
grep '[^\x00-\x7F]' myfile
I think this should work, but it returns every line in the file.
Any ideas?
Comments (3)
`grep` doesn't recognize the `\x` syntax: a pattern such as `\x48` does not match `H`, even though `'H'` is character 0x48. Your `grep` is matching all lines because each line contains a character other than `\`, `x`, `0`, `7`, `F`, and anything in the range `0`..`\`.

Note that this is not specific to Cygwin. GNU grep (which is what Cygwin has) has an experimental `-P` option that tells it to use Perl-like regular expressions; with that option, it does recognize the `\x` syntax.

Found that perl works:
perl -n -e 'print if /[^\x00-\x7F]/' file
Grep may be interpreting multibyte (i.e., non-ASCII) characters as several single-byte (ASCII) characters. (In a big-endian UTF-16 file, for example, this lovely `∩` character [U+2229] is stored as the bytes 0x22 0x29, so it would show up as `"` [U+0022] followed by a `)` [U+0029].) You'll need to figure out the file's encoding and use a more sophisticated system that knows Unicode.
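A sketch of the encoding issue this answer describes, assuming the file turns out to be big-endian UTF-16 and that `iconv` is installed. The input file here is hypothetical; in practice you would determine the encoding first (for example with the `file` utility). It shows the UTF-16BE bytes of `∩`, then converts to UTF-8 before searching.

```shell
#!/bin/sh
# "∩" (U+2229) as UTF-8 bytes, written with octal escapes for portability.
cap=$(printf '\342\210\251')

# Encoded as UTF-16BE, U+2229 becomes the two bytes 0x22 0x29, which read
# one byte at a time are '"' and ')'. od shows the bytes 22 29.
printf '%s' "$cap" | iconv -f UTF-8 -t UTF-16BE | od -An -tx1

# Hypothetical UTF-16BE input: one ASCII line and one line containing "∩".
sample=$(mktemp)
printf 'plain\n%s\n' "$cap" | iconv -f UTF-8 -t UTF-16BE > "$sample"

# Convert to UTF-8 first, then look for bytes outside ASCII:
# prints only the "∩" line.
iconv -f UTF-16BE -t UTF-8 "$sample" | LC_ALL=C grep -P '[^\x00-\x7F]'
```

Converting to UTF-8 up front keeps the search itself simple: every byte of a non-ASCII character in UTF-8 is 0x80 or above, so a byte-level range check is enough.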