findstr or grep that autodetects character encoding (UTF-16)

Posted on 2024-07-10 21:24:05

I want to do this:

 findstr /s /c:some-symbol *

or the grep equivalent

 grep -R some-symbol *

but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even begin with a byte-order mark (FF FE), so I'm not even looking for heroic autodetection.

Any suggestions?

I'm referring to Windows Vista and XP.

西瑶 2024-07-17 21:24:05

A workaround is to convert the UTF-16 file to ASCII or ANSI first (characters outside the target code page will be lost):

TYPE UTF-16.txt > ASCII.txt

Then you can use FINDSTR.

FINDSTR object ASCII.txt
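
The same convert-then-search idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name `convert_utf16_to_utf8` is made up, it assumes the file starts with a byte-order mark, and it writes UTF-8 rather than ANSI so that no characters are lost.

```python
from pathlib import Path


def convert_utf16_to_utf8(src: str, dst: str) -> None:
    """Rewrite a BOM-prefixed UTF-16 file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = Path(src).read_bytes().decode("utf-16")
    # Bytes in, bytes out: line endings are left exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))
```

After the conversion, a byte-oriented tool such as FINDSTR can find ASCII search terms in the output file, since ASCII characters are encoded identically in UTF-8.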

不知所踪 2024-07-17 21:24:05

Thanks for the suggestions. I was referring to Windows Vista and XP.

I also discovered this workaround, using the free Sysinternals strings.exe:

C:\> strings -s -b dir_tree_to_search | grep regexp

Strings.exe extracts all of the strings it finds (from binaries, but works fine with text files too) and prepends each result with a filename and colon, so take that into account in the regexp (or use cut or another step in the pipeline). The -s makes it do a recursive extraction and -b just suppresses the banner message.

Ultimately, I'm still somewhat surprised that the flagship searching utilities GNU grep and findstr don't handle Unicode character encodings natively.

蓝海 2024-07-17 21:24:05

On Windows, you can also use find.exe.

find /i /n "YourSearchString" *.*

The only problem is that this prints the file names followed by the matches. You can filter the file names out by piping to findstr:

find /i /n "YourSearchString" *.* | findstr /i "YourSearchString"

把昨日还给我 2024-07-17 21:24:05

findstr /s /c:some-symbol *

can be replaced with the following character-encoding-aware command (inside a batch file, write %%f instead of %f):

for /r %f in (*) do @find /i /n "some-symbol" "%f"

私野 2024-07-17 21:24:05

According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, it presents this workaround:

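# GREP_FOR holds the pattern to search for
find . -type f -exec file {} + | grep UTF-16 | cut -d: -f1 |
while IFS= read -r f; do
        # Quoting "$f" keeps file names with spaces intact; --label restores the name in grep's output
        iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" -- "$GREP_FOR"
done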

This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script to do the above that you can find on GitHub here.

This only greps files that are UTF-16. You'd still grep your ASCII files the normal way.
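
For a single pass that covers both the UTF-16 and the plain files, and that also runs on Windows, a rough Python sketch could look like the following. It is not from the blog article: the names `BOMS`, `detect_encoding` and `grep_tree` are invented for illustration, detection relies solely on a byte-order mark, and files without one are assumed to be single-byte text (decoded as Latin-1 so that decoding never fails).

```python
import codecs
import os
import re

# BOM -> codec name. UTF-32 is checked first because its little-endian BOM
# (FF FE 00 00) starts with the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(head: bytes, default: str = "latin-1") -> str:
    """Pick a codec from the BOM at the start of `head`, else fall back to `default`."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return default


def grep_tree(pattern: str, root: str = "."):
    """Yield (path, line_number, line) for every line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
            # The BOM-aware codecs consume the BOM and choose the byte order.
            text = data.decode(detect_encoding(data[:4]), errors="replace")
            for lineno, line in enumerate(text.splitlines(), 1):
                if regex.search(line):
                    yield path, lineno, line
```

Iterating over `grep_tree("some-symbol", ".")` and printing each tuple gives output comparable to `grep -Rn`. Reading each file whole keeps the sketch short; very large files would need a streaming decoder instead.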