Searching for text in a chosen coding system in a file hierarchy

Posted on 2025-01-05 06:27:06

I want to search a file hierarchy for text in a specified coding system (cp1251, UTF-8, UTF-16-le, iso-8859-4, etc.).

For example, I have source code encoded in cp1251, and I run Debian with UTF-8 as the system encoding. grep and Midnight Commander perform their searches in UTF-8, so I cannot find Russian words.

Preferred solutions use standard POSIX or GNU command-line utilities (like grep).

An MC or Emacs solution would also be appreciated.

I tried:

$ grep `echo Привет | iconv -f cp1251 -t utf-8` *

but this command sometimes shows no results.

原谅过去的我 2025-01-12 06:27:07

The command you proposed echoes the string Привет, pipes it through iconv, and hands iconv's output to grep as the search pattern. The files themselves are never converted, so that is not what you want. What you want is this:

find . -type f -printf "iconv -f cp1251 -t utf-8 '%p' | grep --label '%p' -H 'Привет'\n" | sh

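This runs iconv, followed by grep, on every file below the current directory. Because grep reads the converted text from standard input, the --label and -H options make it print the original file name in front of each match.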
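The same idea can be sketched without generating shell text from file names, which matters when a name contains a single quote. The function name `search_encoded` and its argument order are made up for this illustration, and `-F` (treat the pattern as a literal string) is an assumption about what is wanted here. It relies on GNU grep for `--label`.

```shell
#!/bin/sh
# search_encoded ENCODING PATTERN [DIR]   (hypothetical helper, not a standard tool)
# Converts every regular file under DIR from ENCODING to UTF-8 and
# greps the converted text for the literal string PATTERN.
search_encoded() {
    enc=$1 pattern=$2 dir=${3:-.}
    # File names reach the inner shell as arguments ("$@"), never as
    # shell source text, so quotes or spaces in names are harmless.
    find "$dir" -type f -exec sh -c '
        enc=$1 pattern=$2
        shift 2
        for f in "$@"; do
            # iconv errors (bytes invalid in ENCODING) are silenced;
            # whatever was converted before the error is still searched.
            iconv -f "$enc" -t utf-8 "$f" 2>/dev/null |
                grep -H -F --label="$f" -e "$pattern"
        done
    ' sh "$enc" "$pattern" {} +
}

# Example call: search_encoded cp1251 'Привет' .
```

Passing the encoding as an argument means the same function covers the other coding systems mentioned in the question, provided the local iconv knows them (`iconv -l` lists the supported names).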

But note that this assumes that all of your files are in CP1251. If only some of them are, the rest will be decoded incorrectly and matches in them will be missed. In that case you'd first have to write a program that detects the encoding of a file and then applies iconv only if necessary.
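A rough sketch of such a detection step, for the common case where files are either UTF-8 or one known legacy encoding: a file that passes a UTF-8 to UTF-8 round trip through iconv is treated as UTF-8, and anything else is assumed to be in the fallback encoding. This is a heuristic rather than real encoding detection, and the name `search_mixed` and its parameters are hypothetical.

```shell
#!/bin/sh
# search_mixed PATTERN [DIR] [FALLBACK_ENCODING]   (hypothetical helper)
# Searches files that are a mix of UTF-8 and one legacy encoding.
search_mixed() {
    pattern=$1 dir=${2:-.} fallback=${3:-cp1251}
    find "$dir" -type f -exec sh -c '
        pattern=$1 fallback=$2
        shift 2
        for f in "$@"; do
            # Heuristic: iconv fails on byte sequences that are not valid
            # UTF-8, and text in a single-byte Cyrillic encoding almost
            # never happens to be valid UTF-8.
            if iconv -f utf-8 -t utf-8 "$f" >/dev/null 2>&1; then
                from=utf-8
            else
                from=$fallback
            fi
            iconv -f "$from" -t utf-8 "$f" 2>/dev/null |
                grep -H -F --label="$f" -e "$pattern"
        done
    ' sh "$pattern" "$fallback" {} +
}

# Example call: search_mixed 'Привет' . cp1251
```

Pure ASCII files pass the UTF-8 check and are searched unchanged. The check cannot tell two legacy single-byte encodings apart, so a tree that mixes, say, cp1251 and iso-8859-4 would still need a real detector.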
