How to find and delete invisible characters in a text file using Emacs

Posted on 2024-12-09 02:15:58

I have a .txt file named COPYING that was edited on Windows.
It contains Windows-style line breaks:

$ file COPYING 
COPYING: ASCII English text, with CRLF line terminators

I tried to convert it to Unix style using dos2unix. Below is the output:

$ dos2unix COPYING 
dos2unix: Skipping binary file COPYING

I was surprised to find that the dos2unix program reports it as a binary file. Then using some other editor (not Emacs) I found that the file contains a control character. I am interested in finding all the invisible characters in the file using Emacs.

By googling, I found the following solution, which uses tr:

tr -cd '\11\12\40-\176' < file_name

How can I do the same in an Emacs way? I tried Hexl mode, which shows the text and the corresponding ASCII values in a single buffer, and that is great. How do I find the characters whose ASCII values fall outside the octal ranges 11-12 and 40-176 (i.e. anything other than tab, newline, space, and visible characters)? I tried to create a regular expression for that search, but it is quite complicated.
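
As a rough illustration of what that tr filter does, here is a minimal shell sketch. The file names sample.txt and cleaned.txt and the sample contents are made up for the demonstration. The numbers in the tr set are octal, so \11 and \12 are tab and newline, and \40-\176 runs from space through tilde. Since carriage return (octal 15) is not in the kept set, the filter should also strip the CR from each CRLF pair.

```shell
# Hypothetical sample: CRLF line endings plus one stray control byte (octal 001).
printf 'first line\r\nbad\001byte\r\n\tindented\r\n' > sample.txt

# Keep only tab (011), newline (012), and space through tilde (040-176);
# -c complements the set and -d deletes everything in the complement.
tr -cd '\11\12\40-\176' < sample.txt > cleaned.txt

# Dump the result byte by byte so any leftover control bytes would be visible.
od -c cleaned.txt
```

Comparing `od -c sample.txt` with `od -c cleaned.txt` is one way to see which bytes were dropped before trying the same thing inside Emacs.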


肩上的翅膀 2024-12-16 02:15:58

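To see invisible characters, you can try whitespace-mode. Spaces and tabs will be displayed with a symbol in a different face. If the coding system is automatically detected as DOS (the mode line shows (DOS)), the carriage returns at the ends of lines will be hidden as well. Run revert-buffer-with-coding-system to switch it to Unix or binary (e.g. C-x RET r unix) and they will always show up as ^M. The binary coding system will also display any non-ASCII characters as control characters.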


千纸鹤带着心事 2024-12-16 02:15:58

Emacs won't hide any character by default. Press Ctrl+Meta+% (query-replace-regexp), or Esc then Ctrl+% if the former is too hard on your fingers, or use M-x replace-regexp RET if you prefer to replace without being asked about each match. Then, for the regular expression, enter

[^@-^H^K-^_^?]

However, where I wrote ^H, type Ctrl+Q then Ctrl+H, to enter a “control-H” character literally, and similarly for the others. You can press Ctrl+Q then Ctrl+Space for ^@, and usually Ctrl+Q then Backspace for ^?. Replace all occurrences of this regular expression by the empty string.

Since you have the file open in Emacs, you can change its line endings while you're at it. Press C-x RET f (Ctrl+X Return F) and enter us-ascii-unix as the new desired encoding for the file.
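
For comparison with the tr command in the question, here is a small shell sketch of the same character class. It assumes the caret notation in the regular expression is read as control characters, so the class covers octal 000-010 (^@ through ^H), 013-037 (^K through ^_), and 177 (^?). Note that this range includes carriage return (^M, octal 015). The file names and sample contents are hypothetical.

```shell
# Hypothetical sample: CRLF endings, a form feed (014), and a DEL byte (177).
printf 'one\r\n\014two\177\r\n\tthree\r\n' > sample2.txt

# Delete set spelled out: octal 000-010, 013-037, and 177.
tr -d '\000-\010\013-\037\177' < sample2.txt > by_delete.txt

# Keep set from the question: delete everything that is not in it.
tr -cd '\11\12\40-\176' < sample2.txt > by_keep.txt

# For pure ASCII input the two outputs should be byte-identical.
cmp by_delete.txt by_keep.txt && echo "same result"
```

The two commands only agree on ASCII input: the keep-set form also drops bytes above octal 177, while the delete-set form leaves them alone, which is closer to what the Emacs replacement does.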
