Highlighting and replacing non-printable Unicode characters in Emacs
I have a UTF-8 file containing some Unicode characters like LEFT-TO-RIGHT OVERRIDE (U+202D) which I want to remove from the file. In Emacs, they are hidden by default (which should be the correct behavior?). How do I make such "exotic" Unicode characters visible, while not changing the display of "regular" Unicode characters like German umlauts? And how do I replace them afterwards, with replace-string for example? (C-x 8 RET does not work for isearch/replace-string.)

In Vim, it's quite easy: these characters are displayed with their hex representation by default (is this a bug or a missing feature?) and you can easily remove them with :%s/\%u202d//g for example. This should be possible with Emacs?
Comments (3)
You can do M-x find-file-literally; then you will see these characters. Then you can remove them using the usual replace-string.
How about this:

Put the U+202D character you want to match at the top of the kill ring by typing M-: (kill-new "\u202d"). Then you can yank that string into the various searching commands, with either C-y (e.g. query-replace) or M-y (e.g. isearch-forward).

(Edited to add:) You could also just call commands non-interactively, which doesn't present the same keyboard-input difficulties as the interactive calls. For example, type M-: and then:
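The expression that followed here did not survive in this copy of the answer. A minimal sketch, assuming the command meant is replace-string called with the character and an empty replacement:

```elisp
;; Assumed reconstruction, not the answer's verbatim code.
;; Deletes every U+202D between point and the end of the
;; (possibly narrowed) buffer.
(replace-string "\u202d" "")
```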
This is somewhat similar to your Vim version. One difference is that it only performs replacements from the cursor position to the bottom of the file (or narrowed region), so you'd need to go to the top of the file (or narrowed region) prior to running the command to replace all matches.
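To avoid having to move to the top first, the search can be wrapped so that it always starts at the beginning of the accessible part of the buffer and leaves point where it was. This is only a sketch; the helper name my-delete-char-everywhere is hypothetical and not part of the answer:

```elisp
;; Hypothetical helper; the name is an assumption, not from the answer.
(defun my-delete-char-everywhere (char-string)
  "Delete every occurrence of CHAR-STRING in the accessible part of the buffer.
Return the number of occurrences deleted."
  (let ((count 0))
    ;; save-excursion restores point after the search has moved it.
    (save-excursion
      (goto-char (point-min))
      ;; The final t makes search-forward return nil instead of
      ;; signalling an error when there are no more matches.
      (while (search-forward char-string nil t)
        (replace-match "" t t)
        (setq count (1+ count))))
    count))

;; Example: remove LEFT-TO-RIGHT OVERRIDE from the whole buffer.
;; (my-delete-char-everywhere "\u202d")
```

With narrowing in effect, only the narrowed region is processed, as with the non-interactive call.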
I also have this issue, and it is particularly annoying for commits, as it may be too late to fix the log message by the time one notices the mistake. So I've modified the function I use when I type C-x C-c to check whether there is a non-printable character, i.e. one matching "[^\n[:print:]]", and if there is one, put the cursor over it, output a message, and do not kill the buffer. Then it is possible to manually remove the character, replace it with a printable one, or whatever, depending on the context.

The code to use for the detection (and positioning the cursor after the non-printable character) is:
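The code itself is missing from this copy. A minimal sketch of such a check, assuming it simply combines goto-char with a search for the regexp quoted above:

```elisp
;; Assumed reconstruction built from the regexp quoted in the answer.
(progn
  ;; Start from the beginning of the accessible part of the buffer.
  (goto-char (point-min))
  ;; Returns the position just after the first match, or nil when
  ;; nothing matches (the final t suppresses the search-failed error).
  (re-search-forward "[^\n[:print:]]" nil t))
```

Which characters [:print:] covers can differ between Emacs versions, so it is worth checking that the characters you care about are actually matched before relying on this.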
Notes:

- With the [:print:] character class in the regexp, you are dependent on the C library. Some printable characters may be regarded as non-printable, like some recent emojis (but not everyone cares).
- The re-search-forward return value will be regarded as true if and only if there is a non-printable character. This is exactly what we want.

Here's a snippet of what I use for Subversion commits (this is between more complex code in my .emacs). It sits in a cond, i.e. I apply this rule only to filenames used for Subversion commits. The (backward-char) can be used or not, depending on whether you want the cursor to be over or just after the non-printable character.
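The Subversion snippet is not preserved in this copy. The rule described above might look roughly like the following sketch. The function name my-check-then-exit, the message text, and the "svn-commit" filename test are assumptions, not the author's code:

```elisp
;; Hypothetical sketch; function name, message text and the
;; "svn-commit" filename pattern are assumptions.
(defun my-check-then-exit ()
  "Exit Emacs unless a Subversion log message has a non-printable character."
  (interactive)
  (cond
   ;; Apply the rule only to files that look like Subversion commit logs.
   ((and buffer-file-name
         (string-match-p "svn-commit" buffer-file-name)
         (progn
           (goto-char (point-min))
           (re-search-forward "[^\n[:print:]]" nil t)))
    ;; Step back so the cursor is over the offending character;
    ;; drop this call to leave it just after the character instead.
    (backward-char)
    (message "The buffer contains a non-printable character"))
   ;; Otherwise behave like the default C-x C-c binding.
   (t (save-buffers-kill-terminal))))

;; Possible binding:
;; (global-set-key (kbd "C-x C-c") #'my-check-then-exit)
```

Because the check is the last form inside the and, the search only runs for matching filenames, and its non-nil return value is what selects this cond clause.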