Highlighting and replacing non-printable Unicode characters in Emacs

Posted on 2024-12-06 11:44:04

I have a UTF-8 file containing some Unicode characters like LEFT-TO-RIGHT OVERRIDE (U+202D) which I want to remove from the file. In Emacs, they are hidden by default (which should be the correct behavior?). How do I make such "exotic" Unicode characters visible (while not changing the display of "regular" Unicode characters like German umlauts)? And how do I replace them afterwards (with replace-string, for example)? C-x 8 RET does not work for isearch/replace-string.

In Vim, it's quite easy: these characters are displayed with their hex representation by default (is this a bug or a missing feature?) and you can easily remove them with, for example, :%s/\%u202d//g. Is this possible in Emacs?


Comments (3)

国粹 2024-12-13 11:44:04

You can do M-x find-file-literally; then you will see these characters.

Then you can remove them using the usual replace-string.
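Because find-file-literally visits the file without decoding it, the buffer is unibyte: U+202D shows up as its three UTF-8 bytes, \342\200\255, and other non-ASCII text is shown as raw bytes too (ü becomes \303\274). A minimal sketch of the removal step, assuming the file is UTF-8 and the buffer really is unibyte; my-lro-utf8-bytes and my-delete-lro-bytes are hypothetical names, not part of Emacs.

```elisp
(defconst my-lro-utf8-bytes "\342\200\255"
  "Hypothetical: UTF-8 encoding of U+202D as a unibyte string.
Octal escapes above 127 with no other non-ASCII make the literal unibyte.")

(defun my-delete-lro-bytes ()
  "Delete every UTF-8 encoded U+202D in the current unibyte buffer.
Hypothetical helper for buffers visited with `find-file-literally'.
Returns the number of deletions."
  (interactive)
  (when enable-multibyte-characters
    (user-error "Buffer is multibyte; visit the file with `find-file-literally'"))
  (let ((case-fold-search nil)          ; compare raw bytes exactly
        (count 0))
    (save-excursion
      (goto-char (point-min))
      ;; Buffer and search string are both unibyte, so this matches bytes.
      (while (search-forward my-lro-utf8-bytes nil t)
        (replace-match "" t t)
        (setq count (1+ count))))
    (message "Deleted %d occurrence(s)" count)
    count))
```

After M-x find-file-literally, run M-x my-delete-lro-bytes and save; the rest of the file is written back byte for byte.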

亚希 2024-12-13 11:44:04

How about this:

Put the U+202D character you want to match at the top of the kill ring by typing M-: (kill-new "\u202d"). Then you can yank that string into the various searching commands, with either C-y (e.g. query-replace) or M-y (e.g. isearch-forward).

(Edited to add:)

You could also just call commands non-interactively, which doesn't present the same keyboard-input difficulties as the interactive calls. For example, type M-: and then:

(replace-string "\u202d" "")

This is somewhat similar to your Vim version. One difference is that it only performs replacements from the cursor position to the bottom of the file (or narrowed region), so you'd need to go to the top of the file (or narrowed region) prior to running the command to replace all matches.
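The dependence on the cursor position can be avoided by starting from point-min inside save-excursion. A minimal sketch; my-remove-char is a hypothetical name, and instead of calling replace-string from Lisp it uses the search-forward/replace-match loop that the replace-string docstring recommends for Lisp programs.

```elisp
(defun my-remove-char (char)
  "Delete every occurrence of CHAR in the accessible part of the buffer.
Hypothetical helper.  Point is restored afterwards, so it does not
matter where the cursor starts.  Returns the number of deletions."
  (interactive (list (read-char-by-name "Character to remove: ")))
  (let ((target (string char))
        (case-fold-search nil)
        (count 0))
    (save-excursion
      ;; `point-min' respects narrowing, like the original suggestion.
      (goto-char (point-min))
      (while (search-forward target nil t)
        (replace-match "" t t)
        (setq count (1+ count))))
    (message "Removed %d occurrence(s) of U+%04X" count char)
    count))
```

Called as M-: (my-remove-char #x202d), or via M-x my-remove-char, which prompts the same way C-x 8 RET does and accepts either the name LEFT-TO-RIGHT OVERRIDE or the hex code 202d.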

马蹄踏│碎落叶 2024-12-13 11:44:04

I also have this issue, and this is particularly annoying for commits as it may be too late to fix the log message when one notices the mistake. So I've modified the function I use when I type C-x C-c to check whether there is a non-printable character, i.e. matching "[^\n[:print:]]", and if there is one, put the cursor over it, output a message, and do not kill the buffer. Then it is possible to manually remove the character, replace it by a printable one, or whatever, depending on the context.

The code to use for the detection (and positioning the cursor after the non-printable character) is:

(progn
  (goto-char (point-min))
  (re-search-forward "[^\n[:print:]]" nil t))

Notes:

  • There is no need to save the current cursor position since here, either the buffer will be killed or the cursor will be put over the non-printable character on purpose.
  • You may want to slightly modify the regexp. For instance, the tab character is a non-printable character and I regard it as such, but you may also want to accept it.
  • About the [:print:] character class in the regexp, you are dependent on the C library. Some printable characters may be regarded as non-printable, like some recent emojis (but not everyone cares).
  • The re-search-forward return value will be regarded as true if and only if there is a non-printable character. This is exactly what we want.
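To make such characters visible rather than just jumping to the first one, the same regexp can drive a report of every match. A sketch under stated assumptions: my-non-printable-regexp and my-list-non-printable are hypothetical names, and whether U+202D itself counts as non-printable depends on how your Emacs classifies it for [:print:] (see the third note), so check on a test buffer first.

```elisp
(defvar my-non-printable-regexp "[^\n[:print:]]"
  "Hypothetical: regexp from the detection code above.
Use \"[^\\t\\n[:print:]]\" instead to accept tabs.")

(defun my-list-non-printable ()
  "Return a list of (POSITION . CODEPOINT), one per non-printable character.
Hypothetical helper built on `my-non-printable-regexp'.  Point is not
moved.  When called interactively, also echo the results."
  (interactive)
  (let ((found '()))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward my-non-printable-regexp nil t)
        (push (cons (match-beginning 0) (char-after (match-beginning 0)))
              found)))
    (setq found (nreverse found))
    (when (called-interactively-p 'interactive)
      (message "%s"
               (if found
                   (mapconcat (lambda (hit)
                                (format "U+%04X at %d" (cdr hit) (car hit)))
                              found ", ")
                 "No non-printable characters found")))
    found))
```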

Here's a snippet of what I use for Subversion commits (this is between more complex code in my .emacs).

(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'")

and

    ((and (buffer-file-name)
          (string-match my-svn-commit-frx (buffer-file-name))
          (progn
            (goto-char (point-min))
            (re-search-forward "[^\n[:print:]]" nil t)))
     (backward-char)
     (message "The buffer contains a non-printable character."))

in a cond, i.e. I apply this rule only to filenames used for Subversion commits. The (backward-char) is optional, depending on whether you want the cursor to be on or just after the non-printable character.
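A self-contained variant of the same check, as a sketch only: rather than modifying a custom C-x C-c function as described above, it assumes the commit buffer is closed through kill-buffer and hooks kill-buffer-query-functions, whose members run with the buffer current and veto the kill by returning nil. Whether your commit workflow (e.g. via emacsclient) actually kills the buffer that way is an assumption; my-refuse-kill-if-non-printable is a hypothetical name.

```elisp
(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'"
  "Regexp matching Subversion commit message file names.")

(defun my-refuse-kill-if-non-printable ()
  "Hypothetical member of `kill-buffer-query-functions'.
In an svn-commit buffer, move point onto the first non-printable
character and return nil, which vetoes the kill; otherwise return t."
  (let ((pos (and (buffer-file-name)
                  (string-match-p my-svn-commit-frx (buffer-file-name))
                  ;; Search inside `save-excursion' so point only moves
                  ;; when something is actually found.
                  (save-excursion
                    (goto-char (point-min))
                    (re-search-forward "[^\n[:print:]]" nil t)))))
    (if (null pos)
        t
      ;; `re-search-forward' returns the end of the match; step back one
      ;; character so the cursor sits on the offending character.
      (goto-char (1- pos))
      (message "The buffer contains a non-printable character.")
      nil)))

(add-hook 'kill-buffer-query-functions #'my-refuse-kill-if-non-printable)
```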
