How do I set the encoding of shell-command-on-region output?

Posted 2024-08-13 01:29:50

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):

(defun perltidy-command (start end)
  "The perltidy command we pass markers to.
Replace the text between START and END with perltidy's output."
  (shell-command-on-region start
                           end
                           "perltidy"
                           t    ; OUTPUT-BUFFER: insert into the current buffer
                           t    ; REPLACE: overwrite the region with the output
                           (get-buffer-create "*Perltidy Output*"))) ; ERROR-BUFFER

(defun perltidy-dwim (arg)
  "Perltidy the active region, or the entire buffer if there is none.
The prefix argument ARG is ignored."
  (interactive "P")
  (let ((point (point)) (start) (end))
    (if (and mark-active transient-mark-mode)
        (setq start (region-beginning)
              end (region-end))
      (setq start (point-min)
            end (point-max)))
    (perltidy-command start end)
    (goto-char point)))

(global-set-key "\C-ct" 'perltidy-dwim)

I'm using the current Emacs 23.1 for Windows (EmacsW32). The problem is that when I apply the script to a UTF-8 encoded file ("U(Unix)" in the mode line), the output comes back decoded as Latin-1, i.e. two or more characters for each non-ASCII source character.

Is there any way I can fix that?

EDIT: The problem seems to be solved by putting (set-terminal-coding-system 'utf-8-unix) in my init.el. If anyone has other solutions, go ahead and write them!

黄昏下泛黄的笔记 2024-08-20 01:29:50

The following is from the documentation of shell-command-on-region:

To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command.  By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.

During execution, Emacs first looks for a coding system in process-coding-system-alist; if no entry matches (or the list is nil), it falls back to default-process-coding-system.
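To see which of the two variables would supply the coding systems on a given machine, the lookup can be reproduced by hand. This is a minimal diagnostic sketch, not part of the original answer. It assumes that shell-command-on-region starts the shell named by shell-file-name, so that the shell's name (rather than "perltidy") is what gets matched against the alist.

```elisp
;; Diagnostic sketch: evaluate with M-: or in *scratch*.
;; Assumption: shell-command-on-region launches the shell named by
;; `shell-file-name', so that name (not "perltidy") is matched against
;; the regexps in `process-coding-system-alist'.
(let ((from-alist (find-operation-coding-system
                   'call-process-region
                   (point-min) (point-max)   ; START and END
                   shell-file-name)))        ; PROGRAM, the lookup target
  ;; FROM-ALIST is a (DECODING . ENCODING) pair, or nil when no entry
  ;; matches, in which case `default-process-coding-system' applies.
  (message "from alist: %S, default: %S"
           from-alist default-process-coding-system))
```

If the first value is non-nil, changing default-process-coding-system alone will probably have no effect for this command.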

If you want to change the encoding, you can add your own entry to process-coding-system-alist. Its content looks like this:

Value: (("\\.dz\\'" no-conversion . no-conversion)
 ...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
 ...
("" undecided))

Or, if you haven't set process-coding-system-alist and it is nil, you can assign your coding systems to default-process-coding-system, for example:

(setq default-process-coding-system '(utf-8 . utf-8))

(The car is used to decode the command's output and the cdr to encode the input sent to it, so here both directions use UTF-8.)

or

(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))

I also wrote a post about this if you want details.

青朷 2024-08-20 01:29:50

Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):

To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded in the same coding system that will be used to save the file,
`buffer-file-coding-system'. If the output is going to replace the region,
then it is decoded from that same coding system.
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
Noninteractive callers can specify coding systems by binding
`coding-system-for-read' and `coding-system-for-write'.

In other words, you'd do something like

(let ((coding-system-for-read 'utf-8-unix))
  (shell-command-on-region ...))

This is untested; I'm not sure what value coding-system-for-read (or perhaps coding-system-for-write instead, or as well) should have in your case. I guess you could also use the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need.
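Applied to the question's perltidy-command, the let-binding might look like the sketch below. It is equally untested on EmacsW32. The function name perltidy-command-utf-8 is made up for illustration, and binding both variables to utf-8-unix is an assumption that fits a buffer shown as "U(Unix)".

```elisp
(defun perltidy-command-utf-8 (start end)
  "Run perltidy on START..END, forcing UTF-8 in both directions.
Hypothetical variant of `perltidy-command' from the question."
  (let ((coding-system-for-read 'utf-8-unix)    ; decode perltidy's output
        (coding-system-for-write 'utf-8-unix))  ; encode the text sent to perltidy
    (shell-command-on-region start
                             end
                             "perltidy"
                             t    ; OUTPUT-BUFFER: insert into the current buffer
                             t    ; REPLACE: overwrite the region
                             (get-buffer-create "*Perltidy Output*"))))
```

Because the bindings are local to the call, other subprocesses keep whatever coding systems they used before, unlike a global setq in init.el.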

Another option might be to wiggle the locale in the perltidy invocation, but again, without more information about what you are using now, and no means to experiment on a system similar to yours, I can only hint.
