Emacs hexl-mode UTF-8 BOM problem

Posted 2024-12-04 03:03:59 · 1238 characters · 1 view · 0 comments

I ran into something a bit weird with hexl-mode under Emacs (GNU Emacs 22.2.1 on Debian GNU/Linux).

I had a UTF-8 text file to which I wanted to prepend a BOM (Byte Order Mark: even though adding a pointless BOM to a UTF-8 file is not recommended, the spec clearly states that a BOM in a UTF-8 file is legal).

Here's how the file is seen by the file command:

...$  file  /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode English text

The following works:

open the UTF8 file (without BOM) in text mode
add three ASCII characters at the beginning of the file
close the file   (<-- see, very important, I need to close the file)
M-x hexl-mode
M-x hexl-find-file  (re-opening the file but this time in hexl-mode)
M-x hexl-insert-hex-string
EFBBBF
C-x C-s (saving the file)
M-x hexl-mode-exit

I then get a UTF-8 file with a BOM, as shown here by the file command:

...$  file  /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode (with BOM) English text

(Note that the file command heuristically labels this UTF-8 file with BOM as "English text", but the file does contain a lot of Euro symbols. My point is that, before adding the BOM, it was NOT an ASCII file but already a UTF-8 file, as shown above.)
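At the byte level, the working procedure above amounts to overwriting three placeholder bytes with EF BB BF. The following Python sketch illustrates that idea outside Emacs; the function names, the `XXX` placeholder, and the demo file contents are assumptions made for illustration, not part of the original workflow.

```python
import os
import tempfile

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes inserted with hexl-insert-hex-string


def overwrite_prefix_with_bom(path, placeholder=b"XXX"):
    """Replace a 3-byte placeholder at the start of the file with the UTF-8 BOM."""
    with open(path, "r+b") as f:  # binary mode: no character conversion at all
        if f.read(3) != placeholder:
            raise ValueError("file does not start with the expected placeholder")
        f.seek(0)
        f.write(UTF8_BOM)


def make_demo_file():
    """Create a small UTF-8 file (hypothetical content) with a 3-byte ASCII prefix."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"XXX" + "price: 10 \u20ac\n".encode("utf-8"))
    return path


if __name__ == "__main__":
    demo = make_demo_file()
    overwrite_prefix_with_bom(demo)
    with open(demo, "rb") as f:
        print(f.read(3).hex(" "))  # hex of the first three bytes after the edit
    os.remove(demo)
```

Because the file is opened in binary mode, the Euro signs after the prefix are left untouched, which is the same property the literal (hexl) visit relies on.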

However, I simply cannot open the file in Emacs first, then call hexl-mode, then replace the first three characters with 0xEF 0xBB 0xBF (the BOM), and then save.

Apparently some crazy conversion takes place when switching from (Text) to (Hexl) mode.
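One plausible reading of this symptom, stated here as an assumption and not verified against Emacs 22.2.1, is that in a buffer whose contents were decoded as text, the values 0xEF, 0xBB and 0xBF are treated as three characters and re-encoded on save, instead of being written as three raw bytes. The Python sketch below only illustrates that difference; the helper names are hypothetical.

```python
BOM_BYTES = bytes([0xEF, 0xBB, 0xBF])


def save_as_characters(code_points):
    """Treat each value as a character (code point) and encode the text as UTF-8."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def save_as_raw_bytes(values):
    """Write the values unchanged, one byte each."""
    return bytes(values)


if __name__ == "__main__":
    values = [0xEF, 0xBB, 0xBF]
    print(save_as_characters(values).hex(" "))  # six bytes, not a BOM
    print(save_as_raw_bytes(values).hex(" "))   # the three BOM bytes
    # The BOM is really the single character U+FEFF encoded in UTF-8.
    print("\ufeff".encode("utf-8") == BOM_BYTES)
```

If something like the first case happens, each of the three values becomes a two-byte UTF-8 sequence, so the file no longer starts with a BOM.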

Am I missing something obvious, or is converting between Text and Hexl a bit broken, so that I'm better off switching to hexl-mode first, doing my hex editing, then saving and closing the file and re-opening it in text mode?

Comments (2)

我最亲爱的 2024-12-11 03:03:59

If you take a look at the hexl-find-file code, you will see that it calls find-file-literally and then switches to hexl-mode.

From the documentation of find-file-literally:

Visit file FILENAME with no conversion of any kind. Format conversion
and character code conversion are both disabled, and multibyte
characters are disabled in the resulting buffer.

So you can open your file with find-file-literally, add three characters, and then switch to hexl-mode.
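As a rough analogy outside Emacs (an illustration only, not how find-file-literally is implemented), the difference between a literal visit and a normal text visit resembles reading the same bytes with and without decoding. The function names below are hypothetical.

```python
import codecs


def literal_view(data: bytes) -> str:
    """Show the raw bytes as hex pairs, with no decoding of any kind."""
    return data.hex(" ")


def decoded_view(data: bytes) -> str:
    """Decode as UTF-8 text, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "10 \u20ac".encode("utf-8")
    print(literal_view(sample))         # every byte visible, BOM included
    print(ascii(decoded_view(sample)))  # text view: the BOM is gone
```

In the literal view each byte can be edited individually, which is why inserting or overwriting EF BB BF is safe there.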

神爱温柔 2024-12-11 03:03:59

Note that an XML file with this declaration will be silently converted to UTF-16 big-endian on saving:

<?xml version="1.0" encoding="UTF-16"?>

This would automatically make the file UTF-8 with a BOM after changing and saving it:

<?xml version="1.0" encoding="UTF-8"?>
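To check which encoding a save actually produced, one option is to look at the leading bytes of the file. This is a minimal Python sketch under the assumption that only UTF-8 and UTF-16 BOMs matter here; `sniff_bom` and its labels are hypothetical names, and UTF-32 is deliberately not handled.

```python
import codecs

# Candidate BOMs and the labels reported for them (labels are illustrative).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),  # FF FE
]


def sniff_bom(data: bytes) -> str:
    """Return a label for the BOM at the start of data, if any."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return "no BOM found"


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"<?xml version=\"1.0\"?>"))
    print(sniff_bom("<?xml version=\"1.0\"?>".encode("utf-16-be")))
```

Note that a file saved as UTF-16 without a BOM is reported as "no BOM found", so this check only confirms the presence of a BOM, not the encoding in general.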