Emacs hexl-mode UTF-8 BOM problem
I ran into something a bit weird with hexl-mode under Emacs (GNU Emacs 22.2.1 / Debian GNU/Linux).
I had a UTF-8 text file to which I wanted to prepend a BOM (Byte Order Mark: even though adding a pointless BOM to a UTF-8 file is not recommended, the spec clearly states that a BOM in a UTF-8 file is legal).
Here's how the file is seen by the file command:
...$ file /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode English text
The following works:
open the UTF8 file (without BOM) in text mode
add three ASCII characters at the beginning of the file
close the file (<-- see, very important, I need to close the file)
M-x hexl-mode
M-x hexl-find-file (re-opening the file but this time in hexl-mode)
M-x hexl-insert-hex-string
EFBBBF
C-x C-s (saving the file)
M-x hexl-mode-exit
I then get a UTF-8 file with a BOM, as shown here by the file command:
...$ file /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode (with BOM) English text
(Note that the file command heuristically reports this as UTF-8 with BOM "English text", but the file does contain a lot of Euro symbols. My point is that, before the BOM was added, it was NOT an ASCII file but already a UTF-8 file, as shown above.)
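For reference, the byte-level effect of the steps above can be sketched outside Emacs. This is a minimal Python illustration, not part of the original workflow: the helper names add_utf8_bom and demo and the sample text are hypothetical, and the sketch prepends the BOM rather than overwriting three placeholder characters.

```python
import codecs
import os
import tempfile


def add_utf8_bom(path):
    """Prepend the UTF-8 BOM (EF BB BF) to the file unless it already starts with it."""
    with open(path, "rb") as f:          # binary mode: no decoding takes place
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False                     # already has a BOM, leave the file alone
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


def demo():
    """Run the helper on a temporary UTF-8 file that contains a Euro sign."""
    sample = "Price: 10 \u20ac\n".encode("utf-8")   # hypothetical sample content
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(sample)
        changed = add_utf8_bom(path)
        with open(path, "rb") as f:
            result = f.read()
    finally:
        os.remove(path)
    return changed, result
```

Working on raw bytes sidesteps any text decoding, which is roughly what opening the file directly in hexl-mode achieves.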
However, I simply cannot open the file under Emacs first, then call hexl-mode, then try to replace the first three characters with 0xEF 0xBB 0xBF (the BOM), and then save.
Apparently some crazy conversion issues take place when switching from (Text) to (Hexl) mode.
Am I missing something obvious, or is converting between Text and Hexl a bit broken, so that I'm better off switching to hexl-mode first, doing my hex editing, then saving and closing the file and re-opening it in text mode?
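One plausible way such a mismatch can arise (an assumption for illustration, not a diagnosis of what Emacs 22 actually does) is that byte values get treated as characters and are then re-encoded on save. The Python sketch below, with hypothetical function names, shows how the three BOM bytes differ from three characters carrying the same numeric values.

```python
BOM_BYTES = bytes([0xEF, 0xBB, 0xBF])    # the UTF-8 BOM as raw bytes


def written_as_raw_bytes():
    """What ends up on disk when the three bytes are written unchanged."""
    return BOM_BYTES


def written_as_characters():
    """What ends up on disk if each byte value is taken as a code point
    (U+00EF, U+00BB, U+00BF) and the text is then encoded as UTF-8."""
    text = "".join(chr(b) for b in BOM_BYTES)
    return text.encode("utf-8")


def euro_sign_length():
    """Number of UTF-8 bytes in one Euro sign: one character, several bytes."""
    return len("\u20ac".encode("utf-8"))
```

In the second case the file would start with six bytes (C3 AF C2 BB C2 BF) instead of the three-byte BOM, and since a single Euro sign already occupies three bytes, "the first three characters" and "the first three bytes" are not interchangeable in a decoded buffer.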
Comments (2)
If you take a look at the hexl-find-file code, you will see that it calls find-file-literally and then switches to hexl-mode.
See the documentation of find-file-literally (C-h f find-file-literally).
So you may open your file with find-file-literally, add 3 characters, and then switch to hexl-mode.
Note that an XML file with this tag will be silently converted to UTF-16 big-endian on saving.
This would automatically make the file UTF-8 with a BOM after changing and saving it: