Why does VIM ignore my file's BOM?
I need a file that I want to make sure is encoded in UTF-8.
So I create the file:
c:\> gvim umlaute.txt
In VIM, I type the umlauts:
äöü
I check the encoding...
:set enc
(VIM echoes encoding=latin1)
and then I check the file encoding...
:set fenc
(VIM echoes fileencoding=)
Then I write the file:
:w
and check the file's size on the hard disk:
:!dir umlaute.txt
The size is 5 bytes. That is of course expected: 3 bytes for the text and 2 for the line ending \x0d \x0a.
OK, so I now set the encoding to
:set enc=utf8
The buffer gets weird:
<e4><f6><fc>
I guess this is the hex representation of the Latin-1 characters I previously typed in. So I retype them:
äöü
Writing, checking the size:
:w
:!dir umlaute.txt
This time, it's 8 bytes. I guess that makes sense: 2 bytes for every character plus \x0d \x0a.
OK, so I want to make sure that the next time I open the file, it will be opened with encoding=utf8.
:set bomb
:w
:!dir umlaute.txt
11 bytes. This is of course the 8 (previous) bytes + 3 bytes for the BOM (ef bb bf).
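The byte counts above can be reproduced outside Vim. The following Python sketch is only an illustration (Python plays no part in the question); it assumes DOS line endings, as the \x0d \x0a in the question suggests, and should report 5, 8 and 11 bytes for the three writes.

```python
import codecs

# Assumption: DOS line endings ("\r\n" = \x0d \x0a), as gvim on Windows writes them.
text = "\u00e4\u00f6\u00fc\r\n"  # "äöü" plus the line ending

# 1. First :w with 'encoding' latin1 and 'fileencoding' empty: one byte per umlaut.
latin1 = text.encode("latin-1")

# 2. Once the buffer holds UTF-8 text: each umlaut takes two bytes.
utf8 = text.encode("utf-8")

# 3. After setting 'bomb' and writing again: the 3-byte UTF-8 BOM (ef bb bf) is prepended.
utf8_bom = codecs.BOM_UTF8 + utf8

for label, data in [("latin1", latin1), ("utf-8", utf8), ("utf-8 + BOM", utf8_bom)]:
    print(f"{label:<12} {len(data):>2} bytes: {data.hex(' ')}")
```

The hex column makes it easy to compare with the question: e4 f6 fc for Latin-1, c3 a4 c3 b6 c3 bc for UTF-8, and ef bb bf in front once the BOM is added.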
So I
:quit
VIM, open the file again,
and check whether the encoding is set:
:set enc
But VIM insists it's encoding=latin1.
So, why is that? I would have expected the BOM to tell VIM that this is a UTF-8 file.
You are confusing 'encoding', which is a Vim global setting, and 'fileencoding', which is a local setting for each buffer.

When opening a file, the option 'fileencodings' (note the final s) determines which encodings Vim will try to open the file with. If it starts with ucs-bom, then any file with a BOM will be properly opened, provided it parses correctly.

If you want to change the encoding of a file, you should use
:set fenc=<foo>
If you want to add or remove the BOM, use
:set bomb
or
:set nobomb
Then use
:w
to save.

Avoid changing enc after having opened a buffer; it could mess things up. enc determines which characters Vim can work with, and it has nothing to do with the files you are working with.

Details
You are opening Vim with a nonexistent file name. Vim creates a buffer, gives it that name, and sets fenc to an empty value, since there is no file associated with it.
encoding=latin1 means that Vim stores the buffer contents in ISO-8859-1 (or possibly another ISO-8859 variant).
The empty fileencoding is normal; there is no file for the moment.
Since 'fileencoding' is empty, :w writes the buffer to disk using the internal encoding, latin1.
Running :set enc=utf8 afterwards is WRONG! You are telling Vim that it must interpret the buffer contents as UTF-8. The buffer contains, in hexadecimal,
e4 f6 fc 0d 0a
and the first three bytes are invalid UTF-8 sequences. You should have typed
:set fenc=utf-8
instead. This would have converted the buffer. The <e4><f6><fc> you saw is what happens when you force Vim to interpret an illegal UTF-8 file as UTF-8.
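The difference between reinterpreting the bytes (changing enc) and converting the text (setting fenc) can be sketched in Python. This only imitates Vim's behaviour, assuming the buffer holds the three Latin-1 bytes from the question; the variable names are illustrative.

```python
# Bytes in the buffer after typing äöü while 'encoding' was latin1.
buf = "\u00e4\u00f6\u00fc".encode("latin-1")  # b'\xe4\xf6\xfc'

# Changing enc keeps the bytes and reinterprets them as UTF-8.
# e4 announces a 3-byte sequence, but f6 is not a continuation byte
# (those are 80..bf), so the data is not valid UTF-8.
try:
    buf.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The question shows such undecodable bytes as <xx>; this mimics that display.
shown = "".join(f"<{b:02x}>" for b in buf)

# Setting fenc instead converts the text on write:
# decode from the internal encoding, then encode to the file encoding.
converted = buf.decode("latin-1").encode("utf-8")

print(valid_utf8, shown, converted.hex(" "))
```

Reinterpretation fails, while conversion yields the 6 bytes c3 a4 c3 b6 c3 bc that a UTF-8 file should contain.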
You should run
:set fenc?
to see which encoding was detected for your file. And if you want Vim to be able to work with Unicode files, you should set 'enc' to utf-8 in your vimrc.
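The 'fileencodings' search described at the beginning of this answer can be modelled in a few lines of Python. This is a hypothetical sketch, not Vim's implementation: detect_fileencoding is an invented name, the encoding list in the example is only an assumption, and only the UTF-8 BOM is checked for ucs-bom (the real check also covers other Unicode BOMs). The point it illustrates is that the result lands in 'fileencoding', which is what :set fenc? shows, while 'encoding' is untouched.

```python
import codecs
from typing import List, Tuple


def detect_fileencoding(data: bytes, fencs: List[str]) -> Tuple[str, bool]:
    """Hypothetical model of a 'fileencodings'-style search.

    Tries each entry in order and returns (fileencoding, has_bom) for the
    first one that fits; ("", False) if none does.
    """
    for enc in fencs:
        if enc == "ucs-bom":
            # Only the UTF-8 BOM is checked in this sketch.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8", True
            continue  # no BOM: move on to the next entry
        try:
            data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # does not parse, or unknown codec name: try the next one
        return enc, False
    return "", False


# Example list, assumed for illustration only.
fencs = ["ucs-bom", "utf-8", "latin1"]
with_bom = codecs.BOM_UTF8 + "\u00e4\u00f6\u00fc\r\n".encode("utf-8")
plain_latin1 = "\u00e4\u00f6\u00fc\r\n".encode("latin-1")
print(detect_fileencoding(with_bom, fencs))
print(detect_fileencoding(plain_latin1, fencs))
```

With this list, the file from the question (with BOM) is detected as UTF-8, whereas the first Latin-1 version fails the UTF-8 check and falls through to latin1.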
After many attempts, here is a working example:
And if you want to create a new file with a BOM:
It is working now!
:help bomb
reveals the following information:
So try setting this in your .vimrc: