I have a ruby file with only these two lines:
# encoding: utf-8
puts "—"
When I run it with ruby test_enc.rb it fails with:
test_enc.rb:2: invalid multibyte char (UTF-8)
test_enc.rb:2: unterminated string meets end of file
I don't know how to properly specify the character code of — (emdash), but vim tells me it is 151, Hex 97, Octal 227. It fails the same way with other characters like ã as well, so I doubt it is related specifically to that character.
I am running on Windows XP and the version of ruby I'm using is:
ruby 1.9.1p430 (2010-08-16 revision 28998) [i386-mingw32]
I feel like there is something very obvious I am missing here. Any ideas?
EDIT: Learned a valuable lesson about assumptions today - specifically assuming your editor IS using UTF-8 without actually checking it. Oops!
Thanks for the quick and accurate replies all!
EDIT AGAIN: The 'setting up vim properly for utf-8' grew too big and wasn't really relevant to this question, so it is now a separate question.
Comments (2)
Given that Ruby is explicitly calling your attention to UTF-8, I strongly suspect that you haven't actually written out a UTF-8 file to start with. Make sure that Vim (or whatever text editor you're using to create the file) is really set to write out UTF-8.
Note that in UTF-8, any non-ASCII character will be represented by multiple bytes, not a single byte as you've described from the Vim diagnostics. I'd recommend using a binary file editor (or dump, or whatever) to really show what's in the text file though. Something that doesn't already have some preconceived notion of the encoding - something that isn't even trying to think of it as a text file.
Notepad lets you write out a file in UTF-8, so you might want to try that just to see what happens. (I don't have Ruby installed myself, otherwise I'd try it for you.)
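The byte-level check suggested above can also be done from Ruby itself. This is only a rough sketch: the helper names hex_bytes, valid_utf8? and inspect_file are made up for illustration, and the sample strings stand in for the contents of test_enc.rb. The file is read in binary mode so that no encoding is assumed while reading.

```ruby
# Hypothetical helper: render the raw bytes of a string as hex pairs.
def hex_bytes(data)
  data.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Hypothetical helper: true if the bytes form well-formed UTF-8.
def valid_utf8?(data)
  data.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: read a file as raw bytes and report on it.
def inspect_file(path)
  raw = File.open(path, "rb") { |f| f.read }
  { :hex => hex_bytes(raw), :utf8 => valid_utf8?(raw) }
end

# Stand-ins for what the editor may have written to disk:
single_byte = [0x97].pack("C*") # the lone byte vim reported (151 / hex 97)
real_utf8   = "\u2014"          # the em dash as genuine UTF-8

puts hex_bytes(single_byte)     # 97
puts valid_utf8?(single_byte)   # false
puts hex_bytes(real_utf8)       # E2 80 94
puts valid_utf8?(real_utf8)     # true
```

Calling inspect_file("test_enc.rb") on the real file should show whether the dash was stored as a single byte or as a multi-byte UTF-8 sequence.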
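Your file is in a single-byte encoding such as latin1 (strictly speaking, byte 0x97 is the em dash in Windows-1252), not UTF-8. Ruby is right.
In UTF-8 the em dash would be encoded as three bytes (E2 80 94), not one.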
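To see the difference concretely, here is a small sketch that takes the single bytes a Windows editor would typically write and transcodes them to UTF-8. It assumes the file was saved as Windows-1252, which is a guess based on the byte value vim reported; to_utf8 is a hypothetical helper name, not something from the question.

```ruby
# Hypothetical helper: reinterpret raw bytes as the assumed source
# encoding, then transcode them to UTF-8.
def to_utf8(raw, source_encoding = "Windows-1252")
  raw.dup.force_encoding(source_encoding).encode("UTF-8")
end

emdash_byte  = [0x97].pack("C*") # em dash in Windows-1252
a_tilde_byte = [0xE3].pack("C*") # a-tilde in latin1 / Windows-1252

emdash  = to_utf8(emdash_byte)
a_tilde = to_utf8(a_tilde_byte)

puts emdash.bytes.map { |b| format("%02X", b) }.join(" ")  # E2 80 94
puts a_tilde.bytes.map { |b| format("%02X", b) }.join(" ") # C3 A3
puts emdash == "\u2014"                                    # true
```

Either way, each of these characters takes more than one byte in UTF-8, so a file that stores them as a single byte cannot be valid UTF-8, which is what the "invalid multibyte char" error is reporting.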