Ruby 1.9 - invalid multibyte char (utf-8)

Posted 2024-10-27 16:10:44

I have a ruby file with only these two lines:

# encoding: utf-8
puts "—"

When I run it with ruby test_enc.rb it fails with:

test_enc.rb:2: invalid multibyte char (UTF-8)
test_enc.rb:2: unterminated string meets end of file

I don't know how to properly specify the character code of the em dash in that string, but vim tells me it is 151, Hex 97, Octal 227. It fails the same way with other characters like ã as well, so I doubt it is related specifically to that character.
I am running on Windows XP and the version of ruby I'm using is:

ruby 1.9.1p430 (2010-08-16 revision 28998) [i386-mingw32]

I feel like there is something very obvious I am missing here. Any ideas?

EDIT: Learned a valuable lesson about assumptions today - specifically assuming your editor IS using UTF-8 without actually checking it. Oops!

Thanks for the quick and accurate replies all!

EDIT AGAIN: The 'setting up vim properly for UTF-8' part grew too big and wasn't really relevant to this question, so it is now a separate question.
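For reference, the three numbers vim reports (151, Hex 97, Octal 227) all name the same single byte. The minimal sketch below, which assumes a modern Ruby rather than the 1.9.1 build above and uses only core String methods, decodes that one byte under three encodings to show why Ruby rejects it as UTF-8.

```ruby
# vim's three numbers all name the same single byte.
BYTE = 0x97
raise "numbers disagree" unless BYTE == 151 && BYTE == 0o227

raw = [BYTE].pack("C") # a one-byte binary (ASCII-8BIT) string

# Windows-1252 maps 0x97 to U+2014 EM DASH.
cp1252 = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
# Strict ISO-8859-1 (latin1) maps 0x97 to U+0097, an invisible C1 control.
latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
# In UTF-8, 0x97 is a continuation byte with no lead byte, so it is invalid alone.
utf8_ok = raw.dup.force_encoding("UTF-8").valid_encoding?

p cp1252.codepoints # => [8212]
p latin1.codepoints # => [151]
p utf8_ok           # => false
```

In other words, a file containing that byte only makes sense if it is read as a Windows code page, which is exactly what the `# encoding: utf-8` comment tells Ruby not to do.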

Comments (2)

请别遗忘我 2024-11-03 16:10:45

Given that Ruby is explicitly calling your attention to UTF-8, I strongly suspect that you haven't actually written out a UTF-8 file to start with. Make sure that Vim (or whatever text editor you're using to create the file) is really set to write out UTF-8.

Note that in UTF-8, any non-ASCII character is represented by multiple bytes, not a single byte as you've described from the Vim diagnostics. I'd recommend using a binary file editor (or a hex dump, or whatever) to really show what's in the text file: something that doesn't already have a preconceived notion of the encoding, something that isn't even trying to treat it as a text file.
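If no hex editor is at hand, Ruby itself can act as the byte dump. This is a minimal sketch, not part of the original answer: `inspect_bytes` is a hypothetical helper, and the two sample files stand in for `test_enc.rb` saved once as Windows-1252 and once as UTF-8. Reading with `File.binread` means no encoding is applied before the bytes are printed.

```ruby
require "tmpdir"

# Hypothetical helper: return a file's raw bytes as hex, plus whether those
# bytes happen to be valid UTF-8. File.binread reads in binary mode, so no
# encoding is assumed before we look at the bytes.
def inspect_bytes(path)
  data = File.binread(path)
  hex = data.bytes.map { |b| format("%02X", b) }.join(" ")
  utf8_ok = data.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  [hex, utf8_ok]
end

results = {}
Dir.mktmpdir do |dir|
  # Line 2 of test_enc.rb saved two ways: the em dash as one Windows-1252
  # byte, and as the three-byte UTF-8 sequence.
  samples = {
    "cp1252" => "puts \"\x97\"\n".b,
    "utf8"   => "puts \"\u2014\"\n"
  }
  samples.each do |name, content|
    path = File.join(dir, "#{name}.rb")
    File.binwrite(path, content)
    results[name] = inspect_bytes(path)
    hex, ok = results[name]
    puts "#{name}: #{hex} (valid UTF-8: #{ok})"
  end
end
```

Pointing `inspect_bytes` at the real file instead of the temporary samples would show directly whether the editor wrote `97` or `E2 80 94` between the quotes.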

Notepad lets you write out a file in UTF-8, so you might want to try that just to see what happens. (I don't have Ruby installed myself, otherwise I'd try it for you.)
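As an alternative to re-saving from an editor, the file can be re-encoded from Ruby. The sketch below is hypothetical: `convert_to_utf8` is not a library method, and it assumes the source really is Windows-1252 (check the raw bytes first). A byte with no Windows-1252 mapping would make `encode` raise `Encoding::UndefinedConversionError` rather than silently corrupt the text.

```ruby
require "tmpdir"

# Hypothetical helper: re-encode a file assumed to be in `from` into UTF-8.
def convert_to_utf8(src, dest, from: "Windows-1252")
  text = File.binread(src).force_encoding(from)
  File.binwrite(dest, text.encode(Encoding::UTF_8))
end

converted = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, "test_enc.rb")
  dest = File.join(dir, "test_enc_utf8.rb")
  # Recreate the problem file: UTF-8 magic comment, but a Windows-1252 em dash.
  File.binwrite(src, "# encoding: utf-8\nputs \"\x97\"\n".b)
  convert_to_utf8(src, dest)
  converted = File.binread(dest).force_encoding(Encoding::UTF_8)
  puts converted.valid_encoding? # => true
end
```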

困倦 2024-11-03 16:10:45

Your file is in latin1 (strictly, Windows-1252, the code page where 0x97 is the em dash). Ruby is right.

In UTF-8, the em dash is encoded as three bytes (E2 80 94), not one; ã takes two (C3 A3).
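The byte counts are easy to confirm from Ruby. This sketch uses only core String methods; `hex_bytes` and `CHARS` are hypothetical names introduced for the example. It prints the UTF-8 and Windows-1252 encodings of the two characters from the question.

```ruby
# Hypothetical helper: format a string's bytes as space-separated hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

CHARS = { "em dash" => "\u2014", "a with tilde" => "\u00E3" }

# For each character: its UTF-8 bytes, UTF-8 length, and Windows-1252 byte.
table = CHARS.map do |name, ch|
  [name, hex_bytes(ch), ch.bytesize, hex_bytes(ch.encode("Windows-1252"))]
end

table.each do |name, utf8, size, cp1252|
  puts "#{name}: UTF-8 #{utf8} (#{size} bytes), Windows-1252 #{cp1252}"
end
# em dash: UTF-8 E2 80 94 (3 bytes), Windows-1252 97
# a with tilde: UTF-8 C3 A3 (2 bytes), Windows-1252 E3
```

Both characters are a single byte in the Windows code page and multiple bytes in UTF-8, which matches the single byte vim reported.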
