How important is file encoding?
How important is file encoding? The default for Notepad++ is ANSI, but would it be better to use UTF-8, and what problems could occur from using one rather than the other?
Comments (3)
Yes, it would be better if everyone used UTF-8 for all documents always.
Unfortunately, they don't, primarily because Windows text editors (and many other Win tools) default to “ANSI”. This is a misleading name, as it has nothing to do with ANSI X3.4 (aka ASCII) or any other ANSI standard; in fact it means the system default code page of the current Windows machine. That default code page can differ between machines, or be changed on the same machine, at which point all “ANSI” text files that contain non-ASCII characters, such as accented letters, will break.
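To make the code page problem concrete, here is a minimal Python sketch. It assumes the file was saved on a machine whose "ANSI" code page is Windows-1252 and is then opened on machines defaulting to Windows-1251 (Cyrillic) or Windows-1253 (Greek); the sample word and variable names are only illustrative.

```python
# Text saved as "ANSI" on a machine whose default code page is Windows-1252.
original = "café"
raw = original.encode("cp1252")      # the bytes that end up on disk: b'caf\xe9'

# The very same bytes, interpreted with other machines' default code pages.
as_western = raw.decode("cp1252")    # café  (matches the original)
as_cyrillic = raw.decode("cp1251")   # cafй  (0xE9 is a Cyrillic letter here)
as_greek = raw.decode("cp1253")      # cafι  (0xE9 is a Greek letter here)

print(raw, as_western, as_cyrillic, as_greek)
```

Nothing in the file records which code page was used, so every reader has to guess, and only the ASCII part of the text survives a wrong guess.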
So you should certainly create new files in UTF-8, but you will have to be aware that text files other people give you are likely to be in a motley collection of crappy country-specific code pages.
Microsoft's position has been that users who want Unicode support should use UTF-16LE files; it even, misleadingly, calls this encoding simply “Unicode” in save box encoding menus. MS took this approach because in the early days of Unicode it was believed that this would be the cleanest way of doing it. Since that time:
Unicode was expanded beyond 16-bit code points, removing UTF-16's advantage of each code unit being a code point;
UTF-8 was invented, with the advantage that as well as covering all of Unicode, it's backwards-compatible with 7-bit ASCII (which UTF-16 isn't as it's full of zero bytes) and for this reason it's also typically more compact.
Most of the rest of the world (Mac, Linux, the web in general) has, accordingly, already moved to UTF-8 as a standard encoding, eschewing UTF-16 for file storage or network purposes. Unfortunately Windows remains stuck with the archaic and useless selection of incompatible code pages it had back in the early Windows NT days. There is no sign of this changing in the near future.
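The two points above about UTF-16 and UTF-8 can be checked with a short Python sketch. The sample strings are arbitrary; the only assumption is that "UTF-16LE" here means the BOM-less `utf-16-le` codec.

```python
ascii_text = "hello"
utf8_bytes = ascii_text.encode("utf-8")
utf16_bytes = ascii_text.encode("utf-16-le")

# UTF-8 output for pure ASCII text is byte-for-byte identical to ASCII.
same_as_ascii = utf8_bytes == ascii_text.encode("ascii")

print(utf8_bytes)    # b'hello'
print(utf16_bytes)   # b'h\x00e\x00l\x00l\x00o\x00'  (zero bytes, twice the size)

# A code point outside the 16-bit range needs two UTF-16 code units
# (a surrogate pair), so "one code unit per character" no longer holds.
beyond_16_bit = "\U0001F600"
utf16_units = len(beyond_16_bit.encode("utf-16-le")) // 2
utf8_length = len(beyond_16_bit.encode("utf-8"))

print(same_as_ascii, utf16_units, utf8_length)   # True 2 4
```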
If you're sharing files between systems that use differing default encodings, then a Unicode encoding is the way to go. If you don't plan on sharing them, or you use only the ASCII set of characters and aren't going to work with encodings that, for whatever reason, modify those characters (I can't think of any at the moment, but you never know...), you don't really need it.
As an aside, this is the sort of stuff that happens when you don't use a Unicode encoding for files with non-ASCII characters on a system with a different encoding from the one the file was created with: http://en.wikipedia.org/wiki/Mojibake
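A minimal Python sketch of both cases, assuming Windows-1252 and Windows-1251 as the two differing default encodings (the sample strings are made up): ASCII-only text comes out the same everywhere, while non-ASCII text read with the wrong encoding turns into mojibake.

```python
# ASCII-only text has identical bytes in these encodings, so nothing can break.
plain = "plain ASCII text"
ascii_is_safe = (
    plain.encode("cp1252") == plain.encode("cp1251") == plain.encode("utf-8")
)

# Non-ASCII text written as UTF-8 but read back as Windows-1252 becomes mojibake.
text = "café"
garbled = text.encode("utf-8").decode("cp1252")       # cafÃ©

# If the wrong guess is known, the damage can sometimes be undone.
repaired = garbled.encode("cp1252").decode("utf-8")   # café

print(ascii_is_safe, garbled, repaired)
```

The repair step only works because every byte involved happens to be defined in Windows-1252; in general a wrong decode can lose information for good.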
It is very important, since whatever tool you use will show the wrong characters if you use the wrong encoding. Try to load a Cyrillic file in Notepad without using UTF-8 or the like and watch a lot of "?" come up. :)
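One plausible way those question marks arise, sketched in Python: the text passes through a code page that has no Cyrillic letters, and each unrepresentable character is replaced with "?". That Notepad substitutes characters the same way as Python's `errors="replace"` is an assumption made for illustration.

```python
cyrillic = "Привет"

# Windows-1252 has no Cyrillic letters; with errors="replace" each one becomes "?".
lossy = cyrillic.encode("cp1252", errors="replace")
print(lossy.decode("cp1252"))   # ??????

# UTF-8 can represent every character, so the round trip is lossless.
round_trip = cyrillic.encode("utf-8").decode("utf-8")
print(round_trip == cyrillic)   # True
```

Unlike mojibake, this loss cannot be reversed: once a character has been stored as "?", the original is gone.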