What could go wrong when switching HTML encoding from UTF-8 to UTF-16?
What are the implications of changing the HTML encoding from UTF-8 to UTF-16? I would like to know your thoughts on the issue. Are there things I need to consider before making such a change?
Note: I am interested because of the enormous amounts of Japanese and Chinese text I need to handle.
Comments (6)
Why do you want to change to UTF-16?
I can think of a few things that will go wrong:
As far as I know, all modern browsers support the UTF-16 encoding. But as others have pointed out, you should declare the encoding explicitly. Not all browsers and platforms support all Unicode characters, but I think that is true regardless of which encoding you use.
However, if bandwidth is a big issue, you should probably consider gzipping the HTML. This will save much more bandwidth than switching encodings.
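To make the bandwidth point concrete, here is a rough sketch in Python that measures one made-up page in both encodings, with and without gzip. The sample page and the `measure` helper are assumptions for illustration only. The text is deliberately repetitive, so real pages will compress less well. For reference, most CJK characters take 3 bytes in UTF-8 and 2 in UTF-16, while ASCII markup takes 1 byte in UTF-8 and 2 in UTF-16.

```python
import gzip

# Hypothetical sample page: ASCII markup around Japanese and Chinese text.
paragraphs = [
    f"<p>段落{i}: 日本語のテキストと中文文本的例子。</p>" for i in range(200)
]
page = (
    "<!DOCTYPE html><html><head><title>例</title></head><body>\n"
    + "\n".join(paragraphs)
    + "\n</body></html>"
)


def measure(text):
    """Return {encoding: (raw_bytes, gzipped_bytes)} for UTF-8 and UTF-16."""
    result = {}
    for encoding in ("utf-8", "utf-16"):
        raw = text.encode(encoding)
        result[encoding] = (len(raw), len(gzip.compress(raw)))
    return result


sizes = measure(page)
for encoding, (raw, zipped) in sizes.items():
    print(f"{encoding}: {raw} bytes raw, {zipped} bytes gzipped")
```

Running this on your own pages, rather than on this toy sample, is the only way to know how much either change would actually save.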
Byte order also becomes an issue with any encoding whose code units are wider than 8 bits. UTF-16 files usually begin with a byte order mark (BOM), which is used to determine the byte order, or endianness, of the file.
Wikipedia has quite a good explanation of this.
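As a small illustration of the byte order issue, the Python sketch below encodes the same text in both byte orders and uses the BOM to tell them apart. The `sniff_utf16` helper is a hypothetical name and is deliberately simplified: it only checks for the two UTF-16 BOMs and ignores the fact that the UTF-32 little-endian BOM starts with the same two bytes.

```python
import codecs

text = "日本語"

# Same characters, two different byte sequences.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")


def sniff_utf16(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM (simplified sketch)."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"


with_bom = codecs.BOM_UTF16_BE + big
print(sniff_utf16(with_bom))       # BOM present, so the order is known
print(sniff_utf16(big))            # no BOM, so the order has to come from elsewhere
print(with_bom.decode("utf-16"))   # the generic codec reads and strips the BOM
```

Decoding with the wrong byte order does not necessarily raise an error; it can silently produce different characters, which is why the BOM or an explicit declaration matters.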
That is a very nice article you have linked here. Fundamentals states, "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 3629]), and UTF-8 is therefore appropriate if compatibility with US-ASCII is desired." In practice, compatibility with US-ASCII is so useful that it is almost a requirement. The W3C wisely explains, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes."
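The US-ASCII compatibility mentioned in that quote is easy to check. The following is a minimal Python sketch; `same_bytes_as_ascii` is a made-up helper name, not something from the W3C document.

```python
def same_bytes_as_ascii(text: str, encoding: str) -> bool:
    """True if encoding `text` gives exactly the same bytes as US-ASCII would."""
    return text.encode(encoding) == text.encode("ascii")


markup = "<html>"

print(same_bytes_as_ascii(markup, "utf-8"))      # ASCII text is already valid UTF-8
print(same_bytes_as_ascii(markup, "utf-16-le"))  # UTF-16 uses two bytes per character here
print(markup.encode("utf-16-le"))                # every other byte is 0x00
```

Those interleaved zero bytes are what tends to trip up tools that assume ASCII-compatible input when they are handed UTF-16.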
I suspect most browsers won't even show your pages.