What problems can arise when switching HTML encoding from UTF-8 to UTF-16?

Posted on 2024-07-19 20:49:12

What are the implications of changing the HTML encoding from UTF-8 to UTF-16? I would like to hear your thoughts on the issue. Is there anything I need to consider before making such a change?

Note: I am asking because I need to handle enormous amounts of Japanese and Chinese text.

Comments (6)

肩上的翅膀 2024-07-26 20:49:12
  • Your bandwidth consumption is likely to nearly double, assuming most of your HTML is ASCII.
  • Clients that incorrectly assume UTF-8 (or ASCII) will be confused.

Why do you want to change to UTF-16?
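
To make the size argument concrete, here is a minimal Python sketch that compares the encoded size of a markup-heavy fragment with that of a pure CJK string. The sample strings and the helper name `encoded_sizes` are made up for illustration; real pages will have a different mix of markup and text.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8 and UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# Hypothetical samples: one dominated by ASCII markup, one that is pure CJK text.
markup_heavy = '<div class="item"><a href="/p/1">' + "日本語" + "</a></div>"
cjk_only = "日本語のテキスト"

sizes_markup = encoded_sizes(markup_heavy)
sizes_cjk = encoded_sizes(cjk_only)

print("markup-heavy:", sizes_markup)
print("CJK only:    ", sizes_cjk)
```

Pure CJK text shrinks in UTF-16 (2 bytes per character instead of 3), but every ASCII character in tags and attributes doubles in size, so the net effect depends on how much of the page is markup.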

水染的天色ゝ 2024-07-26 20:49:12

I can think of a few things that will go wrong:

  1. You MUST specify that it's UTF-16 in the HTTP header. Unlike UTF-8, UTF-16 is not ASCII-compatible, which means that everything needs to be in UTF-16 from the start.
  2. Older clients don't support UTF-16: for example, anything on Windows 9x, and possibly Mac OS 9 as well.
  3. Oh, wait, I almost forgot: North American and European copies of Windows XP don't have Asian fonts installed by default.
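
The ASCII-compatibility point in item 1 can be checked with a short Python sketch. The fragment and the `Content-Type` header string are illustrative assumptions, not taken from any particular server setup.

```python
html = "<p>Hi</p>"

ascii_bytes = html.encode("ascii")
utf8_bytes = html.encode("utf-8")
utf16_bytes = html.encode("utf-16-le")  # little-endian, no byte order mark

# ASCII text is byte-for-byte identical when encoded as UTF-8 ...
same_in_utf8 = ascii_bytes == utf8_bytes

# ... but in UTF-16 every ASCII character is padded with a zero byte.
has_nul_bytes = b"\x00" in utf16_bytes

# A client that wrongly assumes a single-byte encoding sees NULs between letters.
misread = utf16_bytes.decode("latin-1")

# Illustrative header a server could send so that clients do not have to guess.
content_type = "Content-Type: text/html; charset=utf-16"

print(same_in_utf8, has_nul_bytes)
print(repr(misread))
print(content_type)
```

This is why the declaration has to come from outside the document or from a byte order mark: a client cannot read an ASCII-style `<meta>` tag before it already knows the bytes are UTF-16.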

浅笑依然 2024-07-26 20:49:12

As far as I know, all modern browsers support the UTF-16 encoding. But as others have pointed out, you should declare the encoding explicitly. Not all browsers and platforms support all Unicode characters, but I think that holds regardless of which encoding you use.

However, if bandwidth is a big issue, you should probably consider gzipping the HTML. This will save much more bandwidth than switching encodings.
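
A rough way to test the gzip claim is to compress the same page in both encodings and compare sizes. The sketch below uses Python's standard `gzip` module on a made-up, highly repetitive page, so the compression ratio it shows is more favorable than a real page would get; treat it as an illustration, not a benchmark.

```python
import gzip

# Hypothetical page: repeated list items mixing ASCII markup with CJK text.
items = "".join(
    f'<li class="entry">項目{i}: 日本語と中文のテキスト</li>' for i in range(200)
)
page = "<ul>" + items + "</ul>"

def raw_and_gzipped_size(text, encoding):
    """Return (raw byte length, gzip-compressed byte length) for `text`."""
    raw = text.encode(encoding)
    return len(raw), len(gzip.compress(raw))

utf8_raw, utf8_gz = raw_and_gzipped_size(page, "utf-8")
utf16_raw, utf16_gz = raw_and_gzipped_size(page, "utf-16-le")

print(f"UTF-8 : raw={utf8_raw:6d}  gzip={utf8_gz:6d}")
print(f"UTF-16: raw={utf16_raw:6d}  gzip={utf16_gz:6d}")
```

Running this against a sample of your own pages, rather than the synthetic one here, is the more reliable way to decide whether the encoding matters at all once compression is enabled.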

绝對不後悔。 2024-07-26 20:49:12

Byte order also becomes an issue with anything above 8-bit data. UTF-16 encoded files usually begin with a byte order mark (BOM), which is used to determine the byte order, or endianness, of that file.

Wikipedia has quite a good explanation of this.
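
As a minimal sketch of how a byte order mark is used, the Python snippet below checks the first two bytes of some data against the UTF-16 BOM constants from the standard `codecs` module. The function name `detect_utf16_byte_order` is hypothetical, and the check deliberately ignores UTF-32, whose little-endian BOM starts with the same two bytes.

```python
import codecs

def detect_utf16_byte_order(data):
    """Return the byte order indicated by a leading UTF-16 BOM, or None if absent.

    Simplification: UTF-32 is not considered here.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    return None

text = "日本"
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(detect_utf16_byte_order(little))
print(detect_utf16_byte_order(big))
print(detect_utf16_byte_order(text.encode("utf-8")))

# Python's generic "utf-16" codec reads the BOM and picks the byte order itself.
print(little.decode("utf-16") == big.decode("utf-16") == text)
```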

撩发小公举 2024-07-26 20:49:12

Very nice article you have here. Fundamentals states, "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 3629]), and UTF-8 is therefore appropriate if compatibility with US-ASCII is desired." In practice, compatibility with US-ASCII is so useful that it is almost a requirement. The W3C wisely explains, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes."

停顿的约定 2024-07-26 20:49:12

I suspect most browsers won't even show your pages.
