Why is it necessary to specify the character encoding, given that UTF-8 is the only valid encoding?

Posted on 2025-02-05 07:24:51


From the HTML Standard § 4.2.5.4 Specifying the document's character encoding:

The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8.
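The label rule quoted above can be sketched as a small check. This is an illustration, not code from the standard. It implements "ASCII case-insensitive" literally, folding only A-Z, because Python's `str.lower()` applies full Unicode case mapping, which is broader than the spec's rule. The function names are hypothetical.

```python
def ascii_lowercase(s: str) -> str:
    """Lowercase ASCII A-Z only, leaving every other code point untouched."""
    return "".join(chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in s)


def is_conforming_encoding_label(label: str) -> bool:
    """Return True if `label` is an ASCII case-insensitive match for "utf-8"."""
    return ascii_lowercase(label) == "utf-8"


# Usage: "utf-8", "UTF-8" and "Utf-8" pass; "utf8" and "windows-1252" do not.
for label in ["utf-8", "UTF-8", "Utf-8", "utf8", "windows-1252"]:
    print(label, is_conforming_encoding_label(label))
```

Note that "utf8" fails this check even though the Encoding Standard also lists it as a label for UTF-8. The HTML conformance requirement quoted above is narrower and accepts only "utf-8" in any ASCII case.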

(…)

If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state.
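The three exemptions in this rule (a BOM, a charset in Content-Type metadata, or an iframe srcdoc document) can be modeled roughly as follows. This is a simplified sketch under stated assumptions. It only recognizes the UTF-8 BOM, uses a naive Content-Type parameter split that ignores quoted strings, and does not check any other conformance rules for the `meta` element, such as where it must appear. All function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def content_type_has_charset(content_type: Optional[str]) -> bool:
    """Naive check for a charset parameter in a Content-Type header value."""
    if content_type is None:
        return False
    # Assumption: parameters are separated by ';' and are not quoted strings.
    for param in content_type.split(";")[1:]:
        name, sep, _value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return True
    return False


class _MetaCharsetFinder(HTMLParser):
    """Records whether <meta charset> or <meta http-equiv="content-type"> appears."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:
            self.found = True
        # http-equiv="content-type" is the Encoding declaration state.
        elif attributes.get("http-equiv", "").lower() == "content-type":
            self.found = True


def declaration_requirement_met(
    body: bytes, content_type: Optional[str], is_srcdoc: bool
) -> bool:
    """Return True if the document satisfies the quoted declaration requirement."""
    # Any one of these conditions exempts the document from needing a <meta>.
    if body.startswith(UTF8_BOM) or content_type_has_charset(content_type) or is_srcdoc:
        return True
    finder = _MetaCharsetFinder()
    # Assumption: decode as UTF-8, replacing bad bytes so parsing never fails.
    finder.feed(body.decode("utf-8", errors="replace"))
    finder.close()
    return finder.found
```

For example, `b"<!DOCTYPE html><title>x</title>"` served as plain `text/html` fails the check, while the same bytes served as `text/html; charset=utf-8` pass it.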

Note. A character encoding declaration is required (either in the Content-Type metadata or explicitly in the file) even when all characters are in the ASCII range, because a character encoding is needed to process non-ASCII characters entered by the user in forms, in URLs generated by scripts, and so forth.
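The note's point about form input can be made concrete. An ASCII-only page still has to turn the user's non-ASCII input into bytes when a form is submitted, and the resulting bytes depend on the encoding used. The sketch below approximates `application/x-www-form-urlencoded` serialization with Python's `urllib.parse.quote_plus`. That function is not exactly the browser's algorithm, and windows-1252 is only an example of some encoding other than UTF-8. The helper name is hypothetical.

```python
from urllib.parse import quote_plus


def encode_form_value(value: str, page_encoding: str) -> str:
    """Percent-encode a form value, using `page_encoding` to turn characters into bytes."""
    return quote_plus(value, encoding=page_encoding)


# The page markup can be pure ASCII; the text the user types is not.
user_input = "café"
print(encode_form_value(user_input, "utf-8"))         # caf%C3%A9
print(encode_form_value(user_input, "windows-1252"))  # caf%E9
```

The same input yields different bytes on the wire, so the server can only decode it correctly if both sides agree on the encoding.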

So my understanding is that:

  • There is only one allowed encoding, namely UTF-8.
  • Nonetheless the encoding must still be explicitly specified.

Why?

Isn't it redundant to specify the encoding if the encoding must always be UTF-8?

Provided that the document indicates it is written in HTML5 (e.g. it declares <!DOCTYPE html> rather than something like <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">), couldn't the character encoding declaration be optional, with user agents defaulting to UTF-8 when none is given?
