Given that UTF-8 is the only valid encoding, why does the character encoding need to be specified?
From the HTML Standard § 4.2.5.4 Specifying the document's character encoding:
The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8.
(…)
If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state.
Note. A character encoding declaration is required (either in the Content-Type metadata or explicitly in the file) even when all characters are in the ASCII range, because a character encoding is needed to process non-ASCII characters entered by the user in forms, in URLs generated by scripts, and so forth.
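For concreteness, the rule quoted above can be sketched as a small check. This is only a rough sketch, not the standard's algorithm: the function name `has_encoding_declaration` and its parameters are hypothetical, the regular expressions are a crude stand-in for a real HTML parser, and the 1024-byte window reflects the standard's separate requirement that the declaration be serialized within the first 1024 bytes of the document.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Matches charset=utf-8 in a Content-Type header value (case-insensitive).
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?utf-8\"?", re.IGNORECASE)

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
META_CHARSET = re.compile(
    r"<meta[^>]*\bcharset\s*=\s*[\"']?utf-8[\"']?", re.IGNORECASE
)


def has_encoding_declaration(
    body: bytes,
    content_type: Optional[str] = None,
    is_srcdoc: bool = False,
) -> bool:
    """Return True if one of the quoted conditions is met (hypothetical helper)."""
    if is_srcdoc:
        # iframe srcdoc documents are exempt from the requirement.
        return True
    if body.startswith(UTF8_BOM):
        return True
    if content_type and HEADER_CHARSET.search(content_type):
        return True
    # Only look at the first 1024 bytes, decoded leniently as ASCII.
    head = body[:1024].decode("ascii", errors="replace")
    return META_CHARSET.search(head) is not None
```

Under this sketch, a bare `<!DOCTYPE html>` document served without a charset in its Content-Type fails the check, which is exactly the case I am asking about.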
So my understanding is that:
- There is only one allowed encoding, namely UTF-8.
- Nonetheless the encoding must still be explicitly specified.
Why?
Isn't it redundant to specify the encoding if the encoding must always be UTF-8?
Provided that the document specifies it is written in HTML 5 (e.g. if it declares <!DOCTYPE html> as opposed to, say, something like <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">), it seems to me that the character encoding declaration could be optional, with user agents defaulting to UTF-8 if none is specified?