How do browsers handle a <meta> tag that specifies the character encoding?
Suppose a browser encounters a <meta> tag that specifies the character encoding, like this:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Does it start over from the beginning and parse the page again, since some of the preceding characters in the <head> section may have been interpreted incorrectly? Or are there some other constraints that prevent prior characters from being interpreted incorrectly?
Comments (3)
As far as I know, browsers won't go back after finding a charset declaration in the <head>; they assume an ASCII-compatible charset up to that point. Unfortunately, I can't find a reference to confirm this.
Conforming browsers will ignore a Content-Type meta element if the server already provides a Content-Type HTTP header with a charset, so you can't override a "wrong" server-side charset with a <meta> element.
The point of the <meta> charset declaration is to cover HTML documents that are not served by an HTTP server.
That means you shouldn't rely on a <meta> charset declaration in the HTML file, but should configure your HTTP server to provide the correct charset. If for some reason you have to rely on a <meta> charset declaration, use only ASCII characters up to that point and place it as early in the <head> as possible, preferably as the first element.
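To make the precedence described in this answer concrete, here is a minimal Python sketch. It is a simplification, not what any browser actually runs: the function name sniff_encoding, the regular expressions, and the windows-1252 fallback are assumptions made for illustration. The 1024-byte window follows the HTML specification's advice that the declaration should appear within the first 1024 bytes of the document. Other signals that real browsers consider, such as a byte order mark, are left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical, simplified patterns; a real prescan handles many more edge cases.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def sniff_encoding(
    body: bytes,
    content_type: Optional[str] = None,
    default: str = "windows-1252",  # assumed fallback; real defaults vary by locale
) -> Tuple[str, str]:
    """Return (encoding label, where it came from)."""
    # 1. A charset in the HTTP Content-Type header wins over anything in the markup.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http-header"

    # 2. Otherwise look for a <meta> declaration, but only in the first 1024 bytes.
    #    The pattern is pure ASCII, which is why everything before the
    #    declaration should be ASCII as well.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"

    # 3. Nothing found: fall back to a default.
    return default, "default"


if __name__ == "__main__":
    page = b'<head><meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'
    print(sniff_encoding(page))
    print(sniff_encoding(page, "text/html; charset=ISO-8859-1"))
```

In this sketch a declaration that sits after a long run of other markup is simply missed, which illustrates why placing it first in the <head> is the safe choice.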
The parser can start over in some circumstances. The relevant spec is here: http://dev.w3.org/html5/spec/parsing.html#change-the-encoding
Note that browsers have traditionally probably not followed this algorithm exactly; chances are they have all done slightly different things. However, the link above describes what HTML5-compliant browsers should do. The algorithm described is likely an amalgam of various browsers' previous behaviour.
Since HTML5 is still a working draft, this should be considered subject to change.
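As I understand the linked algorithm, the decision comes down to roughly three outcomes: keep going if the declared encoding matches the one already in use, switch in place if every byte seen so far decodes identically under both encodings, and otherwise re-parse from the start. The Python sketch below shows only that decision. The function name change_encoding_action and the returned labels are hypothetical, and the special cases the spec lists (for example around UTF-16) are omitted.

```python
import codecs


def change_encoding_action(seen: bytes, current: str, declared: str) -> str:
    """Decide what a parser might do when a late charset declaration is found.

    seen     -- the bytes consumed before the declaration was reached
    current  -- the encoding tentatively used so far
    declared -- the encoding named in the <meta> element
    """
    # Normalise aliases such as "UTF8" and "utf-8" to one canonical name.
    current_name = codecs.lookup(current).name
    declared_name = codecs.lookup(declared).name

    if current_name == declared_name:
        return "keep"  # the guess was right; nothing to redo

    # If both encodings yield the same text for what was already read
    # (typically because it was all ASCII), the parser can switch in place.
    old_text = seen.decode(current_name, errors="replace")
    new_text = seen.decode(declared_name, errors="replace")
    if old_text == new_text:
        return "switch"

    # Earlier characters were misread, so the document must be parsed again.
    return "restart"


if __name__ == "__main__":
    print(change_encoding_action(b"<html><head>", "windows-1252", "utf-8"))
    print(change_encoding_action(b"<title>caf\xc3\xa9</title>", "windows-1252", "utf-8"))
```

The second call contains a non-ASCII character before the declaration, which is the case where a restart becomes necessary.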
It has no real effect on the node structure. Only the content of text nodes (and attribute nodes) has to be transcoded.
If your server sends the ... header, the browser will know the right charset from the start. You can achieve this with a .htaccess file containing: