How do browsers handle a <meta> tag that specifies the character encoding?

Posted on 2024-10-27 05:25:22

Suppose a browser encounters a <meta> tag that specifies the character encoding, like this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

Does it start over from the beginning parsing the page again, since some of the preceding characters in the <head> section may have been interpreted incorrectly? Or are there some other constraints that prevent prior characters from being interpreted incorrectly?

Comments (3)

抱着落日 2024-11-03 05:25:22

As far as I know, browsers won't go back after finding a charset declaration in the <head>; they assume an ASCII-compatible charset up to that point. Unfortunately I can't find a reference to confirm this.
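A quick way to see why an ASCII-only prefix is safe: in ASCII-compatible encodings, bytes 0x00 to 0x7F map to the same characters, so markup decoded under a wrong guess still comes out correct. The sketch below uses Python's built-in codecs; the helper `same_in_all` and the sample markup are illustrative only, not taken from any browser.

```python
# Markup up to and including the <meta> declaration, ASCII only.
prefix = b'<html><head><meta charset="utf-8">'

# Some ASCII-compatible encodings: bytes 0x00-0x7F mean the same characters in each.
ascii_compatible = ["utf-8", "cp1252", "iso-8859-1", "koi8-r"]


def same_in_all(data: bytes, encodings: list) -> bool:
    """Return True if `data` decodes to identical text under every encoding given."""
    decoded = {data.decode(enc, errors="replace") for enc in encodings}
    return len(decoded) == 1


print(same_in_all(prefix, ascii_compatible))  # True

# UTF-16 is not ASCII-compatible, so the same bytes become different characters.
print(same_in_all(prefix, ["utf-8", "utf-16-le"]))  # False

# Once a non-ASCII byte precedes the declaration, the initial guess matters.
non_ascii = b'<title>caf\xc3\xa9</title><meta charset="utf-8">'
print(same_in_all(non_ascii, ["utf-8", "cp1252"]))  # False ("café" vs "cafÃ©")
```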

Conforming browsers will ignore a Content-Type meta element if the server already provides a Content-Type HTTP header, so you can't override a "wrong" server-side charset with a <meta> element.

The <meta> charset declaration is meant for HTML documents that are not served by an HTTP server.
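The precedence described above can be sketched as follows, loosely modelled on the encoding sniffing algorithm in the HTML Standard: a byte order mark wins, then the charset parameter of the HTTP `Content-Type` header, then a `<meta>` declaration in the first 1024 bytes, then a fallback. This is a simplification: the regex is a rough stand-in for the spec's much more careful prescan, user overrides and content-based guessing are left out, and `windows-1252` is only a placeholder default (real browsers choose one per locale). The names `header_charset` and `pick_encoding` are hypothetical.

```python
import codecs
import re
from typing import Optional, Tuple

# Byte order marks, checked before anything else.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Rough stand-in for the prescan: catches <meta charset="..."> and
# <meta http-equiv="Content-Type" content="...;charset=...">.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if present."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip("\"'").lower()
    return None


def pick_encoding(body: bytes, content_type: Optional[str],
                  default: str = "windows-1252") -> Tuple[str, str]:
    """Return (encoding, source): BOM first, then HTTP header, then <meta>, then default."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "bom"
    from_header = header_charset(content_type)
    if from_header:
        return from_header, "http-header"  # a <meta> declaration is never consulted
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    return default, "default"
```

For example, `pick_encoding(b'<meta charset="utf-8">', "text/html; charset=iso-8859-1")` returns `("iso-8859-1", "http-header")`: the server's header wins over the document's own declaration.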

That means you shouldn't rely on a <meta> charset declaration in the HTML file, but configure your HTTP server to provide the correct charset. If for some reason you have to rely on a <meta> charset declaration, you should only have ASCII characters up to that point and position it as early in the <head> as possible, preferably as the first element.
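A hypothetical lint check for this advice might look like the sketch below. The 1024-byte limit comes from the HTML Standard, which requires the element containing the character encoding declaration to be serialized completely within the first 1024 bytes of the document; the function name `check_charset_placement` and its regex are my own simplification.

```python
import re

# Simplified pattern for the start of a <meta> tag carrying a charset.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=", re.IGNORECASE)


def check_charset_placement(html: bytes) -> list:
    """Return a list of problems with where the <meta> charset declaration sits."""
    match = META_CHARSET.search(html)
    if match is None:
        return ["no <meta> charset declaration found"]
    problems = []
    # The whole tag, including its closing '>', must fit in bytes 0..1023.
    tag_end = html.find(b">", match.end())
    if tag_end == -1 or tag_end + 1 > 1024:
        problems.append("declaration is not complete within the first 1024 bytes")
    # Anything before the declaration should decode the same under any
    # ASCII-compatible guess, i.e. be plain ASCII.
    if not html[:match.start()].isascii():
        problems.append("non-ASCII bytes appear before the declaration")
    return problems


good = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>caf\xc3\xa9</title>'
dirty = b'<html><head><title>caf\xc3\xa9</title><meta charset="utf-8">'
print(check_charset_placement(good))   # []
print(check_charset_placement(dirty))  # ['non-ASCII bytes appear before the declaration']
```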

情绪 2024-11-03 05:25:22

The parser can start over in some circumstances. The relevant spec is here: http://dev.w3.org/html5/spec/parsing.html#change-the-encoding

Note that browsers traditionally have probably not followed this algorithm exactly; chances are they've each done slightly different things. However, the link above describes what HTML5-compliant browsers should do. The algorithm described is likely an amalgam of various browsers' previous behaviour.

Since HTML5 is still a working draft, this should be considered subject to change.
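The "change the encoding" steps behind that link can be sketched roughly as below. This paraphrases the algorithm as it appears in the current WHATWG HTML Standard, whose wording may differ from the draft linked above; it only applies while the parser's confidence in its encoding is still tentative. Python's codecs stand in for the Encoding Standard's decoders and differ in corner cases (for example, Python's `cp1252` rejects a few bytes such as 0x81 that browsers decode). The function name and the `keep`/`switch`/`restart` labels are my own.

```python
import codecs

# Python codec names for the UTF-16 family (as returned by codecs.lookup(...).name).
UTF16_FAMILY = {"utf-16", "utf-16-le", "utf-16-be"}


def canonical(label: str) -> str:
    """Normalise an encoding label via Python's codec registry (approximates WHATWG label lookup)."""
    return codecs.lookup(label.strip()).name


def change_the_encoding(consumed: bytes, current: str, new: str) -> str:
    """Decide what happens when a <meta> charset is found while the encoding is tentative.

    Returns "keep" (ignore the new encoding; confidence becomes certain),
    "switch" (swap decoders on the fly, no reparse) or
    "restart" (reload and parse again from the beginning with the new encoding).
    """
    current = canonical(current)
    # A document already being decoded as UTF-16 ignores the declaration.
    if current in UTF16_FAMILY:
        return "keep"
    # x-user-defined is treated as windows-1252; UTF-16 labels are treated as UTF-8.
    if new.strip().lower() == "x-user-defined":
        new = "windows-1252"
    new = canonical(new)
    if new in UTF16_FAMILY:
        new = "utf-8"
    # Same (or equivalent) encoding as the one already in use: nothing to do.
    if new == current:
        return "keep"
    # If the bytes seen so far mean the same text in both encodings,
    # the decoder can be swapped without reparsing.
    try:
        unchanged = consumed.decode(current) == consumed.decode(new)
    except UnicodeDecodeError:
        unchanged = False
    if unchanged:
        return "switch"
    # Otherwise the page is loaded again and parsed from the start.
    return "restart"
```

In this sketch, an ASCII-only `<head>` up to the `<meta>` lands in the `switch` case, so nothing is reparsed; a non-ASCII byte before the declaration, such as `b'<title>caf\xe9</title>'` read as `windows-1252` and then declared `utf-8`, forces a `restart`.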

╄→承喏 2024-11-03 05:25:22

It has no real effect on the node structure. Only the content of text nodes (and attribute nodes) has to be transcoded.
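To illustrate this claim under its implicit assumption (ASCII markup, ASCII-compatible encodings), the sketch below decodes the same UTF-8 bytes both correctly and as `cp1252`, then parses each with Python's `html.parser`. The tag sequence comes out identical; only the text content differs. `TagAndText` and `parse` are hypothetical helpers for this demonstration.

```python
from html.parser import HTMLParser


class TagAndText(HTMLParser):
    """Record start-tag names and non-blank text nodes in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data)


def parse(html: str) -> TagAndText:
    parser = TagAndText()
    parser.feed(html)
    parser.close()
    return parser


raw = "<html><body><p>café</p><p>naïve</p></body></html>".encode("utf-8")
right = parse(raw.decode("utf-8"))   # the charset the bytes were really written in
wrong = parse(raw.decode("cp1252"))  # a wrong but ASCII-compatible guess

print(right.tags == wrong.tags)  # True: same element structure
print(right.text)                # ['café', 'naïve']
print(wrong.text)                # ['cafÃ©', 'naÃ¯ve']
```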

If your server sends the

Content-type: text/html;charset=utf-8

...header, the browser will know the right charset from the start. On Apache, you can achieve this with a .htaccess file containing:

AddDefaultCharset utf-8
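If Apache and `.htaccess` aren't available, the same idea applies to any server: put the charset in the `Content-Type` response header. Below is a minimal sketch using Python's standard-library `http.server`, suitable for local testing rather than production; `CharsetHandler`, `start_server` and the sample page are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample page, encoded with the same charset the header declares.
PAGE = "<!DOCTYPE html><title>café</title><p>naïve</p>".encode("utf-8")


class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Declare the charset at the HTTP level, like Apache's AddDefaultCharset.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(port: int = 8000) -> HTTPServer:
    """Serve CharsetHandler on 127.0.0.1 from a background thread."""
    server = HTTPServer(("127.0.0.1", port), CharsetHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(0)  # port 0: let the OS pick a free port
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    with urllib.request.urlopen(url) as resp:
        print(resp.headers["Content-Type"])  # text/html; charset=utf-8
    server.shutdown()
    server.server_close()
```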