Should an end tag close all unclosed intervening start tags with omitted end tags?

Posted on 2024-12-25 18:32:18

Am I reading the HTML 4.01 standard wrong, or is Google? In HTML 4.01, if I write:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html> <head> <body>plain <em>+em <strong>+strong </em>-em

The rendering in Google Chrome is:

plain +em (italic) +strong (bold italic) -em (bold)

This seems to contradict the HTML 4.01 standard, which summarizes the underlying SGML rules as: “an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags”.¹

That is, the </em> end tag should close not only the <em> start tag but also the unclosed intervening <strong> start tag, and the rendering should be:

plain +em (italic) +strong (bold italic) -em (no formatting)

A commenter pointed out that it is bad practice to leave tags open, but this is only an academic example. An equally good example would be: <em> +em <strong> +strong </em> -em </strong>. It was my understanding from the HTML 4.01 standard that this code fragment would not work as intended because of the overlapping elements: the </em> end tag should implicitly close the <strong>. The fact that it did work as intended was surprising, and this is what led to my question.

And it turned out I proposed a false dichotomy in the question: neither Google nor I were reading the HTML 4.01 standard wrong. A private correspondent at w3.org pointed me to Web SGML and HTML 4.0 Explained by Martin Bryan, which explains that “[t]he parsing program will automatically close any currently open embedded element which has been declared as having omissible end-tags when it encounters an end-tag for a higher level element. (If an embedded element whose end-tag cannot be omitted is still open, however, the program will report an error in the coding.)”² (Emphasis added.) Bryan’s summarization of the SGML standard is right, and HTML 4.01’s summarization is wrong.
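Bryan's version of the rule can be sketched as a stack operation. The following Python is only an illustration of the rule as quoted, not code from any SGML parser; the `END_TAG_OMISSIBLE` table and the `close_element` function are hypothetical names, and the table covers just the handful of elements discussed here.

```python
# Hypothetical omission table for this sketch: True means the end tag may be
# omitted. P and LI have optional end tags in HTML 4.01; EM and STRONG do not.
END_TAG_OMISSIBLE = {
    "p": True,
    "li": True,
    "div": False,
    "em": False,
    "strong": False,
}


def close_element(open_stack, name):
    """Apply the quoted rule to an end tag and return the elements it closes.

    open_stack lists the open element names, innermost last, and is modified
    in place. A ValueError stands in for the parser's "error in the coding".
    """
    if name not in open_stack:
        raise ValueError(f"no open <{name}> to close")
    # Position of the innermost open element with this name.
    idx = len(open_stack) - 1 - open_stack[::-1].index(name)
    # Every element opened after it must allow its end tag to be omitted.
    for inner in open_stack[idx + 1:]:
        if not END_TAG_OMISSIBLE.get(inner, False):
            raise ValueError(f"</{name}> seen while <{inner}> is still open")
    closed = open_stack[idx:][::-1]  # innermost first
    del open_stack[idx:]
    return closed


# </div> may implicitly close an open <p>, so this call succeeds.
print(close_element(["body", "div", "p"], "div"))
```

Under the same rule, calling `close_element(["body", "em", "strong"], "em")` raises an error, because the end tag of STRONG cannot be omitted. That matches the reading that the example in the question is invalid rather than implicitly closed.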

小霸王臭丫头 2025-01-01 18:32:18

The statement quoted from the HTML 4.01 specification is very obscure, or just plain wrong on all accounts. HTML 4.01 has specific rules for end tag omission, and these rules depend on the element. For example, the end tag of a p element may be omitted, whereas the end tag of an em element may never be omitted. The statement in the specification probably tries to say that an end tag implicitly closes any inner elements that have not yet been closed, to the extent that end tag omission is allowed.

No browser has ever implemented HTML 4.01 (or any earlier HTML specification) as defined, with the SGML features that are formally part of it. Anything that the HTML specifications say about SGML should be taken as just theoretical until proven otherwise.

HTML5 doesn’t change the rules of the game in this respect, except that it writes down the error handling rules. In simple issues like these, the rules just make the traditional browser behavior a norm. They are tagsoup-oriented, treating certain tags more or less as formatting commands: <em> means “italicize,” </em> means “stop italicizing,” etc. But HTML5 also takes measures to define error handling more formally so that despite such tag soup usage, it is well-defined what document tree in the DOM will be constructed.
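The "formatting command" reading can be illustrated with a short Python sketch. It is an assumption-laden toy, not how any browser is implemented: `FORMATTING_TAGS` and `tag_soup_runs` are hypothetical names, only `em` and `strong` are handled, and attributes are not supported.

```python
import re

# Hypothetical mapping from tag name to the formatting it switches on.
FORMATTING_TAGS = {"em": "italic", "strong": "bold"}


def tag_soup_runs(markup):
    """Return (text, styles) runs, treating tags as on/off formatting commands."""
    active = []  # formatting currently switched on, in the order it was enabled
    runs = []
    for token in re.split(r"(</?\w+>)", markup):
        if not token:
            continue
        match = re.fullmatch(r"<(/?)(\w+)>", token)
        if match is None:
            # Plain text: record it with whatever formatting is active now.
            if token.strip():
                runs.append((token.strip(), tuple(active)))
            continue
        closing, name = match.group(1), match.group(2).lower()
        style = FORMATTING_TAGS.get(name)
        if style is None:
            continue  # non-formatting tags are ignored in this sketch
        if closing:
            if style in active:
                active.remove(style)  # "stop italicizing", regardless of nesting
        elif style not in active:
            active.append(style)
    return runs


for text, styles in tag_soup_runs("plain <em>+em <strong>+strong </em>-em"):
    print(text, styles)
```

For the markup in the question this yields four runs: unstyled "plain", italic "+em", italic and bold "+strong", and bold "-em", which is the rendering the question reports for Chrome.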

最好是你 2025-01-01 18:32:18

Some tags are allowed to be omitted (such as the end tag for <p> or the start and end tags for <body>), and some are not (such as the end tag for <strong>). It is the former that the section of the spec you quote is referring to. You can tell them apart in the DTD, where a dash marks a required tag and an O marks an omissible one:

<!ELEMENT P - O (%inline;)*            -- paragraph -->
          ^ A p element
            ^ requires a start tag
              ^ has optional end tag
                 ^ contains zero or more inline things
                                       ^ Comment: Is a paragraph

What you have is not an HTML document with an omitted tag, but an invalid pseudo-HTML document that browsers will try to perform error recovery on.

The specification (for HTML 4) does not describe how to perform error recovery; that is left up to browsers.
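As a rough illustration of reading those two flags, here is a small Python sketch that extracts them from an element declaration. The `ELEMENT_DECL` pattern and the `tag_omission` function are hypothetical helpers written for this example; the pattern only handles the simple declaration shape shown above, not the full SGML syntax.

```python
import re

# Matches "<!ELEMENT name start-flag end-flag ..." where each flag is "-" or "O".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+", re.IGNORECASE)


def tag_omission(declaration):
    """Report whether the start and end tags are required by a declaration."""
    match = ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("not a recognized element declaration")
    name, start_flag, end_flag = match.groups()
    return {
        "element": name,
        "start_tag_required": start_flag == "-",  # "-" means the tag is required
        "end_tag_required": end_flag == "-",      # "O" means it may be omitted
    }


print(tag_omission("<!ELEMENT P - O (%inline;)*            -- paragraph -->"))
```

For the P declaration this reports a required start tag and an optional end tag. A declaration with two dashes, which is how the DTD declares phrase elements such as STRONG, would report both tags as required.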

七月上 2025-01-01 18:32:18

The specification says that:

Some HTML element types allow authors to omit end tags (e.g., the P and LI element types).

And the passage you quote:

Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested, an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1), etc.).

applies only to elements whose end tags can be omitted.

If you look at the P element spec, you will see:

Start tag: required, End tag: optional

So, when you use this:

<DIV>
<P>This is the paragraph.
</DIV>

The P element will be automatically closed.

But, if you look at the EM spec, you will see:

Start tag: required, End tag: required

So the automatic closing rule does not apply here, and the HTML is simply invalid.

Curiously, all browsers exhibit the same behavior with this kind of invalid HTML.

假扮的天使 2025-01-01 18:32:18

All modern browsers use an HTML5 parser (even for HTML 4.01 content), so the parsing rules of HTML5 apply. You can find more information at the Parsing HTML Documents section in the HTML5 spec.

HTML Outline

  • HTML
    • HEAD
      • #text " " ()
    • BODY
      • #text "plain " ()
      • EM
        • #text "+em " (italic)
        • STRONG
          • #text "+strong " (bold/italic)
      • STRONG
        • #text "-em" (bold)
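The shape of that outline can be reproduced with a much-simplified Python sketch. This is not the adoption agency algorithm from the HTML specification; it is a toy under the assumption that only `em` and `strong` appear, that a mismatched end tag closes the intervening formatting elements, and that those elements are then reopened right away. `Node`, `build_tree`, and `outline` are hypothetical names for this example.

```python
import re

FORMATTING = {"em", "strong"}  # the only elements this sketch understands


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []  # mix of Node objects and text strings
        if parent is not None:
            parent.children.append(self)


def build_tree(markup):
    """Build a tree for body content, reopening formatting cut off by an end tag."""
    root = Node("body")
    stack = [root]  # open elements, innermost last
    for token in re.split(r"(</?\w+>)", markup):
        if not token:
            continue
        match = re.fullmatch(r"<(/?)(\w+)>", token)
        if match is None:
            stack[-1].children.append(token)  # text goes into the current element
            continue
        closing, name = match.group(1) == "/", match.group(2).lower()
        if name not in FORMATTING:
            continue
        if not closing:
            stack.append(Node(name, stack[-1]))
            continue
        names = [node.name for node in stack]
        if name not in names:
            continue  # stray end tag: ignored
        idx = len(names) - 1 - names[::-1].index(name)
        reopen = names[idx + 1:]  # formatting opened inside the element being closed
        del stack[idx:]
        for inner in reopen:
            stack.append(Node(inner, stack[-1]))  # fresh copy in the outer context
    return root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element or text node."""
    lines = ["  " * depth + node.name.upper()]
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * (depth + 1) + f"#text {child!r}")
        else:
            lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(build_tree("plain <em>+em <strong>+strong </em>-em"))))
```

For the markup in the question, the sketch produces an EM containing one STRONG, followed by a second STRONG that is a sibling of the EM and holds "-em", the same shape as the BODY part of the outline above. A real parser defers the reopening until more content arrives and handles many more cases.
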
尤怨 2025-01-01 18:32:18

If you try running your HTML through http://validator.w3.org/check it will flag up this HTML as being pretty much invalid.

If your HTML is invalid, all bets are off, and different browsers may render your HTML differently.

古镇旧梦 2025-01-01 18:32:18

If you look at the DOM in Chrome by right-clicking and choosing Inspect Element, you'll be able to deduce that since your tags do not match up, it applied an algorithm to decide where you messed up. Technically, it does close the strong tag at the correct place. However, it decides that you were probably trying to make both pieces of text bold, so it puts the last -em in an entirely new, extra "strong" element while keeping the "+strong" in its own "strong" element. It looks to me like the Chrome team decided it is statistically likely that you want both things to be bold.
