Definition of the HTML whitespace rules?
I'm looking for this definition to make my HTML renderer conform a bit better. Currently it's guessing which whitespace to keep, which to collapse and which to discard. The SGML standard is hard to find, and the HTML standard doesn't seem to treat the subject in the depth I need.
Currently my renderer parses the HTML into a tree and then does a recursive layout pass to position all the elements and their content. I'm experimenting with throwing some whitespace out in the parse stage, i.e. not emitting whitespace-only text chunks in certain circumstances. That kind of works for the majority of cases, but there are a fair few edge cases that are getting hard to deal with.
(I'm also working on an editor subclass of the HTML control, and layout-time solutions are proving to be a bit of a problem in the editor, hence my working on moving them into the parse stage. The layout information isn't available until reflow time, which is some time after you have edited the document.)
Fire away with linkage/flames.
I think section 9.1, "White space", of the HTML 4 specification is what you're looking for.
So I think the closest I'm going to get for an answer on this is here:
http://www.w3.org/TR/CSS2/text.html#white-space-model
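For what it's worth, here is a rough Python sketch of how I read the collapsing steps of that model for `white-space: normal`. The function names `collapse_normal` and `collapse_inline_runs` are made up for illustration, and treating the whole list of runs as a single line (so leading and trailing spaces are dropped once) is a simplifying assumption; in a real renderer that part happens during line layout.

```python
import re


def collapse_normal(text: str, prev_ended_with_space: bool = False) -> str:
    """Collapse white space in one text run as for `white-space: normal`."""
    # Assumption: newlines are normalised to LF first (CRLF and lone CR).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove spaces and tabs surrounding each line feed.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    # Turn each remaining line feed into a space (simplified: always a space).
    text = text.replace("\n", " ")
    # Convert tabs to spaces, then collapse runs of spaces to one space.
    text = text.replace("\t", " ")
    text = re.sub(r" {2,}", " ", text)
    # A space that follows a space in the previous inline run is removed.
    if prev_ended_with_space and text.startswith(" "):
        text = text[1:]
    return text


def collapse_inline_runs(runs: list[str]) -> list[str]:
    """Collapse adjacent inline text runs, treating them as one line."""
    out = []
    prev_space = True  # start of line: a leading space is dropped
    for run in runs:
        collapsed = collapse_normal(run, prev_space)
        out.append(collapsed)
        if collapsed:
            prev_space = collapsed.endswith(" ")
    # A space at the end of the line is dropped as well.
    for i in range(len(out) - 1, -1, -1):
        if out[i]:
            out[i] = out[i].rstrip(" ")
            break
    return out
```

Because the "previous run ended with a space" state is carried across runs, a whitespace-only run between two inline elements survives only when neither neighbour already supplies the space, which is the kind of edge case that is awkward to decide at parse time.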
I can recommend this explanation for the whitespace parsing:
https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace
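To see where whitespace-only text chunks show up in the first place, something like the following Python sketch may help. It uses the standard library's `html.parser`, which is a simple tokenising parser and not an HTML 5 tree builder; the class name `TextNodeLister` and the helper `list_events` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TextNodeLister(HTMLParser):
    """Record tags and text chunks, flagging whitespace-only chunks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Whitespace here means space, tab, LF, CR and form feed.
        kind = "ws-only" if data.strip(" \t\n\r\f") == "" else "text"
        self.events.append((kind, data))


def list_events(html: str):
    parser = TextNodeLister()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The single space between </b> and <i> is significant when rendered.
    for event in list_events("<b>a</b> <i>b</i>"):
        print(event)
    # The chunks around <li> are whitespace-only and do not render.
    for event in list_events("<ul>\n  <li>x</li>\n</ul>"):
        print(event)
```

Both inputs produce whitespace-only chunks, but only the one between the two inline elements affects rendering, so "drop every whitespace-only chunk" is not a safe parse-stage rule by itself.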
If you're writing your own HTML parser, then I strongly recommend you use the parsing algorithm in the HTML 5 spec (http://www.whatwg.org/html5). It covers a large number of edge and corner cases, as well as general browser weirdness. Browsers don't follow SGML rules, but they are all converging on either doing what the HTML 5 spec says or the functional equivalent of it. There are several open source parsers available that implement the algorithm, so it should have everything you need.