Strict HTML parsing in JavaScript
On Google Chrome (Canary), it seems no string can make the DOM parser fail. I'm trying to parse some HTML, and if the HTML isn't completely, 100% valid, I want to display an error. I've tried the obvious:
var newElement = document.createElement('div');
newElement.innerHTML = someMarkup; // Might fail on IE, never on Chrome.
I've also tried the method in this question. Doesn't fail for invalid markup, even the most invalid markup I can produce.
So, is there some way to parse HTML "strictly" in Google Chrome at least? I don't want to resort to tokenizing it myself or using an external validation utility. If there's no other alternative, a strict XML parser is fine, but certain elements don't require closing tags in HTML, and preferably those shouldn't fail.
Comments (1)
Use the DOMParser to check a document in two steps:

1. Validate that the document is well-formed by parsing it as XML.
2. Parse the string as HTML, loop through each element, and check whether the DOM element is an instance of HTMLUnknownElement. For this purpose, getElementsByTagName('*') fits well.

(If you want to parse the document strictly, you have to loop through each element recursively and track whether the element is allowed at that location, e.g. <area> in <map>.)

Demo: http://jsfiddle.net/q66Ep/1/
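The two steps above can be sketched roughly as follows. This is not the code from the linked demo. The function names (isWellFormedXml, findUnknownTags, strictCheck) are made up for illustration, and the DOM constructors are passed in as parameters only so the helpers can be exercised outside a browser. It assumes the markup has a single root element, since the XML step rejects fragments with several top-level elements.

```javascript
// Sketch only: a two-step "strict" check built on DOMParser.
// All function names here are hypothetical, not part of any library.

// Step 1: the markup must parse as well-formed XML.
// Browsers signal XML parse errors by putting a <parsererror> element
// into the returned document instead of throwing an exception.
function isWellFormedXml(markup, Parser = globalThis.DOMParser) {
  const doc = new Parser().parseFromString(markup, 'text/xml');
  return doc.getElementsByTagName('parsererror').length === 0;
}

// Step 2: collect the tag names of every element in an HTML-parsed
// document that the browser represents as an HTMLUnknownElement.
function findUnknownTags(doc, UnknownElement = globalThis.HTMLUnknownElement) {
  return Array.from(doc.getElementsByTagName('*'))
    .filter((el) => el instanceof UnknownElement)
    .map((el) => el.tagName.toLowerCase());
}

// Hypothetical wrapper: returns a list of problems, empty if both steps pass.
function strictCheck(
  markup,
  Parser = globalThis.DOMParser,
  UnknownElement = globalThis.HTMLUnknownElement
) {
  if (!isWellFormedXml(markup, Parser)) {
    return ['not well-formed XML'];
  }
  const htmlDoc = new Parser().parseFromString(markup, 'text/html');
  return findUnknownTags(htmlDoc, UnknownElement).map(
    (tag) => `unknown element <${tag}>`
  );
}
```

In a browser, calling strictCheck(someMarkup) with no extra arguments uses the built-in DOMParser and HTMLUnknownElement. Note that a well-formed document that legitimately contains an element named parsererror would be misreported by step 1.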
See revision 1 of this answer for an alternative to XML validation without the DOMParser.
Considerations

This method returns null for <input type="text">, although that is valid HTML5 (because the tag is not closed).
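One possible workaround for this limitation, which is not part of the answer itself, is to rewrite void elements into self-closing form before the XML check, so that <input type="text"> becomes <input type="text" />. The sketch below is a regex-based approximation: selfCloseVoidElements is a hypothetical helper, and it will misfire on attribute values that contain < or >, and on tag-like text inside comments or scripts.

```javascript
// Void elements per the HTML standard: they never take a closing tag.
const VOID_ELEMENTS = [
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
];

// Matches an opening tag of a void element, with or without a trailing "/".
// The lookahead keeps <col> from matching <colgroup>.
const VOID_TAG = new RegExp(
  `<(${VOID_ELEMENTS.join('|')})(?=[\\s/>])([^<>]*?)\\s*/?>`,
  'gi'
);

// Hypothetical helper: rewrite every void-element tag as "<name attrs />"
// so that an XML parser accepts it.
function selfCloseVoidElements(markup) {
  return markup.replace(VOID_TAG, '<$1$2 />');
}
```

The result of selfCloseVoidElements(someMarkup) can then be fed to the XML well-formedness step in place of the raw string, while the HTML step still receives the original markup.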