HtmlAgility ParseErrors property
What errors can I expect the HtmlAgility library to fix? I know from my own experience that it can close an unterminated tag, like:
<car>Nissan</car
When I call Load or LoadHtml, it fixes it, like:
<car>Nissan</car>
I also know that the ParseErrors collection can report the Reason, the stream position, etc. for each error.
Is there a list of errors that HtmlAgility can fix (or can you tell from your own experience)? How reliable is it at fixing errors, and which errors can it not fix?
Comments (1)
Historically, Html Agility Pack was never designed to fix Html, but rather to be able to load it, modify it, and save it back, even if the Html has errors.
This means it will fix errors that browsers generally fix automatically, like the one you show in your question. The list of errors it handles was determined experimentally, and you can browse the source for deeper insight. That being said, it was designed back in 2000/2001, so things may have changed in that area :-)
The ParseErrors collection will contain HtmlParseError objects, each with a Code. The code is a documented enum, HtmlParseErrorCode.
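As a rough illustration of inspecting that collection, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name ParseErrorsDemo is made up for the example, and the input is the malformed fragment from the question. As far as I recall, the enum includes values such as TagNotClosed, TagNotOpened, CharsetMismatch, EndTagNotRequired and EndTagInvalidHere, but check that against the source of the version you use. Which errors are actually reported for a given input may also vary by version, so no expected output is shown.

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorsDemo
{
    static void Main()
    {
        // Malformed input from the question: the end tag is missing its '>'.
        const string html = "<car>Nissan</car";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Each HtmlParseError describes one problem noticed while parsing.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }

        // Serialize the document back to see what the parser made of the input.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Comparing the printed OuterHtml with the input is probably the quickest way to see which repairs were applied to a particular document.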
There is also an OptionFixNestedTags property on HtmlDocument (default value is false) that can fix LI, TR, TH, and TD tags when nesting errors are detected. This means that if it detects a closing TR without all the needed closing TD tags, they will be closed automatically. Again, this is exactly what a browser will do with malformed Html.
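A small sketch of turning that option on, again assuming the HtmlAgilityPack NuGet package. The class name FixNestedTagsDemo and the table markup are made up for the example. The markup has a closing TR while two TD elements are still open. The option is set before loading because it influences parsing.

```csharp
using System;
using HtmlAgilityPack;

class FixNestedTagsDemo
{
    static void Main()
    {
        // Hypothetical input: </tr> arrives while both <td> elements are still open.
        const string html = "<table><tr><td>A<td>B</tr></table>";

        var doc = new HtmlDocument();

        // Enable nested-tag fixing before parsing; the default is false.
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        // Print the repaired markup to inspect how the TD tags were closed.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Running the same input with the option left at its default and comparing the two outputs should show what the setting changes.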