HtmlAgilityPack -- does <form> close itself immediately?
I just wrote up this test to see if I was crazy...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace HtmlAgilityPackFormBug
{
    class Program
    {
        static void Main(string[] args)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"
<!DOCTYPE html>
<html>
<head>
<title>Form Test</title>
</head>
<body>
<form>
<input type=""text"" />
<input type=""reset"" />
<input type=""submit"" />
</form>
</body>
</html>
");
            var body = doc.DocumentNode.SelectSingleNode("//body");

            // Print the XPath of every element that is a direct child of <body>.
            foreach (var node in body.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
                Console.WriteLine(node.XPath);

            Console.ReadLine();
        }
    }
}
And it outputs:
/html[1]/body[1]/form[1]
/html[1]/body[1]/input[1]
/html[1]/body[1]/input[2]
/html[1]/body[1]/input[3]
But if I change <form> to <xxx>, it gives me:
/html[1]/body[1]/xxx[1]
(As it should.) So... it looks like those input elements are not contained within the form, but directly within the body, as if the <form> just closed itself off immediately. What's up with that? Is this a bug?
Digging through the source, I see:
ElementsFlags.Add("form", HtmlElementFlag.CanOverlap | HtmlElementFlag.Empty);
It has the "empty" flag, like META and IMG. Why?? Forms are most definitely not supposed to be empty.
Comments (2)
This is also reported in this workitem. It contains a suggested workaround from DarthObiwan.
Since I'm the original HAP author, I can explain why it's marked as empty :)
This is because when HAP was designed, back in 2000, HTML 3.2 was the standard. You're probably aware that tags can perfectly overlap in HTML. That is:
<b>bold<i>italic and bold</b>italic</i>
(which renders as bold text, then bold italic, then italic) is supported by all browsers (although it's not officially in the HTML specification). The FORM tag can overlap in the same way. Since HAP was designed to handle any HTML content, rather than break most pages that you could find at that time, we just decided to handle overlapping tags as EMPTY (using the ElementsFlags property) so:
The only thing you cannot do is work with them programmatically: not through the API, not through the tree model, not with XSL.
Today, with XHTML/XML almost everywhere, this sounds strange, but that's why I created the ElementsFlags :)
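Since the flags live in a public static dictionary on HtmlNode, one way to get nested form children is to drop the "form" entry before parsing. The following is only a minimal sketch under that assumption, not necessarily the workaround mentioned in the work item; the class name FormFlagSketch and the helper CountFormInputs are made up for illustration, and the default flag for "form" may differ between HAP versions.

```csharp
using System;
using HtmlAgilityPack;

static class FormFlagSketch
{
    // Hypothetical helper: counts <input> elements that are direct children of a <form>.
    public static int CountFormInputs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//form/input");
        return inputs == null ? 0 : inputs.Count;
    }

    static void Main()
    {
        const string html =
            "<html><body><form><input type=\"text\" /><input type=\"submit\" /></form></body></html>";

        // Inspect the current flags; the entry may differ or be absent depending on the HAP version.
        if (HtmlNode.ElementsFlags.TryGetValue("form", out var flags))
            Console.WriteLine("form flags: " + flags);

        Console.WriteLine("inputs under form (default flags): " + CountFormInputs(html));

        // ElementsFlags is static, so removing the entry affects every
        // document parsed afterwards in this process.
        HtmlNode.ElementsFlags.Remove("form");

        Console.WriteLine("inputs under form (flag removed): " + CountFormInputs(html));
    }
}
```

With the entry removed, the parser should treat form like any other container element, so the inputs become its children. The trade-off is the one described above: pages with overlapping form tags may no longer round-trip unchanged.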