WebDriver can find an element using XPath, but Html Agility Pack cannot
I have continually had problems with Html Agility Pack; my XPath queries only ever work when they are extremely simple:
//*[@id='some_id']
or
//input
However, whenever they get more complicated, Html Agility Pack can't handle them.
Here's an example demonstrating the problem. I'm using WebDriver to navigate to Google and return the page source, which is then passed to Html Agility Pack; both WebDriver and Html Agility Pack attempt to locate the same element/node (C#):
using System;
using System.Threading;
using HtmlAgilityPack;
using OpenQA.Selenium.Firefox;
// The XPath query
const string xpath = "//form//tr[1]/td[1]//input[@name='q']";
// Navigate to Google and get the page source
var driver = new FirefoxDriver(new FirefoxProfile()) { Url = "http://www.google.com" };
Thread.Sleep(2000);
// Can WebDriver find it?
// FindElementByXPath throws NoSuchElementException instead of returning null,
// so use FindElementsByXPath and check the count.
var elements = driver.FindElementsByXPath(xpath);
Console.WriteLine(elements.Count > 0 ? "Webdriver success" : "Webdriver failure");
// Can Html Agility Pack find it?
var source = driver.PageSource;
var htmlDoc = new HtmlDocument { OptionFixNestedTags = true };
htmlDoc.LoadHtml(source);
// SelectNodes returns null when nothing matches
var nodes = htmlDoc.DocumentNode.SelectNodes(xpath);
Console.WriteLine(nodes != null ? "Html Agility Pack success" : "Html Agility Pack failure");
driver.Quit();
In this case, WebDriver successfully located the item, but Html Agility Pack did not.
I know that in this case it's very easy to change the XPath to one that will work, //input[@name='q'], but that only fixes this specific example, which isn't the point. I need something that will exactly, or at least closely, mirror the behavior of WebDriver's XPath engine, or even of the FirePath or FireFinder add-ons for Firefox.
If WebDriver can find it, then why can't Html Agility Pack find it too?
Comments (1)
The issue you're running into is with the FORM element. HTML Agility Pack handles that element differently: by default, it will never report that it has children.
In the particular example you gave, this query does find the target element:
.//div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input
However, this does not, so it's clear the form element is tripping up the parser:
.//form/div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input
That behavior is configurable, though. If you place this line prior to parsing the HTML, the form will give you child nodes:
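The line itself did not survive in the text above. As a minimal sketch, and assuming the setting in question is the static `HtmlNode.ElementsFlags` table (where the "form" entry marks the element as one that does not hold children), removing that entry before loading any HTML is one way to get the behavior described. The `FormChildrenSketch` class and the inline `html` string are hypothetical stand-ins for the real Google page source, and the default flags may differ between Html Agility Pack versions.

```csharp
using System;
using HtmlAgilityPack;

class FormChildrenSketch
{
    static void Main()
    {
        // Assumption: the "form" entry in the static ElementsFlags table is what
        // makes the parser report <form> as childless. Remove it before any
        // document is parsed; the change applies process-wide.
        HtmlNode.ElementsFlags.Remove("form");

        // Hypothetical markup standing in for driver.PageSource.
        const string html =
            "<html><body><form><table><tr><td>" +
            "<input name='q'>" +
            "</td></tr></table></form></body></html>";

        var htmlDoc = new HtmlDocument { OptionFixNestedTags = true };
        htmlDoc.LoadHtml(html);

        // Same query as in the question; SelectNodes returns null when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//form//tr[1]/td[1]//input[@name='q']");
        Console.WriteLine(nodes != null ? "Html Agility Pack success" : "Html Agility Pack failure");
    }
}
```

Because `ElementsFlags` is static, the removal affects every `HtmlDocument` parsed afterwards in the same process, so it is worth doing once at startup rather than per document.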