HtmlUnit getByXPath returns null
I am coding in Groovy; however, I don't believe this is a language-specific set of questions.
I actually have two questions.
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
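import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient

// url holds the page address given above
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
// expected the title link here, but nothing comes back
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title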
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling JavaScript creates a mess in my cmd prompt.
Second Question
I also want to get the image, but I'm having trouble because when I get its XPath (via Firebug) it shows up as: //*[@id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
Your XPath was off by one in the positional predicate on the body's div: you used the 4th div, but it should be the 3rd. It appears the site's HTML can and does change from what it was when you originally captured the XPath using Firebug. You may need to adjust your XPath so that it accommodates such changes and is less sensitive to differences in document structure. Maybe something like this:
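The expression the answer suggested is not preserved here, so the following is only a sketch of the general idea and not the answerer's XPath. It uses the JDK's built-in XPath engine rather than HtmlUnit so that it runs without extra dependencies, and the markup, the class name `RobustXPathSketch`, and the helper `count` are all hypothetical. It compares a position-based absolute path with a shorter descendant search on a tiny stand-in document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RobustXPathSketch {
    // Hypothetical, heavily simplified markup; the real page is nested much deeper.
    static final String MARKUP =
        "<html><body>"
      + "<div>header</div>"
      + "<div>nav</div>"
      + "<div><div><h1><a href='/art/1'>Some brush set</a></h1></div></div>"
      + "</body></html>";

    // Returns how many nodes the XPath expression selects in MARKUP.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(MARKUP)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Off by one: body has only three div children, so nothing matches.
        System.out.println(count("/html/body/div[4]/div/h1/a")); // prints 0
        // Correct index, but still tied to the exact nesting.
        System.out.println(count("/html/body/div[3]/div/h1/a")); // prints 1
        // Descendant search: keeps working if wrapper divs are added or removed.
        System.out.println(count("//h1/a"));                     // prints 1
    }
}
```

The descendant form trades precision for resilience: it would also match any other `h1/a` on the page, so on a real document it usually needs an extra predicate (for example on an `id` or `class` attribute) to stay specific.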
Second Answer: The XPath that you listed will work. It may look odd or short (and may not be the most efficient), but // starts at the root node and searches every node in the tree, * matches any element (including the img), and the [] predicate filter restricts the matches to those that have an id attribute whose value equals "gmi-ResViewSizer_img". There are many other XPath expressions that could work as well; the right choice also depends on how often the HTML structure changes. This is another one that also works on the referenced page to select that img:
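The alternative expression mentioned in the second answer is not preserved here. As a hedged illustration of how `//*[@id="gmi-ResViewSizer_img"]` is evaluated, the sketch below runs it, and one narrower variant of my own choosing, against a small stand-in document using the JDK's XPath engine instead of HtmlUnit. The markup, the `src` values, the class name `IdXPathSketch`, and the helper `srcOf` are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class IdXPathSketch {
    // Hypothetical markup: two images, only one carries the id from the question.
    static final String MARKUP =
        "<html><body>"
      + "<div><img id='thumb' src='/t.png'/></div>"
      + "<div><div><img id='gmi-ResViewSizer_img' src='/full.png'/></div></div>"
      + "</body></html>";

    // Returns the src attribute of the first element the expression selects,
    // or null when the expression selects nothing.
    static String srcOf(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(MARKUP)));
        Element match = (Element) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODE);
        return match == null ? null : match.getAttribute("src");
    }

    public static void main(String[] args) throws Exception {
        // Any element, anywhere, whose id equals the given value.
        System.out.println(srcOf("//*[@id='gmi-ResViewSizer_img']"));   // prints /full.png
        // Same idea, restricted to img elements.
        System.out.println(srcOf("//img[@id='gmi-ResViewSizer_img']")); // prints /full.png
        // An id that does not occur selects nothing.
        System.out.println(srcOf("//*[@id='no-such-id']"));             // prints null
    }
}
```

In HtmlUnit itself, `getByXPath` returns a list, which is why the question's `println` shows `[]` rather than `null` when nothing matches; `getFirstByXPath` returns the first match directly.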
I had the same problem. I solved it when I noticed the iframe tags on the page. Try calling
where n is the position of the frame in the iframe collection. It worked for me!
Thanks a lot.
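The call this comment refers to is not preserved here. The sketch below only illustrates why frames matter, under the assumption that the wanted element lives in a document loaded by an iframe: an XPath query runs against a single document and does not descend into a frame's content. It uses the JDK's XPath engine with two hypothetical documents (`OUTER`, `INNER`) and a hypothetical helper `count`. In HtmlUnit, one way to reach the inner document is, to my understanding, `page.getFrames().get(n).getEnclosedPage()`, which may or may not be what the commenter used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FrameXPathSketch {
    // Hypothetical outer page: it only references the frame, it does not contain its content.
    static final String OUTER =
        "<html><body><iframe src='inner.html'/></body></html>";
    // Hypothetical document that the iframe would load.
    static final String INNER =
        "<html><body><h1><a href='/art/1'>Title</a></h1></body></html>";

    // Returns how many nodes the expression selects in the given markup.
    static int count(String markup, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(markup)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Querying the outer page finds nothing, however the XPath is written.
        System.out.println(count(OUTER, "//h1/a")); // prints 0
        // The same query against the frame's own document succeeds.
        System.out.println(count(INNER, "//h1/a")); // prints 1
    }
}
```

If a query keeps returning an empty list even with a loose expression such as `//h1/a`, checking whether the page contains frames is a reasonable next step before adjusting the XPath further.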