Is getFirstByXPath in HtmlUnit 2.8 different from getFirstByXPath in HtmlUnit 1.14?
I have a site structure that looks something like this:
<div class='main_container'>
    <div class='item_container'>
        <div class='body'>
            <span class='item_name'>Item 1</span>
            <span class='item_desc'>Desc 1</span>
        </div>
    </div>
    <div class='item_container'>
        <div class='body'>
            <span class='item_name'>Item 2</span>
            <span class='item_desc'>Desc 2</span>
        </div>
    </div>
    ...
</div><!--End of main_container-->
//Note: Some divs might not have <span class='item_name'>Item N</span> or other elements inside the item_container
In HtmlUnit 1.14, if I want to get all the item names:
List<HtmlDivision> divs = (List<HtmlDivision>) page.getByXPath("//div[@class='item_container']");
for (HtmlDivision div : divs) {
    String name = ((HtmlElement) div.getFirstByXPath("//span[@class='item_name']")).asText();
    System.out.println(name);
}
Output:
Item 1
Item 2
...
But in HtmlUnit 2.8, when I do the same, I get:
Item 1
Item 1
...
Is there a workaround for this in HtmlUnit 2.8?
Comments (1)
It may be that HtmlUnit 1.14 had a bug that you were exploiting/relying on.
In the code that you showed, the XPath inside the for loop should return the same element each time it executes (as it does in v2.8), because it starts with //, which searches the entire document starting at the root node and returns the first match that it finds.
If you want it to be relative to the <div> in the loop, you should adjust your XPath to:
.//span[@class='item_name']
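
To see the difference between the two expressions without an HtmlUnit setup, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of HtmlUnit. That substitution is an assumption made to keep the example self-contained. The class name RelativeXPathDemo, the helper firstNames, and the third container without an item_name span are made up for illustration. The XPath semantics being shown are the same: a leading // is evaluated from the document root regardless of the context node, while .// only searches below the context node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Same structure as in the question, plus a hypothetical third
    // container that has no item_name span.
    static final String MARKUP =
        "<div class='main_container'>"
      + "<div class='item_container'><div class='body'>"
      + "<span class='item_name'>Item 1</span><span class='item_desc'>Desc 1</span>"
      + "</div></div>"
      + "<div class='item_container'><div class='body'>"
      + "<span class='item_name'>Item 2</span><span class='item_desc'>Desc 2</span>"
      + "</div></div>"
      + "<div class='item_container'><div class='body'>"
      + "<span class='item_desc'>Desc 3</span>"
      + "</div></div>"
      + "</div>";

    // For each item_container, evaluate spanXPath with that container as the
    // context node and collect the text of the first match (null if none).
    public static List<String> firstNames(String spanXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList containers = (NodeList) xpath.evaluate(
                "//div[@class='item_container']", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < containers.getLength(); i++) {
                Node span = (Node) xpath.evaluate(
                    spanXPath, containers.item(i), XPathConstants.NODE);
                names.add(span == null ? null : span.getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: searched from the document root on every iteration.
        System.out.println(firstNames("//span[@class='item_name']"));  // [Item 1, Item 1, Item 1]
        // Relative path: searched only below the current container.
        System.out.println(firstNames(".//span[@class='item_name']")); // [Item 1, Item 2, null]
    }
}
```

In the HtmlUnit loop from the question, the corresponding change would be to pass ".//span[@class='item_name']" to div.getFirstByXPath and to check the result for null before calling asText(), since some containers may not contain an item_name span.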