Extracting an inner element without looping
Extracting the href value from the following sample HTML code is straightforward if I loop through all <a> elements and break immediately after the first one:
<li class="parts partname parts_first">
<div id="dpdn10" uri="/public/page/part1" class="partype partstate">
<div class="ptctainer">
<div class="ptitle">
<p class="ptypead">
<span class="rtext"><a href="http://www.example.com/page/ptname.html?dv=rfirst" class="mnLabel">First</a></span>
<span class="ndx">
<a href="#" dndx="dpdn10" class="xpnd _t" style="opacity:1">Details: </a>
</span>
</p>
</div>
</div>
<div id="dpdn10_content" class="xpns">
<div class="ptctainer">
<div class="ptitle">
<p class="ptypead">
<span class="rtext"><a href="http://www.example.com/page/ptname.html?dv=rfirst" class="mnLabel">First</a></span>
<span class="ndx"><a href="#" class="xpnd">Details: </a></span>
</p>
</div>
</div>
</div>
</div>
</li>
I can certainly do that when I can assume the href value is identical for both instances of <a>, as in the example above.
However, this approach fails if they are not identical and I want to extract a specific one (either the first or the second).
This leads me to look for a mechanism in Jsoup that allows "nested selection". Until now I have only been familiar with single-level selection, as in:
Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png
Element masthead = doc.select("div.masthead").first(); // div with class=masthead
But I can't find documentation or an example for multi-level selection, e.g.
Element link= doc.select("div.xpns.div.ptctainer.div.ptitle.p.ptypead.span.rtext");
The above is for illustration and not real syntax, of course. I don't know if something like this is possible (yet) in Jsoup.
Does such "nested selection" exist in Jsoup?
Comments (2)
The jsoup selectors work just like CSS selectors. See the Selector documentation for the full supported syntax.
You can do descendant selections like this:
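As a minimal sketch of a descendant selection against the question's markup: a space between two simple selectors means "any descendant of". The class name `DescendantSelect`, the helper `block`, the trimmed HTML, and the second link's `dv=rsecond` value are assumptions made here so the two links can be told apart; they are not part of the original question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DescendantSelect {
    // Hypothetical helper: one link wrapped in the same nesting as the question's markup.
    static String block(String dv) {
        return "<div class=\"ptctainer\"><div class=\"ptitle\"><p class=\"ptypead\">"
             + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=" + dv
             + "\" class=\"mnLabel\">First</a></span></p></div></div>";
    }

    // Trimmed copy of the sample; the second href uses dv=rsecond (assumed) instead of dv=rfirst.
    static final String HTML =
          "<ul><li class=\"parts\"><div id=\"dpdn10\" class=\"partype partstate\">"
        + block("rfirst")
        + "<div id=\"dpdn10_content\" class=\"xpns\">" + block("rsecond") + "</div>"
        + "</div></li></ul>";

    // Returns the href of the link nested inside div.xpns, or null if nothing matches.
    static String nestedHref(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc
            .select("div.xpns div.ptctainer div.ptitle p.ptypead span.rtext a[href]")
            .first();
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        System.out.println(nestedHref(HTML));
    }
}
```

Only the link under `div.xpns` matches here, because the first link has no `div.xpns` ancestor.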
If the tag name is not important to the selection, and you only need to use the class name:
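A class-only variant might look like the sketch below. It also shows the child combinator `>` as one possible way to target the first link rather than the second. As before, the class name `ClassOnlySelect`, the helper `block`, the trimmed HTML, and the `dv=rsecond` value are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassOnlySelect {
    // Hypothetical helper: one link wrapped in the same nesting as the question's markup.
    static String block(String dv) {
        return "<div class=\"ptctainer\"><div class=\"ptitle\"><p class=\"ptypead\">"
             + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=" + dv
             + "\" class=\"mnLabel\">First</a></span></p></div></div>";
    }

    // Trimmed copy of the sample; the second href uses dv=rsecond (assumed).
    static final String HTML =
          "<ul><li class=\"parts\"><div id=\"dpdn10\" class=\"partype partstate\">"
        + block("rfirst")
        + "<div id=\"dpdn10_content\" class=\"xpns\">" + block("rsecond") + "</div>"
        + "</div></li></ul>";

    // Runs a selector and returns the href of the first match, or null if there is none.
    static String firstHref(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select(selector).first();
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        // Class names only: the link nested inside the .xpns container.
        System.out.println(firstHref(HTML, ".xpns .ptctainer .ptitle .ptypead .rtext a"));
        // '>' requires a direct child, so only the .ptctainer directly under .partype matches.
        System.out.println(firstHref(HTML, ".partype > .ptctainer .rtext a"));
    }
}
```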
These queries are very efficient.
Can't you just 'chain' the selection functions? Like:
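A rough sketch of what chaining could look like, assuming the same trimmed markup: `Elements.select` searches within the elements already matched, so each call narrows the result. The class name `ChainedSelect`, the helper `block`, and the `dv=rsecond` value are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ChainedSelect {
    // Hypothetical helper: one link wrapped in the same nesting as the question's markup.
    static String block(String dv) {
        return "<div class=\"ptctainer\"><div class=\"ptitle\"><p class=\"ptypead\">"
             + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=" + dv
             + "\" class=\"mnLabel\">First</a></span></p></div></div>";
    }

    // Trimmed copy of the sample; the second href uses dv=rsecond (assumed).
    static final String HTML =
          "<ul><li class=\"parts\"><div id=\"dpdn10\" class=\"partype partstate\">"
        + block("rfirst")
        + "<div id=\"dpdn10_content\" class=\"xpns\">" + block("rsecond") + "</div>"
        + "</div></li></ul>";

    // Narrows step by step: container, then span, then the link itself.
    static String chainedHref(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("div.xpns").select("span.rtext").select("a[href]");
        // Elements.attr reads the attribute from the first matched element that has it.
        return links.attr("href");
    }

    public static void main(String[] args) {
        System.out.println(chainedHref(HTML));
    }
}
```

Chaining gives the same result as the single descendant selector `div.xpns span.rtext a[href]`; which one to use is mostly a matter of readability.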