Extracting inner elements without looping

Posted on 2024-11-03 06:32:00

Extracting the href value from the following sample HTML is straightforward if I loop through all <a> elements and break immediately after the first one:

  <li class="parts partname parts_first">
    <div id="dpdn10" uri="/public/page/part1" class="partype partstate">
      <div class="ptctainer">
        <div class="ptitle">
          <p class="ptypead">
            <span class="rtext"><a href="http://www.example.com/page/ptname.html?dv=rfirst" class="mnLabel">First</a></span>
            <span class="ndx">
              <a href="#" dndx="dpdn10" class="xpnd _t" style="opacity:1">Details: </a>
            </span>
          </p>
        </div>
      </div>

      <div id="dpdn10_content" class="xpns">
        <div class="ptctainer">
          <div class="ptitle">
            <p class="ptypead">
              <span class="rtext"><a href="http://www.example.com/page/ptname.html?dv=rfirst" class="mnLabel">First</a></span>
              <span class="ndx"><a href="#" class="xpnd">Details: </a></span>
            </p>
          </div>
        </div>    
      </div>
    </div>
  </li>

I can certainly do that when I can assume the href value is identical for both instances of the <a> link, as in the example above.

However, this approach fails if they are not identical and I want to extract a specific one (either the first or the second).

This brings me to look for a mechanism in Jsoup that allows "nested selection". Until now I have only been familiar with single-level selection, as in:

Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]");  // img with src ending .png
Element masthead = doc.select("div.masthead").first();  // div with class=masthead

But I can't find documentation or an example for multi-level selection, e.g.

Element link= doc.select("div.xpns.div.ptctainer.div.ptitle.p.ptypead.span.rtext");

The above is for illustration and not real syntax, of course. I don't know if something like this is possible (yet) in Jsoup.

Does such "nested selection" exist in Jsoup?

Comments (2)

鹤仙姿 2024-11-10 06:32:00

jsoup selectors work much like CSS selectors. See the Selector documentation for the full supported syntax.

You can do descendant selections like this:

Element link = doc.select("div.xpns div.ptctainer div.ptitle p.ptypead span.rtext").first();

If the tag name is not important to the selection, and you only need to use the class name:

Element link = doc.select(".xpns .ptctainer .ptitle .ptypead .rtext").first();

These queries are very efficient.
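
As a minimal sketch of how descendant selectors can pick one specific link from the question's markup, the following assumes jsoup is on the classpath and uses a trimmed copy of the sample HTML in which the two hrefs differ. The `rsecond` value, the `NestedSelect` class, and its method names are made up for illustration. Appending `> a[href]` reaches the link itself rather than its enclosing span, and the child combinator `>` on `#dpdn10` keeps the first lookup from descending into `div.xpns`:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NestedSelect {
    // Trimmed sample markup; the second href is a hypothetical value so the two links differ.
    static final String HTML =
        "<li class=\"parts\"><div id=\"dpdn10\" class=\"partype\">"
      + "<div class=\"ptctainer\"><div class=\"ptitle\"><p class=\"ptypead\">"
      + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=rfirst\" class=\"mnLabel\">First</a></span>"
      + "<span class=\"ndx\"><a href=\"#\" class=\"xpnd\">Details: </a></span>"
      + "</p></div></div>"
      + "<div id=\"dpdn10_content\" class=\"xpns\">"
      + "<div class=\"ptctainer\"><div class=\"ptitle\"><p class=\"ptypead\">"
      + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=rsecond\" class=\"mnLabel\">First</a></span>"
      + "<span class=\"ndx\"><a href=\"#\" class=\"xpnd\">Details: </a></span>"
      + "</p></div></div></div></div></li>";

    // Returns the href of the first element matching the CSS query, or null if nothing matches.
    static String firstHref(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select(cssQuery).first();
        return link == null ? null : link.attr("href");
    }

    // Link nested inside the div.xpns block (the second one in the sample).
    static String hrefInsideXpns(String html) {
        return firstHref(html, "div.xpns span.rtext > a[href]");
    }

    // Link in the div.ptctainer that is a direct child of #dpdn10 (the first one in the sample).
    static String hrefOutsideXpns(String html) {
        return firstHref(html, "#dpdn10 > div.ptctainer span.rtext > a[href]");
    }

    public static void main(String[] args) {
        System.out.println(hrefOutsideXpns(HTML));
        System.out.println(hrefInsideXpns(HTML));
    }
}
```

Neither lookup needs an explicit loop: the selector alone decides which of the two links is returned.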

无名指的心愿 2024-11-10 06:32:00

Can't you just 'chain' the selection functions? Like:

Element link = doc.select("div.xpns").select("div.ptctainer").select("div.ptitle").select("p.ptypead").select("span.rtext").first();
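
To illustrate how chaining relates to a single descendant selector, here is a rough sketch, again assuming jsoup is on the classpath. The `ChainedSelect` class, its method names, and the trimmed HTML are illustrative only. Each `select` call on an `Elements` searches within the elements matched so far, and the result is still an `Elements`, so `.first()` is needed to get a single `Element`. On this markup both forms end up at the same span:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChainedSelect {
    // Trimmed copy of the div.xpns block from the question.
    static final String HTML =
        "<div id=\"dpdn10_content\" class=\"xpns\"><div class=\"ptctainer\">"
      + "<div class=\"ptitle\"><p class=\"ptypead\">"
      + "<span class=\"rtext\"><a href=\"http://www.example.com/page/ptname.html?dv=rfirst\" class=\"mnLabel\">First</a></span>"
      + "<span class=\"ndx\"><a href=\"#\" class=\"xpnd\">Details: </a></span>"
      + "</p></div></div></div>";

    // Reads the href of the first link inside the given span, or null if either is missing.
    static String hrefOf(Element span) {
        if (span == null) return null;
        Element link = span.select("a[href]").first();
        return link == null ? null : link.attr("href");
    }

    // Narrows the match step by step with chained select calls.
    static String hrefViaChain(String html) {
        Document doc = Jsoup.parse(html);
        Element span = doc.select("div.xpns").select("div.ptctainer").select("div.ptitle")
                          .select("p.ptypead").select("span.rtext").first();
        return hrefOf(span);
    }

    // Expresses the same path as one descendant selector.
    static String hrefViaSingleSelector(String html) {
        Document doc = Jsoup.parse(html);
        Element span = doc.select("div.xpns div.ptctainer div.ptitle p.ptypead span.rtext").first();
        return hrefOf(span);
    }

    public static void main(String[] args) {
        System.out.println(hrefViaChain(HTML));
        System.out.println(hrefViaSingleSelector(HTML));
    }
}
```

The single selector is shorter and parses the query once, while the chained form can be handy when intermediate results are needed for something else.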