Using jsoup to extract an href from a class nested inside other div/id classes

Posted 2024-12-06 01:16:41

Hello, I am trying to extract the first href inside the "title" class from the following source. (This is only part of the page; I am working with the entire page.)

<div id="atfResults" class="list results ">
    <div id="result_0" class="result firstRow product" name="0006754023">
        <div id="srNum_0" class="number">1.</div>
        <div class="image">
            <a href="http://www.amazon.co.uk/Essential-Modern-Classics-J-Tolkien/dp/0006754023/ref=sr_1_1?ie=UTF8&qid=1316504574&sr=8-1">
                <img src="http://ecx.images-amazon.com/images/I/31ZcWU6HN4L._AA115_.jpg" class="productImage" alt="Product Details">
            </a>
        </div>
        <div class="data">
            <div class="title">
                <a class="title titleHover" href="http://www.amazon.co.uk/Essential-Modern-Classics-J-Tolkien/dp/0006754023/ref=sr_1_1?ie=UTF8&qid=1316504574&sr=8-1">Essential Modern Classics - The Hobbit</a>
                <span class="ptBrand">by J. R. R. Tolkien</span>
                <span class="bindingAndRelease">(<span class="binding">Paperback</span> - 2 Apr 2009)</span>
            </div>

I have tried several variations of both the select function and getElementsByClass, but all of them gave me a "null" value, for example:

Document firstSearchPage = Jsoup.connect(fullST).get();
Element link = firstSearchPage.select("div.title").first();

If someone could help me with a solution and recommend some reading so that I can avoid this problem in future, it would be greatly appreciated.

Comments (1)

泪之魂 2024-12-13 01:16:42

The CSS selector div.title returns a <div class="title">, not a link as you seem to think. If you want an <a class="title">, use the a.title selector instead.

Element link = document.select("a.title").first();
String href = link.absUrl("href");
// ...
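
To make the difference concrete, here is a minimal, self-contained sketch, assuming jsoup is on the classpath. The class name TitleLinkDemo, the helper firstTitleHref, and the shortened href in the HTML string are made up for illustration; they are not from the original page. It also guards against first() returning null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleLinkDemo {

    // Trimmed stand-in for the fragment in the question (href shortened for illustration).
    static final String HTML =
        "<div class=\"data\">"
      + "<div class=\"title\">"
      + "<a class=\"title titleHover\" href=\"http://www.amazon.co.uk/dp/0006754023\">"
      + "Essential Modern Classics - The Hobbit</a>"
      + "</div></div>";

    // Returns the absolute href of the first <a class="title">, or null if none matches.
    static String firstTitleHref(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        Element link = document.select("a.title").first();
        return link == null ? null : link.absUrl("href");
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(HTML, "http://www.amazon.co.uk/");

        // div.title matches the wrapping <div>, which carries no href attribute.
        Element div = document.select("div.title").first();
        System.out.println(div.tagName());               // the element is a div, not a link
        System.out.println(div.attr("href").isEmpty());  // attr() yields "" for a missing attribute

        // a.title matches the link itself.
        System.out.println(firstTitleHref(HTML, "http://www.amazon.co.uk/"));
    }
}
```

Parsing from a string with Jsoup.parse keeps the sketch independent of the network; with a live page you would obtain the Document from Jsoup.connect(...).get() as in the question.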

Or if an <a class="title"> can appear elsewhere in the document outside a <div class="title"> before that point, then you need the following more specific selector:

Element link = document.select("div.title a.title").first();
String href = link.absUrl("href");
// ...

This will return the first <a class="title"> that is a descendant of a <div class="title">.
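
A space between two selectors matches descendants at any depth, while > matches direct children only. The following sketch, again assuming jsoup on the classpath, counts matches for each form; the class name CombinatorDemo, the helper count, and the sample HTML are hypothetical and exist only to show the distinction.

```java
import org.jsoup.Jsoup;

public class CombinatorDemo {

    // One link directly inside div.title, one nested deeper, one outside the div.
    static final String HTML =
        "<div class=\"title\">"
      + "<a class=\"title\" href=\"/direct\">direct</a>"
      + "<span><a class=\"title\" href=\"/nested\">nested</a></span>"
      + "</div>"
      + "<a class=\"title\" href=\"/outside\">outside</a>";

    // Number of elements in the parsed HTML that match the given CSS selector.
    static int count(String html, String selector) {
        return Jsoup.parse(html).select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(count(HTML, "a.title"));             // 3: every link with class title
        System.out.println(count(HTML, "div.title a.title"));   // 2: links anywhere inside the div
        System.out.println(count(HTML, "div.title > a.title")); // 1: only the direct child
    }
}
```

For the markup in the question either form finds the link, since the <a> sits directly inside the <div class="title">; the descendant form is simply more tolerant of extra wrapper elements.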
