How do I parse HTML to get 3 URLs into separate Strings?

Posted on 2024-12-05 18:26:08

I am trying to parse each URL from this HTML:

<div class="latest-media-images">
    <div class="hdr-article">LATEST IMAGES</div>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg1" src="http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg2" src="http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg3" src="http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
</div>

I want to parse each URL into a separate String using jsoup.

I've been doing pretty well with jsoup parsing so far, but here I don't know where to begin to get each URL into its own String.

How do I go about this? Parse first, and then split the result into separate Strings?

EDIT:

Or, if I can't get them into separate Strings, maybe I could put them in a list and load them by position somehow?

Or could I load each one, one by one?

Just some ideas I'm thinking of...

EDIT: From the comment below, I see that this is what I need to extract the links as a list.

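import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Example program to list links from a URL.
 */
public class ListLinks {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");      // anchors; not printed in this excerpt
        Elements media = doc.select("[src]");        // every element that has a src attribute
        Elements imports = doc.select("link[href]"); // <link> elements; not printed in this excerpt

        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            // "abs:src" resolves the src value against the document's base URI.
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }
    }

    // Helper the excerpt calls but did not include: printf-style output plus a newline.
    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    // Helper the excerpt calls but did not include: shortens s to at most width characters.
    private static String trim(String s, int width) {
        return s.length() > width ? s.substring(0, width - 1) + "." : s;
    }
}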

I don't think this is exactly suited to my use case, but it is a step in the right direction.

What do I need to change so that it extracts the list of src URLs from my HTML example above?

Comments (1)

神也荒唐 2024-12-12 18:26:08

Do you just want all images? Then try this XPath expression:

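import java.util.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

// doc must be an org.w3c.dom.Document (a jsoup Document is a different type).
// evaluate() throws the checked XPathExpressionException.
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("//img", doc, XPathConstants.NODESET);

// Collect the src attribute of every <img> element, in document order.
List<String> imageUrls = new ArrayList<String>();
for (int i = 0; i < nodes.getLength(); i++) {
    Node img = nodes.item(i);
    imageUrls.add(img.getAttributes().getNamedItem("src").getNodeValue());
}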
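
The snippet above assumes that `doc` is already a W3C DOM document. A minimal, self-contained sketch of the whole round trip, using only the JDK, might look like the following. The class name `ImageUrlExtractor`, the `example.com` URLs, and the shortened sample markup are assumptions made for illustration. The sample is written as well-formed XML (self-closing `<img/>`) because the JDK's `DocumentBuilder` rejects HTML with unclosed tags, so a real page would first have to go through an HTML parser. The XPath is also narrowed to `//img/@src`, a small variation that skips images without a `src` attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public final class ImageUrlExtractor {

    // Hypothetical, shortened stand-in for the markup in the question,
    // rewritten as well-formed XML so the JDK parser accepts it.
    public static final String SAMPLE =
          "<div class=\"latest-media-images\">"
        + "<a class=\"lnk-thumb\" href=\"http://example.com/imgs_1.html\">"
        + "<img id=\"thumbImg1\" src=\"http://example.com/thumb1.jpg\"/></a>"
        + "<a class=\"lnk-thumb\" href=\"http://example.com/imgs_1.html\">"
        + "<img id=\"thumbImg2\" src=\"http://example.com/thumb2.jpg\"/></a>"
        + "<a class=\"lnk-thumb\" href=\"http://example.com/imgs_1.html\">"
        + "<img id=\"thumbImg3\" src=\"http://example.com/thumb3.jpg\"/></a>"
        + "</div>";

    // Returns the src value of every <img> in the given XML, in document order.
    public static List<String> extractImageUrls(String xml) {
        try {
            // Parse the string into a W3C DOM document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select the src attribute nodes of all <img> elements.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList srcAttributes =
                    (NodeList) xpath.evaluate("//img/@src", doc, XPathConstants.NODESET);

            List<String> urls = new ArrayList<>();
            for (int i = 0; i < srcAttributes.getLength(); i++) {
                urls.add(srcAttributes.item(i).getNodeValue());
            }
            return urls;
        } catch (Exception e) {
            // Wrap the checked parser and XPath exceptions to keep the signature simple.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        List<String> urls = extractImageUrls(SAMPLE);
        // Each URL is its own String, addressable by position in the list.
        for (int i = 0; i < urls.size(); i++) {
            System.out.println(i + ": " + urls.get(i));
        }
    }
}
```

Once the URLs are in a `List<String>`, each one can be read by position (for example `urls.get(0)`), which covers both the "separate Strings" and the "load them by position" ideas from the question.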