How do I parse HTML to get 3 URLs as separate Strings?

Posted on 2024-12-05 18:26:08

I am trying to parse each URL from this HTML:

<div class="latest-media-images">
    <div class="hdr-article">LATEST IMAGES</div>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg1" src="http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg2" src="http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg3" src="http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
</div>

I want to parse each URL into a separate String using jsoup.

I've been doing pretty well with jsoup parsing so far, but here I don't know where to begin to get each URL into its own String.

How do I go about doing this? Do I parse the HTML and then split the results into separate Strings?

EDIT:

Or, if I can't get them into separate Strings, maybe I could put them in a list and load them by position somehow?

Or could I load them one by one?

Just some suggestions I'm thinking of...

EDIT: From the comment below, I see that this is what I need to extract the links as a list:

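import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Example program to list links from a URL.
 */
public class ListLinks {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");      // not used in this excerpt
        Elements media = doc.select("[src]");        // every element with a src attribute
        Elements imports = doc.select("link[href]"); // not used in this excerpt

        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }
    }

    // Minimal helper assumed by the excerpt: formatted println.
    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    // Minimal helper assumed by the excerpt: shorten s to at most width characters.
    private static String trim(String s, int width) {
        return s.length() > width ? s.substring(0, width - 1) + "." : s;
    }
}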

I don't think this is exactly suited to my use, but it's a step in the right direction.

What do I need to do to have it extract the list of src URLs from my example HTML above?

神也荒唐 2024-12-12 18:26:08

Do you just want all images? Then try this XPath expression:

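// Assumes doc is an org.w3c.dom.Document; javax.xml.xpath does not operate on a jsoup Document directly.
XPath xpath = XPathFactory.newInstance().newXPath();
// Select every <img> element in the document.
NodeList nodes = (NodeList) xpath.evaluate("//img", doc, XPathConstants.NODESET);

// Collect each image's src attribute into a list.
List<String> imageUrls = new ArrayList<String>();
for (int i = 0; i < nodes.getLength(); i++) {
    Node img = nodes.item(i);
    imageUrls.add(img.getAttributes().getNamedItem("src").getNodeValue());
}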
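
For anyone who wants to try the XPath idea end to end, here is a minimal, self-contained sketch using only the JDK. It rests on a few assumptions that are not in the posts above: the class name ImageUrlExtractor and the method extractImageUrls are made up for illustration, the markup is a trimmed copy of the question's HTML with the img tags self-closed (the JDK's XML parser would reject the original, unclosed tags), and the XPath is narrowed to the latest-media-images div. It also shows the two things the question asks about: the URLs end up in a list, and each one can be pulled out by position into its own String.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for this sketch.
final class ImageUrlExtractor {

    // Trimmed copy of the question's markup. Assumption: the <img> tags are
    // self-closed here so that the markup is well-formed XML.
    static final String SAMPLE =
          "<div class=\"latest-media-images\">"
        + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\">"
        + "<img id=\"thumbImg1\" src=\"http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg\"/></a>"
        + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\">"
        + "<img id=\"thumbImg2\" src=\"http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg\"/></a>"
        + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\">"
        + "<img id=\"thumbImg3\" src=\"http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg\"/></a>"
        + "</div>";

    // Parses well-formed markup and returns the src of every <img> inside the
    // latest-media-images div, in document order.
    static List<String> extractImageUrls(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//div[@class='latest-media-images']//img",
                    doc, XPathConstants.NODESET);

            List<String> imageUrls = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element img = (Element) nodes.item(i);
                imageUrls.add(img.getAttribute("src"));
            }
            return imageUrls;
        } catch (Exception e) {
            // Wrap parser/XPath failures so callers need no checked exceptions.
            throw new IllegalStateException("Could not extract image URLs", e);
        }
    }

    public static void main(String[] args) {
        List<String> imageUrls = extractImageUrls(SAMPLE);

        // Each URL as its own String, loaded by position in the list.
        String first = imageUrls.get(0);
        String second = imageUrls.get(1);
        String third = imageUrls.get(2);

        System.out.println(first);
        System.out.println(second);
        System.out.println(third);
    }
}
```

Keeping the URLs in a List and reading them with get(i) avoids hard-coding three variables, so the same code still works if the page later shows more or fewer thumbnails. Real-world HTML is often not well-formed XML, so this sketch only applies once the markup has been cleaned up or converted to a W3C DOM by an HTML parser.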
