How do I parse and return a list of links as separate Strings or a String[]?

Posted on 2024-12-05 09:34:44

I have an HTML div formatted as follows:

<div class="latest-media-images">
    <div class="hdr-article">LATEST IMAGES</div>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg1" src="http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg2" src="http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
    <a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg3" src="http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
</div>

I've been trying to think of different ways to do this.

I want to parse each URL into its own separate string.

I was thinking of somehow parsing them into a list and then selecting each one by passing a position.

(If anyone wants to answer this, please feel free to.)

Or I could do something such as navigating to the div class:

Element latest_images = doc.select("div.latest-media-images");
Elements links = latest_images.getElementsByTag("img");

for (Element link : links) {
    String linkHref = link.attr("href");
    String linkText = link.text();
}

I was thinking of this, but I haven't tried it out yet. I will when I get the chance.

But how would I parse each one into a separate string, or all of them into a list, using this code (if it is correct)?

Feel free to leave suggestions or answers =) or let me know if the code I have above will do the trick.

Thanks,
coder-For-Life22


﹉夏雨初晴づ 2024-12-12 09:34:44

Here is a code sample that extracts all img URLs from your HTML using a regular expression:

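import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    // I used your HTML with some obfuscations (extra whitespace and line
    // breaks around src and =) to test some fringe cases.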
    final String HTML
            = "<div class=\"latest-media-images\">\n"
            + "<div class=\"hdr-article\">LATEST IMAGES</div>\n"
            + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\"><img id=\"thumbImg1\" \n "
            + "src=\"http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg\" class=\"latestMediaThumb\" alt=\"\" height=\"109\" width=\"145\"></a>\n"
            + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\"><img id=\"thumbImg2\" src=  \n"
            + "\"http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg\" class=\"latestMediaThumb\" alt=\"\" height=\"109\" width=\"145\"></a>\n"
            + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\"><img id=\"thumbImg3\" src "
            + "=    \t \n  "
            + "\"http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg\" class=\"latestMediaThumb\" alt=\"\" height=\"109\" width=\"145\"></a>\n"
            + "</div>";

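    // Capture the quoted value of the src attribute of every <img> tag,
    // allowing whitespace (including line breaks) around the '=' sign.
    Pattern pattern = Pattern.compile ("<img[^>]*?src\\s*=\\s*\"([^\"]*)\"");
    Matcher matcher = pattern.matcher (HTML);

    // Collect each captured URL into a list.
    List<String> imgUrls = new ArrayList<String> ();
    while (matcher.find ())
    {
        imgUrls.add (matcher.group (1));
    }

    for (String imgUrl : imgUrls) System.out.println (imgUrl);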

The output is the same as Sahil Muthoo posted:

http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg
http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg
http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg

If by "using a link to get the HTML first" you mean that you have a URL, then the only change is that instead of using a hard-coded String you will need to load the HTML first. For example, you can use Java's built-in URL class:

new URL ("http://some_address").openConnection ().getInputStream ();
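That call only returns an InputStream, so the HTML still has to be read into a String before the regex can run over it. A minimal sketch of one way to do that follows. The class name HtmlLoader and the methods readAll and fetch are hypothetical names chosen for illustration, and UTF-8 is an assumption (a real page may declare a different charset):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HtmlLoader {
    // Hypothetical helper: reads the whole stream as text and closes it.
    // UTF-8 is assumed here; adjust if the page uses another encoding.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, count);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: opens the address and returns the response body.
    static String fetch(String address) throws IOException {
        return readAll(new URL(address).openConnection().getInputStream());
    }
}
```

The String returned by fetch could then be passed to pattern.matcher in place of the hard-coded HTML constant.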


南街九尾狐 2024-12-12 09:34:44

Elements thumbs = doc.select("div.latest-media-images img.latestMediaThumb");
List<String> thumbLinks = new ArrayList<String>(); 
for(Element thumb : thumbs) {
    thumbLinks.add(thumb.attr("src"));
}
for(String thumb : thumbLinks) {
    System.out.println(thumb);
}

Output

http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg
http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg
http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg
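The question also asks about picking a URL by position and about getting a String[]. Once the URLs are in a List<String>, both need only the standard collections API. A rough sketch, where the class name ThumbLinkAccess and the helpers urlAt and toArray are hypothetical, and the list in main is a stand-in for thumbLinks:

```java
import java.util.Arrays;
import java.util.List;

class ThumbLinkAccess {
    // Hypothetical helper: returns the URL at the given position,
    // or null when the position is out of range.
    static String urlAt(List<String> urls, int position) {
        if (position < 0 || position >= urls.size()) {
            return null;
        }
        return urls.get(position);
    }

    // Hypothetical helper: copies the list into a new String[].
    static String[] toArray(List<String> urls) {
        return urls.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the thumbLinks list built from the parsed document.
        List<String> thumbLinks = Arrays.asList(
            "http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg",
            "http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg",
            "http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg"
        );

        // Select a single URL by passing its position.
        System.out.println(urlAt(thumbLinks, 0));

        // Or work with all of them as a String[].
        String[] asArray = toArray(thumbLinks);
        System.out.println(asArray.length);
    }
}
```

Returning null for an out-of-range position is just one possible choice; letting List.get throw IndexOutOfBoundsException would work as well.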
