How to parse and return a list of links as a String[] or as separate strings?
I have an HTML div formatted as follows:
<div class="latest-media-images">
<div class="hdr-article">LATEST IMAGES</div>
<a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg1" src="http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
<a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg2" src="http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
<a class="lnk-thumb" href="http://media.pc.ign.com/media/093/093395/imgs_1.html"><img id="thumbImg3" src="http://media.ignimgs.com/media/thumb/351/3513802/the-elder-scrolls-v-skyrim-20110824023147685_thumb_ign.jpg" class="latestMediaThumb" alt="" height="109" width="145"></a>
</div>
I've been trying to think of different ways to do this.
I want to parse each URL into a separate string.
I was thinking of somehow parsing them into a list and then selecting each one by passing a position.
(If anyone wants to answer this, please feel free to.)
Or I could do something such as navigating to the div class:
Element latest_images = doc.select("div.latest-media-images");
Elements links = latest_images.getElementsByTag("img");
for (Element link : links) {
    String linkHref = link.attr("href");
    String linkText = link.text();
}
I was thinking of this but haven't tried it out yet. I will when I get the chance.
But how would I parse each URL into a separate string, or all of them into a list, using this code (if it's correct)?
Feel free to leave suggestions or answers =) or let me know if the code I have above will do the trick.
Thanks,
coder-For-Life22
Comments (3)
Here is a code sample that extracts all img URLs from your HTML using a regex:
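A minimal sketch of such a regex-based extraction, not necessarily the code originally posted with this answer. It assumes the HTML is already available as a String and that attribute values are quoted; the names ImgSrcRegex and extractImgSrcs are made up for illustration, and the markup in main is a shortened copy of the HTML from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcRegex {
    // Matches an <img ...> tag and captures the quoted value of its src attribute.
    // Group 1 is the quote character, group 2 is the URL.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the src of every img tag, in document order.
    static List<String> extractImgSrcs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html =
                "<div class=\"latest-media-images\">"
                + "<div class=\"hdr-article\">LATEST IMAGES</div>"
                + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\">"
                + "<img id=\"thumbImg1\" src=\"http://media.ignimgs.com/media/thumb/351/3513804/the-elder-scrolls-v-skyrim-20110824023151748_thumb_ign.jpg\" class=\"latestMediaThumb\"></a>"
                + "<a class=\"lnk-thumb\" href=\"http://media.pc.ign.com/media/093/093395/imgs_1.html\">"
                + "<img id=\"thumbImg2\" src=\"http://media.ignimgs.com/media/thumb/351/3513803/the-elder-scrolls-v-skyrim-20110824023149685_thumb_ign.jpg\" class=\"latestMediaThumb\"></a>"
                + "</div>";

        List<String> urls = extractImgSrcs(html);

        // Each URL as its own String, selected by position in the list.
        for (int i = 0; i < urls.size(); i++) {
            System.out.println(i + ": " + urls.get(i));
        }

        // The same data as a String[] if an array is preferred.
        String[] asArray = urls.toArray(new String[0]);
        System.out.println(asArray.length + " image URLs found");
    }
}
```

Collecting into a List<String> covers both things the question asks for: urls.get(position) returns a single URL, and toArray gives a String[].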
The output is the same as the one Sahil Muthoo posted:
If by "using a link to get the html first" you mean that you have a URL, then the only change is that instead of using a hard-coded String you'll need to load the HTML first. For example, you can use Java's built-in URL class:
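A rough sketch of loading the HTML through java.net.URL before handing it to whichever extraction code you use. The class name HtmlLoader and its methods are hypothetical, and reading the page as UTF-8 is an assumption; a real page may declare a different charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HtmlLoader {
    // Reads everything from the reader into a single String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, read);
        }
        return sb.toString();
    }

    // Opens the given address and returns the response body as text.
    // Assumption: the content is UTF-8 encoded.
    static String load(String address) throws IOException {
        URL url = new URL(address);
        try (Reader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java HtmlLoader.java <url>");
            return;
        }
        String html = load(args[0]);
        System.out.println("Loaded " + html.length() + " characters");
    }
}
```

The resulting String can then be passed to the extraction code in place of the hard-coded HTML.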
Output
You can parse the HTML into a DOM tree and extract all "img" nodes using XPath or a CSS selector, then iterate through them to fill an array of links.
Your code doesn't quite do the trick, though.
The loop is written to work with "a" nodes (it reads the href attribute), while the code before it extracts img nodes.
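The DOM approach could look roughly like the sketch below. It assumes the jsoup library (which the question's doc.select call suggests) is on the classpath; the class LatestImages and its two methods are hypothetical names. The point is that the selector and the attribute have to agree: img nodes carry src, a nodes carry href.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class LatestImages {
    // Thumbnail URLs: select img nodes inside the div and read their src attribute.
    static List<String> imageUrls(String html) {
        Document doc = Jsoup.parse(html);
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("div.latest-media-images img")) {
            urls.add(img.attr("src"));
        }
        return urls;
    }

    // Page links: select a nodes inside the div and read their href attribute.
    static List<String> linkUrls(String html) {
        Document doc = Jsoup.parse(html);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("div.latest-media-images a.lnk-thumb")) {
            urls.add(link.attr("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<div class=\"latest-media-images\">"
                + "<a class=\"lnk-thumb\" href=\"page1.html\"><img src=\"thumb1.jpg\"></a>"
                + "<a class=\"lnk-thumb\" href=\"page2.html\"><img src=\"thumb2.jpg\"></a>"
                + "</div>";
        List<String> images = imageUrls(html);
        // Pick a single URL by position, or convert the whole list to a String[].
        String first = images.get(0);
        String[] all = images.toArray(new String[0]);
        System.out.println(first + " (" + all.length + " images in total)");
        System.out.println(linkUrls(html));
    }
}
```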
There's also another way: you can extract the required data using a regex, which should perform better and use less memory.