Jsoup error when parsing RSS?

Posted 2024-12-17 11:03:18, 386 characters, 3 views, 0 comments


I'm trying to grab a list of links to articles from this feed:

http://rss.cbc.ca/lineup/topstories.xml

However, when Jsoup reads it in, a tag such as <link>http://www.cbc.ca/news/?cmp=rss</link> becomes <link />http://www.cbc.ca/news/?cmp=rss

I.e., the tag self-closes, and if I do

Elements items = doc.select("link");

it doesn't grab any of the links.

执笔绘流年 2024-12-24 11:03:19

Jsoup is an HTML parser, and in HTML the link element is defined to have an empty content model. That is why the tag is closed immediately and the URL ends up as plain text after it. The URL you gave seems to contain valid XML, so why not try an actual XML parser, or a feed parser library like ROME?

Edit: To extract the links from the feed using the JDK's XPath implementation, you can use code like the following:

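import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

XPathFactory xpf = XPathFactory.newInstance();
XPath xp = xpf.newXPath();
// The InputSource is given a system ID (URL), so the feed is fetched when the expression is evaluated.
InputSource is = new InputSource("http://rss.cbc.ca/lineup/topstories.xml");
// "//link" matches every <link> element in the document; "//item/link" would match only those inside <item>.
NodeList nodes = (NodeList)xp.evaluate("//link", is, XPathConstants.NODESET);
for (int i=0, len=nodes.getLength(); i<len; i++) {
    Node node = nodes.item(i);
    String link = node.getTextContent();
    System.out.println(link);
}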
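
For trying this out without a network connection, here is a minimal self-contained sketch of the same XPath approach. It assumes the feed has already been read into a String. The class name RssLinks, the method extractItemLinks, and the example.com sample feed are made up for illustration and are not from the original answer. It uses "//item/link" rather than "//link" so that only the links inside <item> elements are returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
final class RssLinks {

    // Returns the trimmed text of every <link> that is a direct child of an <item>.
    static List<String> extractItemLinks(String xml) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            NodeList nodes =
                (NodeList) xp.evaluate("//item/link", source, XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0, len = nodes.getLength(); i < len; i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (XPathExpressionException e) {
            // Malformed XML surfaces here; wrap it in an unchecked exception.
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed; the example.com URLs are placeholders, not CBC data.
        String xml =
            "<rss version=\"2.0\"><channel>"
            + "<title>Sample</title>"
            + "<link>http://example.com/</link>"
            + "<item><title>A</title><link>http://example.com/a</link></item>"
            + "<item><title>B</title><link>http://example.com/b</link></item>"
            + "</channel></rss>";
        for (String link : extractItemLinks(xml)) {
            System.out.println(link);
        }
    }
}
```

Run against the sample feed, this prints the two item links and skips the channel-level <link>, which is usually what you want when collecting article URLs.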
