"Content is not allowed in prolog" when parsing an RSS feed with Rome

Posted 2024-12-20 16:04:31

When I parse an RSS feed with the Rome API, I get this error:

com.sun.syndication.io.ParsingFeedException: Invalid XML
    at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:210)

The code is below:

public static void main(String[] args) {
    URL url;
    XmlReader reader = null;
    SyndFeed feed; 

    try {
        url = new URL("https://www.democracynow.org/podcast.xml");
        reader = new XmlReader(url);
        feed = new SyndFeedInput().build(reader);
        for (Iterator<SyndEntry> i =feed.getEntries().iterator(); i.hasNext();) {
            SyndEntry entry = i.next();
            System.out.println(entry.getPublishedDate()+" Title  "+entry.getTitle());

        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

I looked at a few related threads, such as:

http://old.nabble.com/Invalid-XML:-Error-on-line-1:-Content-is-not-allowed-in-prolog.-td21258868.html

The problem there is presumably one of character sets, but I could not work out how to apply the fix.
Any help or guidance would be greatly appreciated.

Thanks and Regards,

Vaibhav Goswami

鹤舞 2024-12-27 16:04:31

I am using the Rome syndication classes as well, and I can get the published date and title.

My code is as follows:

URL feedUrl = new URL("http://www.bloomberg.com/tvradio/podcast/cat_markets.xml");

SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

for (Iterator i = feed.getEntries().iterator(); i.hasNext();)
{
    SyndEntry entry = (SyndEntry) i.next();
    // Print the title and the published date of each entry
    System.out.println("title | " + entry.getTitle() + "   - timeStamp " + entry.getPublishedDate() + "\n");
}

This works, and I used the Bloomberg URL only because it returns XML.

If your question was about something else, do let me know :)

静谧幽蓝 2024-12-27 16:04:31

You can use SyndFeed and SyndEntry to parse the XML.

You also need to check whether the XML is valid.

URL url = new URL("http://feeds.feedburner.com/javatipsfeed");
XmlReader reader = null;
try {
    reader = new XmlReader(url);
    SyndFeed feeder = new SyndFeedInput().build(reader);
    // Print the feed title, then the title of every entry
    System.out.println("Feed Title: " + feeder.getTitle());
    for (Iterator i = feeder.getEntries().iterator(); i.hasNext();) {
        SyndEntry syndEntry = (SyndEntry) i.next();
        System.out.println(syndEntry.getTitle());
    }
} finally {
    // Always release the underlying stream
    if (reader != null)
        reader.close();
}
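
One way to check the XML before handing it to Rome is to run the downloaded text through the JDK's own parser. The sketch below is only an illustration: `XmlCheck` and `isWellFormed` are hypothetical names, not part of Rome, and the check covers well-formedness only, not conformance to any RSS schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Rome: reports whether a string is well-formed XML.
final class XmlCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any parse failure (for example, text before the root element) ends up here
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<rss version=\"2.0\"><channel/></rss>"));
        // Stray characters before the first tag make the document ill-formed
        System.out.println(isWellFormed("junk<rss version=\"2.0\"/>"));
    }
}
```

If the check fails on a feed that looks fine in a browser, stray characters in front of the first `<` are a likely cause.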

┊风居住的梦幻卍 2024-12-27 16:04:31

This is caused by a byte order mark (BOM) at the start of the feed. Here is a JUnit test case that demonstrates the problem and the fix:

package rss;

import org.xml.sax.InputSource;

import java.io.*;
import java.net.*;

import com.sun.syndication.io.*;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.BOMInputStream;
import org.junit.Test;

public class RssEncodingTest {

    String url = "http://www.moneydj.com/KMDJ/RssCenter.aspx?svc=NH&fno=1&arg=X0000000";

    // This works because the InputSource reads directly from the URLConnection's InputStream

    @Test
    public void test01() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        try (InputStream is = new URL(url).openConnection().getInputStream()) {
            InputSource source = new InputSource(is);
            System.out.println("description: "
                    + new SyndFeedInput().build(source).getDescription());
        }
    }

    // But a String input fails because of the byte order mark

    @Test
    public void test02() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        String html = IOUtils.toString(new URL(url).openConnection()
                .getInputStream());
        Reader reader = new StringReader(html);
        System.out.println("description: "
                + new SyndFeedInput().build(reader).getDescription());
    }

    // We can use Apache Commons IO to fix the byte order mark

    @Test
    public void test03() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        String html = IOUtils.toString(new URL(url).openConnection()
                .getInputStream());
        try (BOMInputStream bomIn = new BOMInputStream(
                IOUtils.toInputStream(html))) {
            String f = IOUtils.toString(bomIn);
            Reader reader = new StringReader(f);
            System.out.println("description: "
                    + new SyndFeedInput().build(reader).getDescription());
        }
    }

}
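
If adding Apache Commons IO is not an option, the same idea can be sketched with the JDK alone. This assumes the feed has already been decoded as UTF-8 into a String, in which case the BOM survives as a leading U+FEFF character. `BomStripper` and `stripBom` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: removes a leading byte order mark from already-decoded text.
final class BomStripper {

    private static final char BOM = '\uFEFF';

    static String stripBom(String text) {
        // Only a BOM in the very first position is removed; the rest is left untouched
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of the BOM, followed by "<rss/>"
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '/', '>'};
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length());            // 7: the BOM is still there
        System.out.println(stripBom(decoded).length());  // 6: the BOM has been removed
    }
}
```

The cleaned string can then be wrapped in a StringReader and passed to SyndFeedInput.build(reader), as in test02.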
