Android pull parsing of an RSS feed: having problems
I am working on a very simple RSS reader for Android as a learning experience. I decided to use XmlPullParser for parsing the feeds, as it is quite simple and has an acceptable level of efficiency (for my needs). I am getting an error while trying to parse my test feed (rss.slashdot.org/slashdot/slashdot) that I can't seem to resolve despite scouring the web for answers. The error (from Eclipse) is:
START_TAG <image>@2:1252 in java.io.InputStreamReader@43e7a488
START_TAG (empty) <{http://www.w3.org/2005/Atom}atom10:link rel='self' type='application/rss+xml' href='http://rss.slashdot.org/Slashdot/slashdot'>@2:1517 in java.io.InputStreamReader@43e7a488
DEBUG/JRSS(313): java.net.MalformedURLException: Protocol not found:
The relevant part of the feed is:
<image>
...
</image>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://rss.slashdot.org/Slashdot/slashdot" />
<feedburner:info uri="slashdot/slashdot" />
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" />
...
So the error appears to occur at the feedburner tag.
Finally, my code is:
public class XmlHelper
{
    private XmlPullParserFactory factory;
    private XmlPullParser xpp;
    private final int START_TAG = XmlPullParser.START_TAG;
    // Debugging Tag
    private final String TAG = "JRSS";
    // for channels and items
    private final String TITLE = "title";
    private final String LINK = "link";
    private final String DESCRIPTION = "description";
    private final String PUBDATE = "pubDate";
    // element keys for channel
    private final String LANGUAGE = "language";
    private final String IMAGE = "image";
    private final String ITEM = "item";
    // for items
    private final String AUTHOR = "author";
    // for images
    private final String URL = "url";
    private final String WIDTH = "width";
    private final String HEIGHT = "height";

    public XmlHelper(Context context)
    {
        try
        {
            factory = XmlPullParserFactory.newInstance();
        }
        catch (XmlPullParserException e)
        {
            Log.d(TAG, e.toString());
        }
        factory.setNamespaceAware(true);
    }

    public Channel addFeed(URL url) throws XmlPullParserException, IOException
    {
        Channel c = new Channel();
        c.items = new ArrayList<Item>();
        xpp = factory.newPullParser();
        xpp.setInput(url.openStream(), null);
        // move past rss element
        xpp.nextTag();
        // move past channel element
        xpp.nextTag();
        while(xpp.nextTag() == START_TAG)
        {
            Log.d(TAG, xpp.getPositionDescription());
            if(xpp.getName().equals(TITLE))
                c.title = xpp.nextText();
            else if(xpp.getName().equals(LINK))
                c.url = new URL(xpp.nextText());
            else if(xpp.getName().equals(DESCRIPTION))
                c.description = xpp.nextText();
            else if(xpp.getName().equals(LANGUAGE))
                c.language = xpp.nextText();
            else if(xpp.getName().equals(ITEM))
            {
                Item i = parseItem(xpp);
                c.items.add(i);
            }
            else if(xpp.getName().equals(IMAGE))
            {
                parseImage(xpp);
            }
            else
                xpp.nextText();
        }
        return c;
    }

    public Item parseItem(XmlPullParser xpp) throws MalformedURLException, XmlPullParserException, IOException
    {
        Item i = new Item();
        while(xpp.nextTag() == START_TAG)
        {
            // do nothing for now
            xpp.nextText();
        }
        return i;
    }

    private void parseImage(XmlPullParser xpp) throws XmlPullParserException, IOException
    {
        // do nothing for now
        while(xpp.nextTag() == START_TAG)
        {
            xpp.nextText();
        }
    }
}
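Judging from the log, the exception is more likely thrown at the atom10:link element than at feedburner:info: the atom10:link START_TAG (@2:1517) is the last position logged before the MalformedURLException. Because the factory is namespace-aware, getName() returns only the local name, so atom10:link passes the getName().equals(LINK) test in addFeed(); it is an empty element, so nextText() returns an empty string, and new URL("") fails (Android reports "Protocol not found:", the desktop JDK "no protocol:"). The sketch below reproduces that on a desktop JVM. It uses the JDK's StAX XMLStreamReader as a stand-in for XmlPullParser, only because StAX ships with desktop Java (it is not part of Android); getLocalName(), getNamespaceURI() and getElementText() play the roles of getName(), getNamespace() and nextText(). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX stands in for Android's XmlPullParser.
final class LinkCollisionDemo {
    /**
     * Lists every element whose local name is "link" as "namespace|text",
     * i.e. everything a getName().equals("link") test would accept.
     */
    static List<String> linkElements(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Namespace-aware, like factory.setNamespaceAware(true) in the question.
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("link")) {
                    String ns = r.getNamespaceURI();   // null or "" when the element has no namespace
                    String text = r.getElementText();  // "" for an empty element like <atom10:link/>
                    found.add((ns == null ? "" : ns) + "|" + text);
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return found;
    }

    /** True if new URL(text) succeeds, which the question's LINK branch assumes. */
    static boolean parsesAsUrl(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Fed a trimmed, hypothetical channel containing both a plain <link> and an atom10:link, linkElements() reports two matches, and the second has an empty text that parsesAsUrl() rejects.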
I don't really know if there is a way to just ignore this (because at this point I don't care about the feedburner tag), if there is some feature of the parser that I can set to make this work, or if I'm going about this the wrong way. Any help / advice / guidance would be appreciated.
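One way to ignore elements you don't need, sketched under the same assumptions (desktop StAX standing in for XmlPullParser, and a hypothetical ChannelInfo class in place of the question's Channel): treat only un-namespaced children of <channel> as RSS 2.0 elements, since RSS 2.0 requires extension elements to live in a namespace, and skip everything else with a depth-counting skip(). The same skip() also handles nested elements such as <image>, which a bare nextText() cannot, because nextText() throws when the element contains child elements. On Android the namespace test would be "".equals(xpp.getNamespace()), and skip() would loop over xpp.next() the same way.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical stand-in for the question's Channel class. */
final class ChannelInfo {
    String title;
    URL url;
    String description;
    String language;
    int itemCount;
}

final class ChannelParser {
    /** Parses the <channel> of an RSS 2.0 document, ignoring extension elements. */
    static ChannelInfo parse(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                r.nextTag();
                r.require(XMLStreamConstants.START_ELEMENT, null, "rss");
                r.nextTag();
                r.require(XMLStreamConstants.START_ELEMENT, null, "channel");
                ChannelInfo c = new ChannelInfo();
                // Visit the direct children of <channel> until its END_ELEMENT.
                while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                    if (!isPlainRss(r)) {
                        skip(r); // atom10:link, feedburner:info and other extensions
                        continue;
                    }
                    switch (r.getLocalName()) {
                        case "title": c.title = r.getElementText(); break;
                        case "link": c.url = toUrlOrNull(r.getElementText()); break;
                        case "description": c.description = r.getElementText(); break;
                        case "language": c.language = r.getElementText(); break;
                        case "item": c.itemCount++; skip(r); break; // item parsing left out
                        default: skip(r); // <image>, <pubDate>, ... not needed yet
                    }
                }
                return c;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not a well-formed RSS 2.0 document", e);
        }
    }

    /** RSS 2.0's own elements carry no namespace; extension elements do. */
    static boolean isPlainRss(XMLStreamReader r) {
        String ns = r.getNamespaceURI();
        return ns == null || ns.isEmpty();
    }

    /** Skips the current element and its descendants; leaves the reader on its END_ELEMENT. */
    static void skip(XMLStreamReader r) throws XMLStreamException {
        if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("skip() must start on a START_ELEMENT");
        }
        int depth = 1;
        while (depth != 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    /** Returns null instead of throwing for blank or malformed link text. */
    static URL toUrlOrNull(String text) {
        String s = text.trim();
        if (s.isEmpty()) return null;
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

Filtering by namespace URI rather than by prefix means a feed that binds the Atom namespace to some other prefix (or to none) is still handled.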
Answer:
Pull parsing is more efficient than SAX, but in my opinion it still leaves a lot to do before your RSS reader can parse any feed out there.
You need to cater to all the formats (RSS 1.0, RSS 2.0, Atom, etc.), and even then you will have to contend with poorly formatted feeds.
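Catering to several formats starts with recognizing which one a document is. A minimal sketch, again using desktop StAX as a stand-in for XmlPullParser: look at the first element and its namespace. The mapping covers the common cases (an un-namespaced <rss> root for RSS 0.9x/2.0, rdf:RDF in the RDF namespace for RSS 1.0, <feed> in the Atom 1.0 namespace for Atom); FeedFormat and FeedSniffer are hypothetical names, and older Atom 0.3 feeds, which use a different namespace, fall into UNKNOWN.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical names; RSS_2_0 also covers RSS 0.91/0.92, which share the <rss> root.
enum FeedFormat { RSS_2_0, RSS_1_0, ATOM, UNKNOWN }

final class FeedSniffer {
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    static FeedFormat detect(String xml) {
        try {
            XMLInputFactory f = XMLInputFactory.newInstance();
            f.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            f.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE); // don't fetch external DTDs
            XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
            try {
                // Step over the prolog (comments, processing instructions) to the root element.
                int event = r.getEventType();
                while (event != XMLStreamConstants.START_ELEMENT && r.hasNext()) {
                    event = r.next();
                }
                if (event != XMLStreamConstants.START_ELEMENT) return FeedFormat.UNKNOWN;
                String ns = r.getNamespaceURI() == null ? "" : r.getNamespaceURI();
                String name = r.getLocalName();
                if (ns.isEmpty() && name.equals("rss")) return FeedFormat.RSS_2_0;
                if (ns.equals(RDF_NS) && name.equals("RDF")) return FeedFormat.RSS_1_0;
                if (ns.equals(ATOM_NS) && name.equals("feed")) return FeedFormat.ATOM;
                return FeedFormat.UNKNOWN;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            return FeedFormat.UNKNOWN; // not well-formed enough to tell
        }
    }
}
```

Each detected format would then be handed to its own parsing routine, which is where most of the extra work mentioned above lies.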
I faced similar problems in the past, so I decided to do my feed parsing on a server and just fetch the parsed contents. This lets me run more complex libraries and parsers, which I can modify without pushing out app updates. You should look at server-side options so that you can keep your app lightweight and simple.
I have the following service running on AppEngine, which allows for much simpler XML / JSON parsing on your end. The response has a fixed and simple structure. You can use it to parse feeds:
http://evecal.appspot.com/feedParser
You can send both POST and GET requests with the following parameters.
feedLink: the URL of the RSS feed
response: JSON or XML, as the response format
Examples:
For a POST request
curl --data-urlencode "feedLink=http://feeds.bbci.co.uk/news/world/rss.xml" --data-urlencode "response=json" http://evecal.appspot.com/feedParser
For a GET request
evecal.appspot.com/feedParser?feedLink=http://feeds.nytimes.com/nyt/rss/HomePage&response=xml
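To call the service from Java rather than curl, the two parameters only need to be form-encoded. Note that the GET example passes feedLink unencoded, which only works while the feed URL contains no & or # of its own. A minimal sketch: the class and method names are hypothetical, the response body is assumed to be UTF-8, and the code does not check whether the AppEngine endpoint is still online. On Android, post() must run off the main thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the feedParser service described in the answer.
final class FeedParserRequest {
    static final String ENDPOINT = "http://evecal.appspot.com/feedParser";

    /** Form-encodes the two documented parameters: feedLink and response ("json" or "xml"). */
    static String query(String feedLink, String response) {
        return "feedLink=" + encode(feedLink) + "&response=" + encode(response);
    }

    /** URL for a GET request. */
    static String getUrl(String feedLink, String response) {
        return ENDPOINT + "?" + query(feedLink, response);
    }

    /** Sends the same form as the curl POST example and returns the raw body (assumed UTF-8). */
    static String post(String feedLink, String response) throws IOException {
        byte[] body = query(feedLink, response).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                for (int n; (n = in.read(chunk)) != -1; ) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

The String-charset overload of URLEncoder.encode is used because it is available on every Android API level.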
My Android app "NewsSpeak" uses this too.