Problem parsing XML?
I am making a simple RSS reader application in C#. This is the first time I am working with XML, and I need some help parsing the different styles used in various RSS feeds.
For example, the following is the type of feed I am expecting, and I get correct results with it:
<item>
<title>Cometh the hour, cometh the man</title>
<link>http://www.espnstar.com/rss-feed/detail/item612327</link>
<description>Real Madrid have finally won their first trophy since 2008. Unsurprisingly, it has coincided with the arrival of one man.</description>
<pubDate>Thu, 21 Apr 2011 04:11:42 GMT</pubDate>
</item>
But some feeds look like this:
<item><title>NBA: San Antonio Spurs rally to level up with Memphis Grizzlies</title>
<link>http://timesofindia.feedsportal.com/c/33039/f/533921/s/1455e7bf/l/0Ltimesofindia0Bindiatimes0N0Csports0Cnba0Ctop0Estories0CNBA0ESan0EAntonio0ESpurs0Erally0Eto0Elevel0Eup0Ewith0EMemphis0EGrizzlies0Carticleshow0C80A443210Bcms/story01.htm</link>
<description>The San Antonio Spurs, trailed by three points at half-time, rallied to level their first round playoff series with the Memphis Grizzlies at 1-1 with a 93-87 victory.<img width='1' height='1' src='http://timesofindia.feedsportal.com/c/33039/f/533921/s/1455e7bf/mf.gif' border='0'/><div class='mf-viral'><table border='0'><tr><td valign='middle'><a href="http://res.feedsportal.com/viral/sendemail2.html?title=NBA%3A+San+Antonio+Spurs+rally+to+level+up+with+Memphis+Grizzlies&link=http%3A%2F%2Ftimesofindia.indiatimes.com%2Fsports%2Fnba%2Ftop-stories%2FNBA-San-Antonio-Spurs-rally-to-level-up-with-Memphis-Grizzlies%2Farticleshow%2F8044321.cms" target="_blank"><img src="http://res3.feedsportal.com/images/emailthis2.gif" border="0" /></a></td><td valign='middle'><a href="http://res.feedsportal.com/viral/bookmark.cfm?title=NBA%3A+San+Antonio+Spurs+rally+to+level+up+with+Memphis+Grizzlies&link=http%3A%2F%2Ftimesofindia.indiatimes.com%2Fsports%2Fnba%2Ftop-stories%2FNBA-San-Antonio-Spurs-rally-to-level-up-with-Memphis-Grizzlies%2Farticleshow%2F8044321.cms" target="_blank"><img src="http://res3.feedsportal.com/images/bookmark.gif" border="0" /></a></td></tr></table></div><br/><br/><a href="http://da.feedsportal.com/r/100752217265/u/242/f/533921/c/33039/s/1455e7bf/a2.htm"><img src="http://da.feedsportal.com/r/100752217265/u/242/f/533921/c/33039/s/1455e7bf/a2.img" border="0"/></a></description>
<pubDate>Thu, 21 Apr 2011 04:33:44 GMT</pubDate>
</item>
How do I extract the main description text from the description node, without the other stuff such as the hrefs?
Also, how do I handle CDATA in feeds like this one:
<item>
<title><![CDATA[Japan declares no-go zone around nuclear plant ]]></title>
<author><![CDATA[AP]]></author>
<category><![CDATA[International]]></category>
<link>http://www.thehindu.com/news/international/article1714401.ece</link>
<description><![CDATA[
Japan declared a 20 km evacuation zone around its tsunami-crippled nuclear power plant a no-go zone on Thursday, urging residents to abide by the order for the sake of their own safety. Chi...
]]>
</description>
<pubDate><![CDATA[Thu, 21 Apr 2011 08:15:48 +0530]]></pubDate>
</item>
<item>
Comments (2)
Have a look at this question:
How can I get started making a C# RSS Reader?
Well, you could use a regex to strip out everything you don't need.
For the second feed, the result will be: The San Antonio Spurs, trailed by three points at half-time, rallied to level their first round playoff series with the Memphis Grizzlies at 1-1 with a 93-87 victory. The first feed will come out unchanged, and the third feed will have all the CDATA markup ignored automatically.
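A minimal sketch of that regex approach in C#, not a definitive implementation. It assumes the feed has been loaded with LINQ to XML (`System.Xml.Linq`) and that the HTML inside `<description>` arrives entity-escaped or wrapped in CDATA, which is how feeds like these normally ship it. The `FeedText` class and `PlainDescription` method are hypothetical names made up for this example. A tag-stripping regex is crude, so a real HTML parser would be more robust on messy markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper for illustration; the names are not from the question.
public static class FeedText
{
    // Returns the description of an RSS <item> as plain text.
    public static string PlainDescription(XElement item)
    {
        // Casting an XElement to string yields its text content (or null if the
        // element is missing). CDATA sections and entity-escaped text both come
        // back as an ordinary string, so no CDATA-specific code is needed.
        string raw = (string)item.Element("description") ?? string.Empty;

        // Replace anything that looks like an HTML tag with a space.
        string noTags = Regex.Replace(raw, "<[^>]*>", " ");

        // Decode leftover HTML entities such as &amp; in the remaining text.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse runs of whitespace (including newlines) and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened stand-ins for the second and third feed styles.
        var escaped = XElement.Parse(
            "<item><description>Spurs rally.&lt;img src='x.gif'/&gt;&lt;br/&gt;</description></item>");
        var cdata = XElement.Parse(
            "<item><description><![CDATA[\nJapan declared a no-go zone.\n]]>\n</description></item>");

        Console.WriteLine(FeedText.PlainDescription(escaped)); // Spurs rally.
        Console.WriteLine(FeedText.PlainDescription(cdata));   // Japan declared a no-go zone.
    }
}
```

The same method can be called on every `item` element of a loaded feed, for example via `XDocument.Load(url).Descendants("item")`. A plain description like the one in the first feed passes through unchanged.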