I have been trying to parse this URL (http://app.calvaryccm.com/mobile/android/v1/devos) using the SAX parser found here: http://android-er.blogspot.com/2010/05/simple-rss-reader-iii-show-details-once.html. I am stuck on how to handle the description tag in the XML. I have tried it with and without a CDATA section, and nothing seems to help. It's almost as if the link is being read into the description.
The first part works just fine:
The problem happens when I try to access the inner page. It's almost as if the link tag is getting read before the description tag is.
I am having trouble getting the description tag to display correctly. Thank you for your help!
EDIT: The full source code for this application is here: http://dl.dropbox.com/u/19136502/CCM.zip
Ouch. After about 3 hours of digging through and analyzing your source code, I've found the reason you get the weird result described above.
First, look at the RSS content at the link you are parsing:
http://app.calvaryccm.com/mobile/android/v1/devos
Part of its content:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>CCM Daily Devotions</title>
<link>http://www.calvaryccm.com/resources/dailydevotions.aspx</link>
<description>Calvary Chapel Melbourne's Daily Devotionals</description>
<webMaster>[email protected] (Calvary Chapel Melbourne)</webMaster>
<copyright>(c)2011, Calvary Chapel Melbourne. All rights reserved</copyright>
<ttl>60</ttl>
<item>
<guid isPermaLink="false">b3e91cbf-bbe9-4667-bf4c-8ff831ba09f1</guid>
<title>Teachable Moments</title>
<description>Based on “Role Models, Part 4” by Pastor Mark Balmer; 10/8-9/11,
Message #6078; Daily Devotional #6 - “Teachable Moments” Preparing the Soil (Introduction): My husband and I took seriously our understanding of God’s instructions to teach His commandments to our children. (Deuteronomy 6:7) We went to our local Christian bookstore and bought children’s Bibles, studies, coloring books, games—anything that would help us to communicate biblical situations in their lives. Planting and Watering the Seed (Growth): Each parent needs to take seriously God’s commthe Crop (Action/Response): Life is God’s classroom for teachable moments. A long delay in traffic can be a frustrating irritation, or it can be an opportunity to teach our children that God’s than taught. Cultivating (Additional Reading): Psalm 78:1-8; Psalm 145:4
klw Calvary Chapel of Melbourne; 2955 Minton Road; W. Melbourne, FL 32904; 321-952-9673
NLT = New Living Translation. </description>
<link>http://www.calvaryccm.com/resources/dailydevotions.aspx</link>
<pubDate>Sun, 16 Oct 2011 12:00:00 GMT</pubDate>
</item>
Look closely at the /rss/channel/item/description tag. What you can see in it are things like &rsquo; or &lsquo; or &amp; or &ldquo; or &rdquo; ...
Those are escaped characters (right single quote, left single quote, ampersand, left double quote, right double quote... there are even newlines) sitting inside the XML content. So when the XML parser walks over these characters, it treats them as escape sequences rather than plain text, which leads to the weird result you are facing right now.
What about a solution? At first, I thought of fetching the content of the URL, unescaping those characters (adding SLASH characters), and then parsing the result again. That seems like it should work, but I think it might not, because the RSS text the server returns is in a really weird format (not well-formatted). So if you can contact the web administrator, ask them to format the RSS content properly (like adding SLASH to escape characters, removing all NEW-LINE characters...) before publishing the feed.
The other solution is to use a third-party library that handles escaping/unescaping, such as StringEscapeUtils from Apache Commons (http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html) or JTidy. But I don't think these libraries work best in your case.
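As a rough illustration of the first idea (get the feed text, deal with the entities, then parse), here is a minimal Java sketch. It is not taken from the asker's project: the class name DescriptionSketch, the ItemHandler class, and the small entity table are all hypothetical. It assumes the standard javax.xml.parsers SAX API and matches elements by qName; on Android you may need to check localName instead. It swaps the HTML-only entity names for numeric character references, which any XML parser accepts (&amp; is already defined by XML, so it is left alone). It also appends every characters() call to a StringBuilder, because SAX is allowed to deliver one text node in several chunks, and a handler that keeps only the last chunk can show truncated or mixed-up text.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DescriptionSketch {

    // Hypothetical table: only the HTML entity names mentioned above,
    // mapped to numeric character references that XML understands.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lsquo;", "&#8216;");
        ENTITIES.put("&rsquo;", "&#8217;");
        ENTITIES.put("&ldquo;", "&#8220;");
        ENTITIES.put("&rdquo;", "&#8221;");
    }

    // Rewrites the raw feed text before it is handed to the parser.
    static String replaceHtmlEntities(String xml) {
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            xml = xml.replace(e.getKey(), e.getValue());
        }
        return xml;
    }

    // Collects description and link of an <item>; with several items,
    // this sketch simply keeps the values of the last one.
    static class ItemHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inItem;
        String description;
        String link;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("item".equals(qName)) {
                inItem = true;
            }
            text.setLength(0); // start a fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // append, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!inItem) {
                return; // ignore the channel-level description and link
            }
            if ("description".equals(qName)) {
                description = text.toString().trim();
            } else if ("link".equals(qName)) {
                link = text.toString().trim();
            } else if ("item".equals(qName)) {
                inItem = false;
            }
        }
    }

    static ItemHandler parse(String rawXml) throws Exception {
        ItemHandler handler = new ItemHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(replaceHtmlEntities(rawXml))), handler);
        return handler;
    }
}
```

Calling DescriptionSketch.parse(feedText) on the downloaded feed text returns a handler whose description and link fields are filled separately, so the link should no longer bleed into the description. Whether this is enough for the real feed depends on how malformed the server response actually is.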
That's all I can tell you.
P.S.: One comment on your source code: I think you should make it clearer to read and easier to maintain, and re-package it appropriately.