Android XML parsing with a SAX parser

Posted on 2024-12-09 07:36:38

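I have been trying to parse this URL (http://app.calvaryccm.com/mobile/android/v1/devos) with the SAX parser found here: http://android-er.blogspot.com/2010/05/simple-rss-reader-iii-show-details-once.html. I am stuck on how to handle the description tag in the XML. I have tried with and without CDATA tags, but nothing seems to help. It is as if the link is being read into the description.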

The first part works fine:

[screenshot]

The problem occurs when I try to open the inner page. It is as if the link tag is read before the description tag.

[screenshot]

I am having trouble getting the description tag to display correctly. Thanks for your help!

Edit: the full source code for this app is at: http://dl.dropbox.com/u/19136502/CCM.zip


逆蝶 2024-12-16 07:36:38

After about three hours of digging through and analyzing your source code, I found the reason for the weird result shown above.

First, look at the RSS content at the URL you parse: http://app.calvaryccm.com/mobile/android/v1/devos

Part of its content:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>CCM Daily Devotions</title>
    <link>http://www.calvaryccm.com/resources/dailydevotions.aspx</link>
    <description>Calvary Chapel Melbourne's Daily Devotionals</description>
    <webMaster>[email protected] (Calvary Chapel Melbourne)</webMaster>
    <copyright>(c)2011, Calvary Chapel Melbourne. All rights reserved</copyright>
    <ttl>60</ttl>
    <item>
      <guid isPermaLink="false">b3e91cbf-bbe9-4667-bf4c-8ff831ba09f1</guid>
      <title>Teachable Moments</title>
      <description>Based on “Role Models, Part 4” by Pastor Mark Balmer; 10/8-9/11, Message #6078; Daily Devotional #6 - “Teachable Moments” Preparing the Soil (Introduction): My husband and I took seriously our understanding of God’s instructions to teach His commandments to our children. (Deuteronomy 6:7) We went to our local Christian bookstore and bought children’s Bibles, studies, coloring books, games—anything that would help us to communicate biblical situations in their lives. Planting and Watering the Seed (Growth): Each parent needs to take seriously God’s commthe Crop (Action/Response): Life is God’s classroom for teachable moments. A long delay in traffic can be a frustrating irritation, or it can be an opportunity to teach our children that God’s than taught. Cultivating (Additional Reading): Psalm 78:1-8; Psalm 145:4 klw Calvary Chapel of Melbourne; 2955 Minton Road; W. Melbourne, FL 32904; 321-952-9673 NLT = New Living Translation. </description>
      <link>http://www.calvaryccm.com/resources/dailydevotions.aspx</link>
      <pubDate>Sun, 16 Oct 2011 12:00:00 GMT</pubDate>
    </item>

Look closely at the tag /rss/channel/item/description. It contains sequences such as &rsquo;, &lsquo;, &amp;, &ldquo; and &rdquo;. These are escaped characters (right single quote, left single quote, ampersand, left double quote, right double quote, and even newlines) sitting inside the XML content.

So when the XML parser reaches these characters, it treats them as escape sequences, which leads to the weird result you are seeing.
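One related thing worth checking, offered as a guess rather than something confirmed from the project source: a SAX parser is allowed to deliver the text of a single element through several characters() calls, and it commonly splits the text at entity references such as &amp;. A handler that assigns the value on every call would then keep only a fragment. Below is a minimal sketch of a handler that appends instead. DescriptionHandler and parse are hypothetical names, not taken from the question's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the handler from the question's project.
class DescriptionHandler extends DefaultHandler {
    final List<String> descriptions = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inItem;

    // Some parsers leave localName empty, so fall back to qName.
    private static String name(String localName, String qName) {
        return localName.isEmpty() ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("item".equals(name(localName, qName))) {
            inItem = true;
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String element = name(localName, qName);
        if (inItem && "description".equals(element)) {
            descriptions.add(text.toString().trim());
        }
        if ("item".equals(element)) {
            inItem = false;
        }
        text.setLength(0);
    }

    // Parses an XML string and returns the text of every item-level description.
    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DescriptionHandler handler = new DescriptionHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.descriptions;
        } catch (Exception e) {
            // Wrapped only to keep the sketch short; real code should handle each case.
            throw new IllegalStateException("Could not parse feed", e);
        }
    }
}
```

The value is only stored in endElement, once all fragments have arrived, so it does not matter how the parser chooses to split the text.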

What about a solution? My first idea is to fetch the content of the URL, escape those characters (by adding slash characters), and then parse the result, which I think should succeed.
This seems like it would work, but I suspect it may not hold up, because the RSS response from the server is in a really odd format (not well-formed). So if you can contact the site administrator, ask them to clean up the RSS content (for example, add slashes to escape those characters and remove all newline characters) before publishing the feed.
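The pre-processing idea above could be sketched roughly as follows. Note the swap in technique: XML has no slash-based escaping, so this sketch replaces the named HTML entities with the literal characters they stand for instead of adding slashes. EntityPreprocessor and replaceNamedEntities are hypothetical names, and the table covers only the entities mentioned in this answer, which is an assumption about what the feed contains.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; extend the table as the feed requires.
class EntityPreprocessor {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("lsquo", "\u2018"); // left single quote
        NAMED.put("rsquo", "\u2019"); // right single quote
        NAMED.put("ldquo", "\u201C"); // left double quote
        NAMED.put("rdquo", "\u201D"); // right double quote
    }

    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    // Replaces known named entities; anything else (such as &amp;) is left untouched.
    static String replaceNamedEntities(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = NAMED.get(m.group(1));
            String result = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The predefined XML entities (&amp;, &lt;, &gt;, &quot;, &apos;) are deliberately not in the table, because the parser needs them intact. If the feed turns out to double-escape its entities, the same method could be applied to the description text after parsing instead of to the raw response.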

Another option is to use a third-party library that handles escaping and unescaping, such as StringEscapeUtils from Apache Commons Lang (http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html) or JTidy.
But I don't think these libraries are the best fit for your case.

That's all I can say.

P.S. A few comments on your source code: consider making it easier to read and maintain, and reorganize the packages appropriately.
