SAX XML parser having trouble with special characters
First of all, I'm new to the Java/Android development world, so bear with me; I might ask some relatively newbie-ish questions :).
Anyway, I've been fiddling with this problem almost all day now and I can't figure out a solution by myself, and I've searched the web thin for ideas to get around it.
I'm trying to develop an Android app which parses data from an external XML file.
My parser looks like this:
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

import android.util.Log;

public class NewSAXHandler implements ContentHandler
{
    private String DEBUGTAG = "NewSAXHandler";
    public static setNews news = null;
    boolean currentElement = false;
    String currentValue = null;

    public static setNews getNews()
    {
        return news;
    }

    public static void setNewsList(setNews news)
    {
        NewSAXHandler.news = news;
    }

    @Override
    public void startDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void endDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void startElement(String uri, String localName, String qname, Attributes attr) throws SAXException
    {
        currentElement = true;
        if (localName.equalsIgnoreCase("channel"))
            news = new setNews();
        Log.d(DEBUGTAG, localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        if (localName.equalsIgnoreCase("title"))
        {
            news.setHeadline(currentValue);
            Log.d(DEBUGTAG, localName);
            Log.d(DEBUGTAG, currentValue);
        }
        else if (localName.equalsIgnoreCase("pubdate"))
        {
            news.setDate(currentValue);
            Log.d(DEBUGTAG, localName);
            Log.d(DEBUGTAG, currentValue);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException
    {
        // Only the first chunk of text after each start tag is kept.
        if (currentElement)
        {
            currentValue = new String(ch, start, length).replaceAll("\\r\\n|\\r|\\n", " ");
            currentElement = false;
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
    {
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException
    {
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException
    {
    }

    @Override
    public void setDocumentLocator(Locator locator)
    {
    }

    @Override
    public void skippedEntity(String name) throws SAXException
    {
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException
    {
    }
}
And the XML file is parsed from:
http://www.hltv.org/news.rss.php
Here is the log when I run the app:
10-24 20:03:32.901: D/NewSAXHandler(975): rss
10-24 20:03:32.901: D/NewSAXHandler(975): channel
10-24 20:03:32.901: D/NewSAXHandler(975): title
10-24 20:03:32.901: D/NewSAXHandler(975): title
10-24 20:03:32.901: D/NewSAXHandler(975): www.HLTV.org News
10-24 20:03:32.901: D/NewSAXHandler(975): link
10-24 20:03:32.912: D/NewSAXHandler(975): description
10-24 20:03:32.912: D/NewSAXHandler(975): item
10-24 20:03:32.912: D/NewSAXHandler(975): title
10-24 20:03:32.912: D/NewSAXHandler(975): title
10-24 20:03:32.912: D/NewSAXHandler(975): http://www.hltv.org/HLTV.org News
10-24 20:03:32.912: D/NewSAXHandler(975): Photos: Final ones from ESWC
10-24 20:03:32.912: D/NewSAXHandler(975): link
10-24 20:03:32.912: D/NewSAXHandler(975): pubDate
10-24 20:03:32.922: D/NewSAXHandler(975): pubDate
10-24 20:03:32.922: D/NewSAXHandler(975): http://www.hltv.org/news/7692-photos-final-ones-from-eswcMon, 24 Oct 2011 21:17:00 +0200
10-24 20:03:32.922: D/NewSAXHandler(975): item
10-24 20:03:32.922: D/NewSAXHandler(975): title
10-24 20:03:32.932: W/System.err(975): org.apache.harmony.xml.ExpatParser$ParseException: At line 16, column 23: not well-formed (invalid token)
10-24 20:03:32.942: W/System.err(975): at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:520)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:479)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:318)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:275)
10-24 20:03:32.962: W/System.err(975): at jj.rssReader.hltvorg.Hltvorg.onCreate(Hltvorg.java:49)
10-24 20:03:32.962: W/System.err(975): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
10-24 20:03:32.962: W/System.err(975): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1611)
10-24 20:03:32.971: W/System.err(975): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663)
10-24 20:03:32.971: W/System.err(975): at android.app.ActivityThread.access$1500(ActivityThread.java:117)
10-24 20:03:32.981: W/System.err(975): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931)
10-24 20:03:32.981: W/System.err(975): at android.os.Handler.dispatchMessage(Handler.java:99)
10-24 20:03:32.981: W/System.err(975): at android.os.Looper.loop(Looper.java:123)
10-24 20:03:32.992: W/System.err(975): at android.app.ActivityThread.main(ActivityThread.java:3683)
10-24 20:03:32.992: W/System.err(975): at java.lang.reflect.Method.invokeNative(Native Method)
10-24 20:03:33.002: W/System.err(975): at java.lang.reflect.Method.invoke(Method.java:507)
10-24 20:03:33.002: W/System.err(975): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
10-24 20:03:33.002: W/System.err(975): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
10-24 20:03:33.013: W/System.err(975): at dalvik.system.NativeStart.main(Native Method)
It seems like the error is coming from the ´ character.
I cannot see the encoding since it's not in the XML file, but I guess it is UTF-8.
I've also tried using a StringBuilder to store each character, without any luck.
I thought the XML parser would convert those special characters by itself, but it seems like it doesn't like them.
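A possible explanation for the parser's complaint, sketched under the assumption that the feed is served in a single-byte encoding such as ISO-8859-1 without declaring it (the thread does not confirm the actual encoding): in ISO-8859-1 the ´ character is the single byte 0xB4, and that byte on its own is not valid UTF-8, so a parser that assumes UTF-8 would reject it. The class name, method name and sample text below are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class AcuteAccentBytes {

    /** Strictly decodes bytes as UTF-8; returns null if they are not valid UTF-8. */
    static String decodeUtf8Strict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // malformed input: the bytes cannot be UTF-8
        }
    }

    public static void main(String[] args) {
        // Hypothetical title text; 0xB4 is the acute accent in ISO-8859-1.
        byte[] latin1 = {'d', 'o', 'n', (byte) 0xB4, 't'};

        // Reading the bytes as UTF-8 fails, reading them as ISO-8859-1 works.
        System.out.println("valid UTF-8: " + (decodeUtf8Strict(latin1) != null));
        System.out.println("as ISO-8859-1: " + new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Running the same check on the first few hundred bytes of the real feed would show whether the UTF-8 guess holds.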
If I try to parse this file:
http://www.hltv.org/forum.rss.php
Then it works better.
Anyone got any new ideas?
If you need any more of my code, please say so :)
Best Regards,
Jesper
Comments (1)
The problem was the encoding, as Philipp said above.
I've just added the following to my code:
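The snippet that was added is not reproduced in this copy of the thread. As a rough sketch only, one common way to make the encoding explicit is to decode the byte stream yourself and hand the parser a Reader. This assumes the feed is ISO-8859-1, which the thread does not confirm; EncodedFeedParser, TitleHandler and parseTitles are made-up names, and the sample XML is invented. The handler also collects text in a StringBuilder, because characters() may be called more than once for a single element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodedFeedParser {

    /** Collects the text of every <title> element. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // append every chunk, not just the first
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // qName is used because this factory is not namespace-aware by default.
            if ("title".equalsIgnoreCase(qName)) {
                titles.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Parses the stream as XML, decoding its bytes with the given charset. */
    static List<String> parseTitles(InputStream in, String charsetName) throws Exception {
        // The parser receives characters, so it never has to guess the encoding.
        Reader reader = new InputStreamReader(in, charsetName);
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample document, encoded the way the feed is assumed to be.
        byte[] xml = "<rss><channel><item><title>don\u00B4t</title></item></channel></rss>"
                .getBytes("ISO-8859-1");
        System.out.println(parseTitles(new ByteArrayInputStream(xml), "ISO-8859-1"));
    }
}
```

In the app, the ByteArrayInputStream would be replaced by the stream opened from the feed URL. Calling setEncoding on an InputSource built from the raw byte stream is another option with the same intent.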