SAX XML parser has trouble with special characters

Posted 2024-12-11 10:13:18 · 6529 characters · 0 views · 0 comments


First of all, I'm new to the Java/Android development world, so bear with me; I might ask some relatively newbie-ish questions :).
Anyway, I've been struggling with this problem almost all day now. I cannot figure out a solution by myself, and I've searched the web thoroughly for ideas to work around it.

I'm trying to develop an Android app which parses data from an external XML file.

My parser looks like this:



    public class NewSAXHandler implements ContentHandler
    {
        private String DEBUGTAG = "NewSAXHandler";

        public static setNews news = null;
        boolean currentElement = false;
        String currentValue = null;



        public static setNews getNews()
        {
            return news;
        }

        public static void setNewsList(setNews news)
        {
            NewSAXHandler.news = news;
        }

        @Override
        public void startDocument() throws SAXException {
         // TODO Auto-generated method stub
        }

        @Override
        public void endDocument() throws SAXException {
         // TODO Auto-generated method stub
        }       

        @Override
        public void startElement(String uri, String localName, String qname, Attributes attr) throws SAXException
        {
            currentElement = true;
            if (localName.equalsIgnoreCase("channel"))
                news = new setNews();
            Log.d(DEBUGTAG, localName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException
        {
            if (localName.equalsIgnoreCase("title"))
            {
                news.setHeadline(currentValue);
                Log.d(DEBUGTAG, localName);
                Log.d(DEBUGTAG, currentValue);          
            }
            else if (localName.equalsIgnoreCase("pubdate"))
            {
                news.setDate(currentValue);
                Log.d(DEBUGTAG, localName);
                Log.d(DEBUGTAG, currentValue);          
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException
        {   
            if (currentElement)
            {
                currentValue = new String(ch, start, length).replaceAll("\\r\\n|\\r|\\n", " ");
                currentElement = false;
            }
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length)throws SAXException
        {

        }

        @Override
        public void endPrefixMapping(String prefix) throws SAXException
        {

        }

        @Override
        public void processingInstruction(String target, String data)throws SAXException
        {

        }

        @Override
        public void setDocumentLocator(Locator locator)
        {

        }

        @Override
        public void skippedEntity(String name) throws SAXException
        {

        }

        @Override
        public void startPrefixMapping(String prefix, String uri)throws SAXException
        {

        }   
    } 

And the XML file is parsed from:

http://www.hltv.org/news.rss.php

Here is the log when I run the app:



    10-24 20:03:32.901: D/NewSAXHandler(975): rss
    10-24 20:03:32.901: D/NewSAXHandler(975): channel
    10-24 20:03:32.901: D/NewSAXHandler(975): title
    10-24 20:03:32.901: D/NewSAXHandler(975): title
    10-24 20:03:32.901: D/NewSAXHandler(975): www.HLTV.org News
    10-24 20:03:32.901: D/NewSAXHandler(975): link
    10-24 20:03:32.912: D/NewSAXHandler(975): description
    10-24 20:03:32.912: D/NewSAXHandler(975): item
    10-24 20:03:32.912: D/NewSAXHandler(975): title
    10-24 20:03:32.912: D/NewSAXHandler(975): title
    10-24 20:03:32.912: D/NewSAXHandler(975): http://www.hltv.org/HLTV.org News
    10-24 20:03:32.912: D/NewSAXHandler(975): Photos: Final ones from ESWC
    10-24 20:03:32.912: D/NewSAXHandler(975): link
    10-24 20:03:32.912: D/NewSAXHandler(975): pubDate
    10-24 20:03:32.922: D/NewSAXHandler(975): pubDate
    10-24 20:03:32.922: D/NewSAXHandler(975): http://www.hltv.org/news/7692-photos-final-ones-from-eswcMon, 24 Oct 2011 21:17:00 +0200
    10-24 20:03:32.922: D/NewSAXHandler(975): item
    10-24 20:03:32.922: D/NewSAXHandler(975): title
    10-24 20:03:32.932: W/System.err(975): org.apache.harmony.xml.ExpatParser$ParseException: At line 16, column 23: not well-formed (invalid token)
    10-24 20:03:32.942: W/System.err(975):  at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:520)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:479)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:318)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:275)
    10-24 20:03:32.962: W/System.err(975):  at jj.rssReader.hltvorg.Hltvorg.onCreate(Hltvorg.java:49)
    10-24 20:03:32.962: W/System.err(975):  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
    10-24 20:03:32.962: W/System.err(975):  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1611)
    10-24 20:03:32.971: W/System.err(975):  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663)
    10-24 20:03:32.971: W/System.err(975):  at android.app.ActivityThread.access$1500(ActivityThread.java:117)
    10-24 20:03:32.981: W/System.err(975):  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931)
    10-24 20:03:32.981: W/System.err(975):  at android.os.Handler.dispatchMessage(Handler.java:99)
    10-24 20:03:32.981: W/System.err(975):  at android.os.Looper.loop(Looper.java:123)
    10-24 20:03:32.992: W/System.err(975):  at android.app.ActivityThread.main(ActivityThread.java:3683)
    10-24 20:03:32.992: W/System.err(975):  at java.lang.reflect.Method.invokeNative(Native Method)
    10-24 20:03:33.002: W/System.err(975):  at java.lang.reflect.Method.invoke(Method.java:507)
    10-24 20:03:33.002: W/System.err(975):  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
    10-24 20:03:33.002: W/System.err(975):  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
    10-24 20:03:33.013: W/System.err(975):  at dalvik.system.NativeStart.main(Native Method)

It seems like the error is coming from the ´ character.
I cannot see the encoding since it's not declared in the XML file, but I guess it is UTF-8.
I've also tried using a StringBuilder to store each character, without any luck.

I thought the XML parser would convert those special characters by itself, but it seems like it doesn't like them.

If I try to parse this file:

http://www.hltv.org/forum.rss.php

Then it works better.

Anyone got any new ideas?

If you need any more of my code, please say so :)

Best Regards,
Jesper

Comments (1)

残花月 2024-12-18 10:13:18

The problem was the encoding, as Philipp said above.

I've just added the following to my code:

    InputSource is = new InputSource(url.openStream());
    is.setEncoding("ISO-8859-1");
    Reader.parse(is);
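
For anyone who wants to try the fix outside the app, here is a minimal, self-contained sketch of the same idea. It is not the code from the question: the names `EncodingAwareParse`, `TitleHandler` and `parseTitles` are made up for illustration, and it runs against the standard JDK SAX parser rather than Android's Expat-based one, so treat it as an approximation of what happens on the device. It also appends every `characters()` chunk to a `StringBuilder`, because SAX is allowed to deliver one text node in several calls.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original question or answer.
class EncodingAwareParse {

    /** Collects the text content of every <title> element. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // qName is used because the default factory is not namespace-aware
            if ("title".equalsIgnoreCase(qName)) {
                titles.add(text.toString().trim());
            }
        }
    }

    /** Parses the stream, telling the parser which byte encoding to assume. */
    static List<String> parseTitles(InputStream in, String encoding) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TitleHandler handler = new TitleHandler();
        reader.setContentHandler(handler);

        InputSource source = new InputSource(in);
        source.setEncoding(encoding); // same call as in the answer above
        reader.parse(source);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Sample feed without an encoding declaration; the ´ is one byte in ISO-8859-1.
        byte[] feed = "<rss><channel><title>Team\u00b4s win</title></channel></rss>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseTitles(new ByteArrayInputStream(feed), "ISO-8859-1"));
    }
}
```

Hard-coding "ISO-8859-1" only works as long as the feed really is served in that encoding; if the server ever changes it, the value passed to `setEncoding` has to change too.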