Android SAX parser can't read a document loaded from the network (character decoding/encoding problem)?

Posted on 2024-10-26 04:00:29

I'm using Android to read a document off the net and, surprise, I'm writing here because I have an issue. For lots of sites I have no problems, but for some sites the XML parser in Android is "grumpy". I suspect it has something to do with the character encoding, but I'm not sure exactly what. In particular, if I download the file with wget and feed it to Android, it works fine.

Android's error message:
03-23 21:54:47.383: ERROR/xml(9062): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 62: syntax error

When I download the XML myself, it looks fine:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
    <channel>
    ...
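The XML declaration shown above is only 38 characters long, so a syntax error at line 1, column 62 hints that the first line the app receives is not the same as the file wget saves. A minimal diagnostic sketch, assuming plain java.net.URLConnection (the class FeedPeek and its peek helper are hypothetical names): print the Content-Type and the first bytes of the response instead of handing the stream straight to SAX. On Android you would send the output to Log.i rather than System.out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class FeedPeek {
    /** Reads up to maxBytes from the start of the stream and decodes them as UTF-8, for display only. */
    public static String peek(InputStream in, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        int n;
        // read() may return fewer bytes than requested, so keep reading until the buffer is full or EOF
        while (total < maxBytes && (n = in.read(buf, total, maxBytes - total)) != -1) {
            total += n;
        }
        return new String(buf, 0, total, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FeedPeek <feed-url>");
            return;
        }
        URLConnection conn = new URL(args[0]).openConnection();
        // Content-Type shows whether the server sent XML or something else, such as an HTML page
        System.out.println("Content-Type: " + conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            System.out.println(peek(in, 200));
        }
    }
}
```

If the printed text starts with anything other than the XML declaration, the problem lies in what the server sends to this client, not in how the parser decodes it.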

My sample Android application:

package com.example.android.helloactivity;

import java.net.URL;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.Toast;

public class HelloActivity extends Activity {

    class EnclosureHandler extends DefaultHandler {
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
        }

        @Override
        public void endElement(String uri, String localName, String name)
                throws SAXException {
        }

        @Override
        public void startElement(String namespaceURI, String localName,
                String qName, Attributes atts) throws SAXException {
            Log.i("xml", "lname is : " + qName);
        }
    }

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.hello_activity);

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            InputSource is = new InputSource(new URL(
                    "http://www.hbo.com/podcasts/billmaher/podcast.xml")
                    .openStream());
            sp.parse(is, new EnclosureHandler());
        } catch (Throwable t) {
            Log.e("xml", t.toString());
            Toast.makeText(getApplicationContext(), t.toString(),
                    Toast.LENGTH_LONG).show();

        }

    }
}

遮云壑 2024-11-02 04:00:29

Turns out that character encoding is not the issue. The HBO.com web site returns different content based on the User-Agent header. So if you use Android to talk to the hbo.com site, it returns a message about how you could use their own Android client to access the site; they are probably trying to help people using web browsers. Changing the User-Agent caused the program above to get the correct (and parseable) XML document.
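Applied to the question's code, this fix amounts to setting the User-Agent header explicitly before reading the stream. A minimal sketch in plain Java: it replaces URL.openStream() with URL.openConnection() plus setRequestProperty("User-Agent", ...) and then parses with SAX as before. The class name FeedFetcher and the browser-style User-Agent string are hypothetical; the answer does not say which value was used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FeedFetcher {
    // Hypothetical browser-style value; which User-Agent strings a given site accepts is an assumption
    public static final String BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)";

    /** openConnection() does not contact the server; the request goes out when getInputStream() is called. */
    public static URLConnection openWithUserAgent(String url, String userAgent) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        // Override the platform's default User-Agent so the server returns the feed itself
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java FeedFetcher <feed-url>");
            return;
        }
        URLConnection conn = openWithUserAgent(args[0], BROWSER_USER_AGENT);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try (InputStream in = conn.getInputStream()) {
            // Same idea as the question's EnclosureHandler: log each element name as it starts
            parser.parse(new InputSource(in), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    System.out.println("element: " + qName);
                }
            });
        }
    }
}
```

Run it as java FeedFetcher <feed-url>. Keeping the header logic in its own helper makes it easy to compare the response for the default User-Agent against a browser-style one.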
