Android SAX parser fails to read a document loaded from the network (character decoding/encoding problem?)
I'm using Android to read a document off the net. For lots of sites I have no issues, but for some sites the XML parser in Android is "grumpy". I suspect it has something to do with the character encoding, but I'm not sure exactly what. In particular, if I download the file with "wget" and feed it to Android, it parses fine.
Android's error message:
03-23 21:54:47.383: ERROR/xml(9062): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 62: syntax error
The XML looks fine when I download it:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
...
My sample Android application:
package com.example.android.helloactivity;

import java.net.URL;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.Toast;

public class HelloActivity extends Activity {

    class EnclosureHandler extends DefaultHandler {
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
        }

        @Override
        public void endElement(String uri, String localName, String name)
                throws SAXException {
        }

        @Override
        public void startElement(String namespaceURI, String localName,
                String qName, Attributes atts) throws SAXException {
            Log.i("xml", "lname is : " + qName);
        }
    }

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.hello_activity);
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            InputSource is = new InputSource(new URL(
                    "http://www.hbo.com/podcasts/billmaher/podcast.xml")
                    .openStream());
            sp.parse(is, new EnclosureHandler());
        } catch (Throwable t) {
            Log.e("xml", t.toString());
            Toast.makeText(getApplicationContext(), t.toString(),
                    Toast.LENGTH_LONG).show();
        }
    }
}
Comments (1)
It turns out that character encoding is not the issue. The HBO.com web site returns different content depending on the User-Agent header. So if you use Android to talk to the hbo.com site, it returns a message about how you could use their own Android client to access the site. They are probably trying to help people using web browsers. Changing the User-Agent caused the above program to get the correct (and parseable) XML document.
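For anyone who wants to try the same fix, here is a minimal sketch of setting the header before parsing. It is not the exact code I ended up with: the class name UserAgentFetch, the helper names, and the User-Agent string are all made up for illustration, and I am assuming that any browser-like value is enough for the site to serve the feed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original program.
final class UserAgentFetch {

    // Assumed browser-like value; the value that actually worked is not
    // given in the post, so treat this string as a placeholder.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (compatible; ExampleFeedReader/1.0)";

    private UserAgentFetch() {
    }

    // Creates the connection object and sets the User-Agent request header.
    // No network traffic happens here; the request is sent only when the
    // caller asks for the input stream.
    static URLConnection openWithUserAgent(URL url, String userAgent)
            throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Fetches the document with the given User-Agent and hands the
    // response body to the SAX handler.
    static void parse(URL url, String userAgent, DefaultHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        URLConnection conn = openWithUserAgent(url, userAgent);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try (InputStream in = conn.getInputStream()) {
            parser.parse(new InputSource(in), handler);
        }
    }
}
```

In the activity above, the InputSource built from new URL(...).openStream() would be replaced by a call such as UserAgentFetch.parse(new URL("http://www.hbo.com/podcasts/billmaher/podcast.xml"), UserAgentFetch.BROWSER_USER_AGENT, new EnclosureHandler()). When a parser reports a syntax error on line 1, logging the raw response body first is a quick way to check whether the server sent the expected XML at all.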