SAXParser fails to parse some characters
I am doing some simple SAX parsing with SAXParser on Android/Java.
It parses files properly, but it fails when it encounters certain special characters, for example when it parses the XML below:
<?xml version="1.0" encoding="ISO-8859-1" ?><MTRXML version="1.0">
<GEOCODE key="pohj">
<LOC name1="Pohjantori" number="" city="Espoo" code="995" address="" type="1" category="poi" x="2544225" y="6674893" lon="24.79378" lat="60.18324" />
<LOC name1="Pohjois-Haaga" number="" city="Helsinki" code="41" address="" type="1" category="poi" x="2549164" y="6680186" lon="24.88405" lat="60.23018" />
<LOC name1="Pohjois-Leppävaara" number="" city="Espoo" code="50" address="" type="1" category="poi" x="2545057" y="6679240" lon="24.80974" lat="60.22216" />
It fails when it encounters the ä in Pohjois-Leppävaara on the last line.
The error it gives is:
01-30 18:14:52.039: WARN/System.err(686): org.apache.harmony.xml.ExpatParser$ParseException: At line 5, column 24: not well-formed (invalid token)
I am sure SAXParser can handle those characters, but I believe I need to set an encoding somewhere?
The Java code is as follows:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = null;
try {
    parser = factory.newSAXParser();
} catch (ParserConfigurationException e) {
    e.printStackTrace();
    return null;
} catch (SAXException e) {
    e.printStackTrace();
    return null;
}
XmlHandler handler = new XmlHandler();
try {
    parser.parse(urls[0], handler);
} catch (SAXException e) {
    e.printStackTrace();
    return null;
} catch (IOException e) {
    e.printStackTrace();
    return null;
}
Comments (2)
I expect this is an error in the document encoding. Use a hex editor to verify that Leppävaara is the byte sequence 4c 65 70 70 e4 76 61 61 72 61. If ä is anything other than E4, then the document has been saved using some encoding other than ISO-8859-1.
This seems to solve this:
Android: SaxParser problems using ISO-8859-1 encoding
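For reference, one common way to tell the parser the encoding explicitly is to wrap the byte stream in an org.xml.sax.InputSource and call setEncoding before parsing. Whether this is exactly what the linked post does is an assumption on my part. The sketch below uses a made-up class Latin1SaxDemo and a shortened, well-formed version of the sample XML (closing tags added, most attributes dropped) so that it runs on a plain JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1SaxDemo {
    /** Parses the stream as ISO-8859-1 and collects the name1 attribute of each LOC element. */
    public static List<String> parseNames(InputStream in) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("LOC".equals(qName)) {
                    names.add(atts.getValue("name1"));
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(in);
        // State the encoding explicitly instead of relying on detection.
        source.setEncoding("ISO-8859-1");
        parser.parse(source, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; closing tags are added here (assumption) to make it well-formed.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><MTRXML version=\"1.0\">"
                + "<GEOCODE key=\"pohj\">"
                + "<LOC name1=\"Pohjois-Lepp\u00e4vaara\" city=\"Espoo\" />"
                + "</GEOCODE></MTRXML>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseNames(new ByteArrayInputStream(bytes)));
    }
}
```

In the code from the question, the stream would presumably come from something like new URL(urls[0]).openStream() rather than a byte array, with XmlHandler passed in place of the anonymous handler.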