JDOM: parsing XML that contains umlauts
I'm trying to get weather data from Google's weather API and parse the document with JDOM.
This is the code I'm using:
SAXBuilder builder = new SAXBuilder();
Document doc;
URL url = new URL(GOOGLE_WEATHER_API);
doc = builder.build(url);
Element root = doc.getRootElement();
Element weather = root.getChild("weather");
List currentConditions = weather.getChildren("current_conditions");
...
The problem is that whenever the XML returned by Google contains an umlaut (ü, ä, ö...), I get a JDOMParseException:
org.jdom.input.JDOMParseException: Error on line 1 of document http://www.google.de/ig/api?weather=Heidelberg&hl=en:
Fatal Error: com.sap.engine.lib.xml.parser.ParserException:
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: 'utf-8'(http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191):
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: 'utf-8' (http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191)
When I open the URL in a browser and check the page properties, the encoding is shown as UTF-8, so I don't know why it does not work.
Does anybody have an idea?
Best regards,
Paul
Comments (1)
The XML returned from that URL does not declare an encoding in its XML header. Instead, the encoding (ISO-8859-1) is specified in the Content-Type header of the HTTP response. Apparently, even though you are passing a URL to JDOM, it does not handle this correctly: it uses UTF-8, which is the default for XML with no encoding declaration. You need to either handle the HTTP response yourself (reading the header and passing the correct encoding to JDOM), or use a parser that can do that for you (although I don't know of any standard XML parser that will).
If you used the standard XML APIs, you would do something like:
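A minimal sketch of that approach, assuming the encoding is taken from the charset parameter of the Content-Type header. The class and method names (WeatherXmlFetcher, charsetFromContentType, parse, fetch) are hypothetical, and falling back to UTF-8 when no charset is given is an assumption made for this example:

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class WeatherXmlFetcher {

    // Pulls the charset parameter out of a Content-Type value such as
    // "text/xml; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the header is missing or carries no charset.
    static String charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = p.substring("charset=".length()).trim();
                    // Remove optional surrounding double quotes.
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return "UTF-8";
    }

    // Parses the byte stream, telling the parser which encoding to use
    // because the XML itself carries no encoding declaration.
    static Document parse(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding(encoding);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    // Opens the URL, reads the encoding from the response header, then parses.
    static Document fetch(String address) throws Exception {
        URLConnection conn = new URL(address).openConnection();
        String encoding = charsetFromContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            return parse(in, encoding);
        }
    }
}
```

With JDOM, an InputSource prepared the same way could be passed to SAXBuilder.build(InputSource) instead of passing the URL, so that the parser no longer has to guess the encoding.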