Parsing XML with umlauts using JDOM

Posted on 2024-12-23 01:29:47

I'm trying to get weather data from Google's weather API and parse the document with JDOM.

This is the code I'm using:

SAXBuilder builder = new SAXBuilder();
Document doc;
URL url = new URL(GOOGLE_WEATHER_API);
doc = builder.build(url);       
Element root = doc.getRootElement();
Element weather = root.getChild("weather");
List currentConditions = weather.getChildren("current_conditions");
...

The problem is that whenever the XML returned by Google contains an umlaut (ü, ä, ö...), I get a JDOMParseException:

org.jdom.input.JDOMParseException: Error on line 1 of document http://www.google.de/ig/api?weather=Heidelberg&hl=en:
Fatal Error: com.sap.engine.lib.xml.parser.ParserException:
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: 'utf-8'(http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191):
Incorrect encoded sequence detected at character (hex) 0x72, (bin) 1110010.
Check whether the input parsed contains correctly encoded characters.
Encoding used is: 'utf-8' (http://www.google.de/ig/api?weather=Heidelberg&hl=en, row:1, col:191)

When I open the URL in a browser and check the properties of the page, the encoding is reported as UTF-8, so I don't know why it does not work.
Does anybody have an idea?

Best regards,
Paul


何必那么矫情 2024-12-30 01:29:47

The XML returned by that URL does not declare an encoding in its XML declaration. Instead, the encoding (ISO-8859-1) is specified in the Content-Type header of the HTTP response. Apparently, even though you are passing a URL to JDOM, the parser does not take that header into account: it falls back to UTF-8, which is the default for XML with no declared encoding. You need to either handle the HTTP response yourself (read the header and pass the correct encoding to JDOM), or use a parser that can do this for you (although I don't know of any standard XML parser that will).

If you used the standard XML APIs, you would do something like:

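import java.net.HttpURLConnection;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

HttpURLConnection conn = (HttpURLConnection) url.openConnection();
String encoding = ... // get encoding from the HTTP Content-Type header
// read from the same connection instead of opening a second one via url.openStream()
InputSource source = new InputSource(conn.getInputStream());
source.setEncoding(encoding);
DocumentBuilder db = ... // create doc builder
Document doc = db.parse(source);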
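
To make the "get encoding from HTTP header" step more concrete, here is a rough, self-contained sketch using only the JDK. The class name CharsetAwareXml, the helper names charsetFrom and parse, the sample Content-Type value and the sample XML are all made up for illustration; they are not part of JDOM or of the Google response. It assumes the header has the usual "type/subtype; charset=..." shape and falls back to a caller-supplied default otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class CharsetAwareXml {

    /** Returns the charset parameter of a Content-Type value, or the fallback if none is present. */
    static String charsetFrom(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    /** Parses a byte stream, telling the parser which encoding to use instead of letting it guess. */
    static Document parse(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding(encoding);
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for conn.getContentType() and conn.getInputStream() on a real connection.
        String contentType = "text/xml; charset=ISO-8859-1";
        byte[] body = "<city data=\"M\u00fcnster\"/>".getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(body), charsetFrom(contentType, "UTF-8"));
        System.out.println(doc.getDocumentElement().getAttribute("data"));
    }
}
```

Against a live URL you would feed conn.getContentType() and conn.getInputStream() into these helpers. With JDOM, an InputSource prepared the same way can be handed to SAXBuilder.build(InputSource) instead of passing the URL directly.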
