Encoding error with Java HttpURLConnection
I am trying to read generated XML from an MS web service:
    URL page = new URL(address);
    StringBuffer text = new StringBuffer();
    HttpURLConnection conn = (HttpURLConnection) page.openConnection();
    conn.connect();
    InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
    BufferedReader buff = new BufferedReader(in);
    box.setText("Getting data ...");
    String line;
    do {
        line = buff.readLine();
        text.append(line + "\n");
    } while (line != null);
    box.setText(text.toString());
or
    URL u = new URL(address);
    URLConnection uc = u.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
        System.out.println(inputLine);
    }
    in.close();
Any other page reads fine; only the web service output is affected. The greater-than and less-than signs come through strangely: < is read as "&lt;" and > is read as "&gt;".
Please help, thanks.
Comments (2)
First, there seems to be some confusion on this line:
    inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
This effectively says that you expect every line of the document your server provides to be URL encoded. URL encoding is not the same as document (character) encoding.
http://en.wikipedia.org/wiki/Percent-encoding
http://en.wikipedia.org/wiki/Character_encoding
Looking at your code snippet, I think URL encoding (percent encoding) is not what you're after.
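To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). Percent-decoding only rewrites %XX escapes and '+', while character decoding turns raw bytes into chars. Neither one touches XML entities such as &lt;.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class EncodingKinds {
    // Percent-decoding: turns %XX escapes back into characters and '+' into a space.
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Character decoding: interprets raw bytes as text in a given charset.
    static String charDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(percentDecode("a%3Cb+c"));     // prints: a<b c
        System.out.println(percentDecode("&lt;tag&gt;")); // prints: &lt;tag&gt; (entities are untouched)
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};         // UTF-8 bytes of U+00E9
        System.out.println(charDecode(utf8).length());    // prints: 1
    }
}
```

Note that running URLDecoder over ordinary XML can even damage it, since any literal '+' becomes a space and a stray '%' can raise an IllegalArgumentException.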
As for document character encoding, the conversion happens on this line:
    InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars, so the character encoding conversion is done here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever your platform default is in Java. For instance, you could change your code to:
    InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
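But the real question is "what encoding is your server providing the content in?" If you own the server code too, you can just hard-code it to something reasonable such as utf-8. But if it can vary, you need to look at the HTTP header Content-Type to figure it out:
    String contentType = conn.getHeaderField("Content-Type");
The contents of contentType will look something like this (the exact value depends on the server):
    text/xml; charset=utf-8
A shorthand way of getting this field is:
    String contentType = conn.getContentType();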
Notice that it's entirely possible that no charset is provided, or that there is no Content-Type header at all, in which case you must fall back on reasonable defaults.
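Putting the points above together, a rough sketch could look like the following. The helper names charsetOf and readAll are hypothetical, the header parsing is deliberately simplistic, and UTF-8 as the fallback is only an assumption about what a reasonable default is. The demo uses in-memory bytes instead of a real connection so that it runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: pull the charset out of a value such as
    // "text/xml; charset=utf-8", or return the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // no Content-Type header at all
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // malformed or unsupported charset name
                }
            }
        }
        return fallback; // header present, but no charset parameter
    }

    // Reads the whole stream as text, decoding bytes with the given charset.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for conn.getContentType() and conn.getInputStream().
        String contentType = "text/xml; charset=ISO-8859-1";
        byte[] body = "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1);

        Charset cs = charsetOf(contentType, StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

With a real connection you would pass conn.getContentType() and conn.getInputStream() instead of the stand-in values.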
Mark Rotteveel is correct: the web service is the culprit here. For some reason it is sending the greater-than and less-than signs in the &lt; and &gt; format.
Thanks, Martin Algesten, but I have already stated that I worked around it; I was just looking for why it was this way.
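One possible explanation, and this is only an assumption since the actual response is not shown in the thread, is that the service returns its XML as a string value inside an outer XML element, so the inner markup has to be entity-escaped. If that is the case, parsing the outer document with an XML parser recovers the original markup. A minimal sketch, with a made-up response and a hypothetical innerText helper:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EscapedPayload {
    // Hypothetical helper: parse the outer XML and return the text content of
    // its root element; the parser resolves &lt; and &gt; back to < and >.
    static String innerText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response: inner XML carried as an escaped string value.
        String response = "<string>&lt;item&gt;42&lt;/item&gt;</string>";
        System.out.println(innerText(response)); // prints: <item>42</item>
    }
}
```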