Encoding error with Java HttpURLConnection

Posted on 2024-12-11 13:31:57

I am trying to read generated XML from an MS web service:

URL page = new URL(address);
StringBuffer text = new StringBuffer();
HttpURLConnection conn = (HttpURLConnection) page.openConnection();
conn.connect();
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
BufferedReader buff = new BufferedReader(in);
box.setText("Getting data ...");
String line;
// Test for end of stream before appending, so a trailing "null" is never added
while ((line = buff.readLine()) != null) {
  text.append(line).append("\n");
}
box.setText(text.toString());

or

URL u = new URL(address);
URLConnection uc = u.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
  inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
  System.out.println(inputLine);
}
in.close();

Any other page reads fine; only the web service output is affected. It reads the greater-than and less-than signs strangely:

< comes back as "&lt;" and > comes back as "&gt;".

Please help.
Thanks.


Comments (2)

傻比既视感 2024-12-18 13:31:57

First, there seems to be some confusion on this line:

inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");

This effectively says that you expect every line of the document your server provides to be URL-encoded. URL encoding is not the same as document encoding.

http://en.wikipedia.org/wiki/Percent-encoding

http://en.wikipedia.org/wiki/Character_encoding

Looking at your code snippet, I think URL encoding (percent encoding) is not what you're after.
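To make the difference concrete, here is a minimal sketch of what URLDecoder actually does to a line of text. The class and method names (PercentDecodingDemo, percentDecode) are made up for illustration; only URLDecoder.decode comes from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodingDemo {
    // Hypothetical helper: percent-decodes a string as UTF-8.
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // %3C is the percent-encoding of '<', and '+' stands for a space.
        System.out.println(percentDecode("a+b%3Cc"));      // a b<c
        // An XML entity contains neither '%' nor '+', so it passes through unchanged.
        System.out.println(percentDecode("&lt;tag&gt;"));  // &lt;tag&gt;
    }
}
```

So the call cannot turn &lt; back into <, and it would silently turn any literal + in the XML into a space.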

As for the document's character encoding, you are making a conversion on this line:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());

conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars, so the character encoding conversion is done here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever your platform default is in Java.

InputStreamReader(InputStream in, String charsetName)

This, for instance, lets you change your code to:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
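As a rough sketch of that change in context, the reading loop could look like the following. The helper name readAll is an assumption for illustration, and it takes any InputStream so that it can be tried without a network connection; with the question's code you would pass conn.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Hypothetical helper: reads the whole stream, decoding bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a response body; \u00eb is a two-byte character in UTF-8.
        byte[] body = "<name>Zo\u00eb</name>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
    }
}
```

Decoding the same bytes with a different charset, such as ISO-8859-1, would produce two wrong characters in place of the one accented letter, which is the kind of damage a mismatched platform default causes.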

But the real question is "what encoding is your server providing the content in?" If you own the server code too, you may just hard-code it to something reasonable such as UTF-8. But if it can vary, you need to look at the HTTP header Content-Type to figure it out.

String contentType = conn.getHeaderField("Content-Type");

The contents of contentType will look like

text/plain; charset=utf-8

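A shorthand for reading this header is:

String contentType = conn.getContentType();

Note that conn.getContentEncoding() is not what you want here: it returns the Content-Encoding header (for example gzip), which describes compression, not the character set.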

Notice that it's entirely possible that no charset is provided, or no Content-Type header, in which case you must fall back on reasonable defaults.
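One way to handle all of those cases is sketched below. The class and method names (ContentTypeCharset, charsetOf) are hypothetical, and the parsing is deliberately simple: it looks for a charset= parameter and returns the caller's fallback when the header is missing, has no charset, or names a charset the JVM does not know.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {
    // Hypothetical helper: extracts the charset parameter from a Content-Type value.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // no Content-Type header at all
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for a "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback; // header present but without a charset parameter
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1)); // UTF-8
        System.out.println(charsetOf("text/plain", StandardCharsets.ISO_8859_1));                // ISO-8859-1
    }
}
```

With the connection from the question, the call would be something like charsetOf(conn.getHeaderField("Content-Type"), StandardCharsets.UTF_8), and the result can be passed straight to the InputStreamReader constructor. Which fallback is reasonable depends on the server.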

擦肩而过的背影 2024-12-18 13:31:57

Mark Rotteveel is correct: the web service is the culprit here. For some reason it sends the greater-than and less-than signs as the entities &lt; and &gt;.

Thanks, Martin Algesten, but I had already said that I worked around it; I was just looking for why it behaves this way.
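For readers who hit the same thing, one possible way to undo the escaping is to let an XML parser do it. The sketch below assumes, and this is only an assumption since the thread never shows the actual response, that the service wraps the escaped XML as text inside a single outer element; the element name string, the namespace http://tempuri.org/ and the class EscapedPayload are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EscapedPayload {
    // Hypothetical helper: parses the outer document and returns the root element's
    // text, in which the parser has already turned &lt; and &gt; back into < and >.
    static String innerText(InputStream response) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document outer = builder.parse(response);
            return outer.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Response is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed shape of the response: escaped XML carried as text in one element.
        String response = "<string xmlns=\"http://tempuri.org/\">"
                + "&lt;root&gt;&lt;item&gt;1&lt;/item&gt;&lt;/root&gt;</string>";
        InputStream in = new ByteArrayInputStream(response.getBytes(StandardCharsets.UTF_8));
        System.out.println(innerText(in)); // <root><item>1</item></root>
    }
}
```

Handing the raw InputStream to the parser also sidesteps the charset question, because the parser works out the encoding from the byte stream and the XML declaration itself.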
