Setting the response encoding with HttpClient 3.1

Posted 2024-10-19 10:55:05

I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding, because the server returns an incorrect encoding in its Content-Type header. My current approach is to read the response as raw bytes and convert them to a String with the desired encoding. Is there a better way to do this, for example by configuring HttpClient? Thanks for any suggestions.
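
For reference, the raw-bytes approach described above might look roughly like the sketch below. It uses only the JDK so that it runs on its own; the class name `BodyDecoder` and the sample bytes are made up for illustration, and the byte array stands in for what `getResponseBody()` would return from an HttpClient 3.1 method object.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a response body with a charset chosen by the
// caller instead of the one announced in the Content-Type header.
final class BodyDecoder {

    static String decode(byte[] rawBody, Charset charset) {
        return new String(rawBody, charset);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a real response: "é" encoded as UTF-8.
        // With HttpClient 3.1 these would come from method.getResponseBody().
        byte[] rawBody = {(byte) 0xC3, (byte) 0xA9};

        // Decode with the charset the server actually used (assumed UTF-8 here).
        System.out.println(decode(rawBody, StandardCharsets.UTF_8));
    }
}
```

Passing a `Charset` rather than a charset name avoids the checked `UnsupportedEncodingException` that `new String(byte[], String)` declares.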

Comments (4)

清风挽心 2024-10-26 10:55:05

I don't think there's a better answer using HttpClient 3.x APIs.

The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
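
To make the rule above concrete, here is a minimal sketch of how a client could pick the charset from a Content-Type value, falling back to ISO-8859-1 when none is given. This is an illustration only, not HttpClient's actual implementation; the class and method names are hypothetical, and the parsing is deliberately simplified (it does not handle a `;` inside a quoted parameter).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper illustrating the "declared charset, else ISO-8859-1" rule.
final class ContentTypeCharset {

    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip the parameter name and any surrounding quotes.
                    String name = param.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the default below.
                        break;
                    }
                }
            }
        }
        // No usable charset parameter: the default described above.
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```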

If you were using HttpClient 4.x, you could write your own ResponseHandler that reads the body from the HttpEntity and converts it to a String, ignoring the response message's notional character set.

如梦亦如幻 2024-10-26 10:55:05

A few notes:

  1. The server serves the data, so it is up to the server to serve it in an appropriate format. The response encoding is therefore set by the server, not the client. However, the client can tell the server which format it would like via the Accept and Accept-Charset headers:

    Accept: text/plain
    Accept-Charset: utf-8
    

    However, HTTP servers usually do not convert between formats.

  2. If option 1 does not work, you should look at the configuration of the server.

  3. When a String is sent as raw bytes (and it always is, because that is what networks transmit), an encoding is always involved. Since the server produces these raw bytes, it defines the encoding. So you cannot take raw bytes and create a String with an arbitrary encoding of your choice; you must use the encoding that was used when the String was converted to bytes.
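
Point 3 can be illustrated with a small round trip. This is a sketch using only the JDK; the class name and the sample text are made up for the example. Text encoded with one charset only comes back intact when it is decoded with that same charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode a String the way a server would, then decode it
// the way a client would, possibly with a different charset.
final class EncodingMismatch {

    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith);   // what travels over the network
        return new String(wire, decodeWith);       // what the client reconstructs
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";

        // Same charset on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original));

        // UTF-8 bytes decoded as ISO-8859-1: the accented letter becomes two characters.
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.length());
    }
}
```

In the question's scenario the charset announced in Content-Type is not the one the server actually encoded with, so the decoding has to follow the real encoding rather than the header.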

夏九 2024-10-26 10:55:05

Disclaimer: I don't really know HttpClient; I'm only reading the API.

I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.


Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).

The approach is the same as before, just on different classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole byte array at once, if it is not too big, and convert it to a String, as you wrote.)
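
A minimal sketch of the InputStreamReader wrapping, using only the JDK so it can run standalone. The class name `StreamDecoder` is made up for illustration, and a ByteArrayInputStream stands in for the stream that getResponseBodyAsStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a body stream to the end, decoding it with a
// charset chosen by the caller rather than the one the server announced.
final class StreamDecoder {

    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for method.getResponseBodyAsStream() in HttpClient 3.1.
        InputStream body = new ByteArrayInputStream(
                "na\u00efve".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(body, StandardCharsets.UTF_8));
    }
}
```

Reading through a Reader decodes the body as it arrives, which avoids holding a second full copy of a large response as a byte array.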

I think trying to change the response and letting the HttpClient analyze it is not the right way here.


I suggest sending a message to the server administrator/webmaster about the wrong charset, though.

咆哮 2024-10-26 10:55:05

Greetings folks,

Just in case someone finds this post while googling for how to set HttpClient to write in UTF-8.

This line of code should be handy...

response.setContentType("text/html; charset=UTF-8");

Best
