Setting the response encoding with HttpClient 3.1
I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding (for some reason the server returns an incorrect encoding in Content-Type). My current approach is to get the response as raw bytes and convert it to a String with the desired encoding. I'm wondering if there is a better way to do this (e.g. by configuring HttpClient). Thanks for any suggestions.
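For reference, the approach described in the question can be sketched with plain JDK classes. This is only an illustration: the class name ForcedCharsetDecode and the helper decode are hypothetical, and the byte array stands in for whatever HttpMethod.getResponseBody() would return from the real server.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForcedCharsetDecode {
    // Decode the raw response body with a charset chosen by the caller,
    // ignoring whatever the Content-Type header claimed.
    static String decode(byte[] rawBody, Charset actualCharset) {
        return new String(rawBody, actualCharset);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a real response: "café" encoded as UTF-8.
        byte[] rawBody = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // The caller knows the body is really UTF-8, so it decodes it as such.
        System.out.println(decode(rawBody, StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is that the charset is supplied by the caller, so it works regardless of what the server declared.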
Comments (4)
I don't think there's a better answer using the HttpClient 3.x APIs.
The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
If you were using HttpClient 4.x, you could write your own ResponseHandler that reads the body from the response's HttpEntity and decodes it yourself, ignoring the response message's notional character set.
A few notes:
1. The server serves the data, so it's up to the server to serve it in an appropriate format. The response encoding is therefore set by the server, not the client. However, the client can suggest to the server what format it would like via the Accept and Accept-Charset headers. Note, though, that HTTP servers usually do not convert between formats.
2. If option 1 does not work, then you should look at the configuration of the server.
3. When a String is sent as raw bytes (and it always is, because bytes are what networks transmit), it always has a definite encoding. Since the server produces these raw bytes, it defines the encoding. So you cannot take raw bytes and decode them with an arbitrary encoding of your choice; you must use the encoding that was used when the String was converted to bytes.
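The last point can be illustrated with a small JDK-only sketch. The class name EncodingMismatch and the helper decodeAs are made up for this example, and the byte array merely simulates what a server might send.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Decode the given bytes with the given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u00fc"; // the single character "ü"
        // The "server" encodes the text as UTF-8, producing bytes 0xC3 0xBC.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the same charset restores the original text.
        System.out.println(decodeAs(wire, StandardCharsets.UTF_8).equals(original));
        // Decoding with a different charset yields two unrelated characters.
        System.out.println(decodeAs(wire, StandardCharsets.ISO_8859_1).length());
    }
}
```

This is also why a wrong charset in Content-Type matters: the bytes themselves are fine, but the client is told to decode them with the wrong table.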
Disclaimer: I don't really know HttpClient; I'm only reading the API.
I would use the execute method returning an HttpResponse, then .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding declared by the server, you can simply wrap your own InputStreamReader around it.
Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there). But it's the same as before, just located on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole array at once, if it is not too big, and convert it to a String, as you wrote.)
I think trying to change the response and letting HttpClient analyze it is not the right way here.
I suggest sending a message to the server administrator/webmaster about the wrong charset, though.
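A minimal sketch of the InputStreamReader idea, using only JDK classes. The class name StreamDecode and the helper readAll are hypothetical; in real code the InputStream would come from getResponseBodyAsStream(), whereas here a ByteArrayInputStream stands in for it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecode {
    // Read the whole stream as text, decoding with the caller's charset
    // instead of the one announced by the server.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response body stream: UTF-8 bytes of "grüß".
        byte[] body = "gr\u00fc\u00df".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
    }
}
```

Streaming through a Reader avoids holding the raw byte array and the decoded String in memory at the same time, which may matter for large responses.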
Greetings folks,
Just in case someone finds this post while googling for how to set HttpClient to write in UTF-8.
This line of code should be handy...
Best