Android Java UTF-8 HttpClient Problem
I am having strange character-encoding problems with a JSON array fetched from a web page. The server sends back this header:
Content-Type: text/javascript; charset=UTF-8
I can also view the JSON output in Firefox or any other browser, and the Unicode characters display properly. The response sometimes contains words from other languages, with accented characters and the like. However, when I pull it down and put it into a String in Java, I get strange question marks. Here is my code:
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "utf-8");
params.setBooleanParameter("http.protocol.expect-continue", false);
HttpClient httpclient = new DefaultHttpClient(params);
HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
HttpResponse response;
try {
    response = httpclient.execute(httpget);
    if (response.getStatusLine().getStatusCode() == 200) {
        // Connection was established. Get the content.
        HttpEntity entity = response.getEntity();
        // If the response does not enclose an entity, there is no need
        // to worry about connection release
        if (entity != null) {
            // A Simple JSON Response Read
            InputStream instream = entity.getContent();
            String jsonText = convertStreamToString(instream);
            Toast.makeText(getApplicationContext(), "Response: " + jsonText, Toast.LENGTH_LONG).show();
        }
    }
} catch (MalformedURLException e) {
    Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (IOException e) {
    Toast.makeText(getApplicationContext(), "ERROR: IO Exception - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (JSONException e) {
    Toast.makeText(getApplicationContext(), "ERROR: JSON - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
}
private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to a String we use BufferedReader.readLine().
     * We iterate until the BufferedReader returns null, which means there is
     * no more data to read. Each line is appended to a StringBuilder and
     * returned as a String.
     */
    StringBuilder sb = new StringBuilder();
    try {
        // Declared inside the try so the reader is always initialized before use.
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException, which is an IOException.
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}
As you can see, I am specifying UTF-8 on the InputStreamReader, but every time I view the returned JSON text via Toast it contains strange question marks. Do I need to read the InputStream into a byte[] instead?
Thanks in advance for any help.
Comments (5)
Try this:
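The replies below point to EntityUtils.toString(entity, HTTP.UTF_8) as the fix. As a rough, JDK-only illustration of the same idea (this is not Arhimed's code, and the class and method names are hypothetical), the sketch below reads the whole stream into a byte[] and decodes it once as UTF-8 instead of reassembling it line by line. It assumes StandardCharsets is available, which on Android should mean API level 19 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamDecoding {
    // Hypothetical helper: buffer every byte of the stream, then decode once.
    // Line endings are preserved exactly as the server sent them.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        try {
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } finally {
            in.close();
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(): a UTF-8 encoded JSON array.
        byte[] body = "[\"caf\u00e9\"]".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFully(new ByteArrayInputStream(body)));
    }
}
```

In the question's code, the call would take the place of convertStreamToString(instream).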

@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.
My guesses are:
Your convertStreamToString is reading the character stream a line at a time and reassembling it using a hard-wired '\n' as the end-of-line marker. If you are going to write that to an external file or application, you should probably be using a platform-specific end-of-line marker.

It is just that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if there is one, uses that encoding. It only falls back to the encoding passed as the parameter (in this case HTTP.UTF_8) if no encoding is set in the entity.
So you can say that your HTTP.UTF_8 is passed in the parameter but never gets used, because it is the wrong encoding. The fix is to update your code to use the helper method from EntityUtils.
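To make the precedence rule described above concrete, here is a small JDK-only sketch. It is not the EntityUtils source; the class and method names are made up for illustration. It prefers the charset the response declares, uses the fallback only when none is declared, and shows what happens when bytes in one encoding are forced through another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetPrecedence {
    // Hypothetical helper: the declared charset wins, the fallback is used
    // only when the response declares nothing (declared == null).
    static String decode(byte[] body, Charset declared, Charset fallback) {
        Charset effective = (declared != null) ? declared : fallback;
        return new String(body, effective);
    }

    public static void main(String[] args) {
        // A body that is really ISO-8859-1: the accented e is the single byte 0xE9.
        byte[] latin1Body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Honoring the declared charset recovers the accented character.
        String good = decode(latin1Body, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // Ignoring it and decoding as UTF-8 turns 0xE9 into U+FFFD, the
        // replacement character that is usually rendered as a question mark.
        String bad = decode(latin1Body, null, StandardCharsets.UTF_8);

        System.out.println(good.equals("caf\u00e9"));
        System.out.println(bad.equals("caf\uFFFD"));
    }
}
```

A mismatch of this kind between the actual and the assumed encoding is one common source of question marks in decoded text.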

Arhimed's answer is correct. However, the same result can be achieved simply by adding an extra header, Accept-Charset, to the HTTP request. There is no need to remove anything or use any other library. Most probably your request doesn't have any Accept-Charset header.
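As a sketch of the Accept-Charset suggestion, the example below sets the header on a request before it is sent. Note that it swaps in the JDK's HttpURLConnection rather than the Apache HttpClient used in the question, so that it stays self-contained; the class name and the header value "utf-8" are assumptions. With the question's HttpGet, the equivalent would presumably be httpget.setHeader("Accept-Charset", "utf-8").

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class AcceptCharsetRequest {
    // Hypothetical helper: prepares (but does not send) a GET request that
    // tells the server which response charset the client prefers.
    static HttpURLConnection prepare(String url) throws IOException {
        // openConnection() only creates the object; no network I/O happens
        // until connect() or a response accessor is called.
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "utf-8");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.example.com/json_array.php");
        System.out.println("Accept-Charset: " + conn.getRequestProperty("Accept-Charset"));
    }
}
```

Whether the server actually honors the header depends on the server; the response's Content-Type is still the authoritative statement of what was sent.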

Extract the charset from the response's Content-Type field, then use the extracted charset to create the InputStreamReader.
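The two steps can be sketched roughly as follows, using only JDK classes. The class and method names are hypothetical, the parsing is deliberately simple (it looks for a charset= parameter after the media type), and falling back to UTF-8 when nothing usable is declared is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Step 1: pull the charset parameter out of a Content-Type value such as
    // "text/javascript; charset=UTF-8". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Step 2: build the reader with the extracted charset.
    static Reader readerFor(InputStream in, String contentType) {
        return new InputStreamReader(in, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = readerFor(new ByteArrayInputStream(body), "text/plain; charset=ISO-8859-1");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        System.out.println(sb);
    }
}
```

In the question's code, the Content-Type value would come from the response (for example from the entity's content-type header), and the resulting reader would replace the hard-coded new InputStreamReader(is, "UTF-8").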