Android Java UTF-8 HttpClient 问题

发布于 2024-10-08 18:06:56 字数 2991 浏览 4 评论 0原文

我遇到了从网页抓取的 JSON 数组的奇怪字符编码问题。服务器正在发回此标头：

Content-Type text/javascript； charset=UTF-8

我还可以在 Firefox 或任何浏览器中查看 JSON 输出，并且 Unicode 字符显示正常。响应有时会包含来自另一种语言的带有重音符号等的单词。然而，当我将其拉下来并将其放入 Java 中的字符串时，我得到了那些奇怪的问号。这是我的代码：

HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "utf-8");
params.setBooleanParameter("http.protocol.expect-continue", false);

HttpClient httpclient = new DefaultHttpClient(params);

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
HttpResponse response;
    try {
        response = httpclient.execute(httpget);

        if(response.getStatusLine().getStatusCode() == 200){
            // Connection was established. Get the content. 

            HttpEntity entity = response.getEntity();
            // If the response does not enclose an entity, there is no need
            // to worry about connection release

            if (entity != null) {
                // A Simple JSON Response Read
                InputStream instream = entity.getContent();
                String jsonText = convertStreamToString(instream);

                Toast.makeText(getApplicationContext(), "Response: "+jsonText, Toast.LENGTH_LONG).show();

            }

        }


    } catch (MalformedURLException e) {
        Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (IOException e) {
        Toast.makeText(getApplicationContext(), "ERROR: IO Exception - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (JSONException e) {
        Toast.makeText(getApplicationContext(), "ERROR: JSON - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    }

private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to String we use the BufferedReader.readLine()
     * method. We iterate until the BufferedReader return null which means
     * there's no more data to read. Each line will appended to a StringBuilder
     * and returned as String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

如您所见，我在 InputStreamReader 上指定 UTF-8，但每次我通过 Toast 查看返回的 JSON 文本时，它都会出现奇怪的问号。我在想我需要将 InputStream 发送到 byte[] 吗？

预先感谢您的任何帮助。

原文

I am having weird character encoding issues with a JSON array that is grabbed from a web page. The server is sending back this header:

Content-Type text/javascript; charset=UTF-8

Also I can look at the JSON output in Firefox or any browser and Unicode characters display properly. The response will sometimes contain words from another language with accent symbols and such. However I am getting those weird question marks when I pull it down and put it to a string in Java. Here is my code:

HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "utf-8");
params.setBooleanParameter("http.protocol.expect-continue", false);

HttpClient httpclient = new DefaultHttpClient(params);

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
HttpResponse response;
    try {
        response = httpclient.execute(httpget);

        if(response.getStatusLine().getStatusCode() == 200){
            // Connection was established. Get the content. 

            HttpEntity entity = response.getEntity();
            // If the response does not enclose an entity, there is no need
            // to worry about connection release

            if (entity != null) {
                // A Simple JSON Response Read
                InputStream instream = entity.getContent();
                String jsonText = convertStreamToString(instream);

                Toast.makeText(getApplicationContext(), "Response: "+jsonText, Toast.LENGTH_LONG).show();

            }

        }


    } catch (MalformedURLException e) {
        Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (IOException e) {
        Toast.makeText(getApplicationContext(), "ERROR: IO Exception - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (JSONException e) {
        Toast.makeText(getApplicationContext(), "ERROR: JSON - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    }

private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to String we use the BufferedReader.readLine()
     * method. We iterate until the BufferedReader return null which means
     * there's no more data to read. Each line will appended to a StringBuilder
     * and returned as String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

As you can see, I am specifying UTF-8 on the InputStreamReader but every time I view the returned JSON text via Toast it has strange question marks. I am thinking that I need to send the InputStream to a byte[] instead?

Thanks in advance for any help.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

只怪假的太真实 2024-10-15 18:06:56

试试这个：

if (entity != null) {
    // A Simple JSON Response Read
    // InputStream instream = entity.getContent();
    // String jsonText = convertStreamToString(instream);

    String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);

    // ... toast code here
}

Try this:

if (entity != null) {
    // A Simple JSON Response Read
    // InputStream instream = entity.getContent();
    // String jsonText = convertStreamToString(instream);

    String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);

    // ... toast code here
}

回复收藏 0 原文

停滞 2024-10-15 18:06:56

@Arhimed 的答案就是解决方案。但我看不出您的 convertStreamToString 代码有任何明显的错误。

我的猜测是：

服务器在流的开头放置了一个 UTF 字节顺序标记 (BOM)。标准 Java UTF-8 字符解码器不会删除 BOM，因此它很可能会出现在结果字符串中。（但是，EntityUtils 的代码似乎也不对 BOM 执行任何操作。）
您的 convertStreamToString 一次读取一行字符流，并使用硬连线 重新组装它'\n' 作为行尾标记。如果您要将其写入外部文件或应用程序，您可能应该使用特定于平台的行尾标记。

回复收藏 0 原文

指尖凝香 2024-10-15 18:06:56

只是您的 ConvertStreamToString 不支持 HttpRespnose 中设置的编码。如果你查看 EntityUtils.toString(entity, HTTP.UTF_8) 内部，你会看到 EntityUtils 首先查找 HttpResponse 中是否有编码设置，如果有，EntityUtils 使用该编码。如果实体中没有设置编码，它只会回退到参数中传递的编码（在本例中为 HTTP.UTF_8）。

因此，您可以说您的 HTTP.UTF_8 已在参数中传递，但它从未被使用，因为它是错误的编码。因此，这里使用 EntityUtils 的辅助方法更新您的代码。

           HttpEntity entity = response.getEntity();
           String charset = getContentCharSet(entity);
           InputStream instream = entity.getContent();
           String jsonText = convertStreamToString(instream,charset);

    private static String getContentCharSet(final HttpEntity entity) throws ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    String charset = null;
    if (entity.getContentType() != null) {
        HeaderElement values[] = entity.getContentType().getElements();
        if (values.length > 0) {
            NameValuePair param = values[0].getParameterByName("charset");
            if (param != null) {
                charset = param.getValue();
            }
        }
    }
    return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}



private static String convertStreamToString(InputStream is, String encoding) {
    /*
     * To convert the InputStream to String we use the
     * BufferedReader.readLine() method. We iterate until the BufferedReader
     * return null which means there's no more data to read. Each line will
     * appended to a StringBuilder and returned as String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, encoding));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

It is just that your convertStreamToString is not honoring encoding set in the HttpRespnose. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils find out if there is encoding set in the HttpResponse first, then if there is, EntityUtils use that encoding. It will only fall back to the encoding passed in the parameter(in this case HTTP.UTF_8) if there isn't encoding set in the entity.

So you can say that your HTTP.UTF_8 is passed in the parameter but it never get used because it is the wrong encoding. So here is update to your code with the helper method from EntityUtils.

           HttpEntity entity = response.getEntity();
           String charset = getContentCharSet(entity);
           InputStream instream = entity.getContent();
           String jsonText = convertStreamToString(instream,charset);

    private static String getContentCharSet(final HttpEntity entity) throws ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    String charset = null;
    if (entity.getContentType() != null) {
        HeaderElement values[] = entity.getContentType().getElements();
        if (values.length > 0) {
            NameValuePair param = values[0].getParameterByName("charset");
            if (param != null) {
                charset = param.getValue();
            }
        }
    }
    return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}



private static String convertStreamToString(InputStream is, String encoding) {
    /*
     * To convert the InputStream to String we use the
     * BufferedReader.readLine() method. We iterate until the BufferedReader
     * return null which means there's no more data to read. Each line will
     * appended to a StringBuilder and returned as String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, encoding));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

回复收藏 0 原文

究竟谁懂我的在乎 2024-10-15 18:06:56

阿基米德的回答是正确的。但是，只需在 HTTP 请求中提供额外的标头即可完成此操作：

Accept-charset: utf-8

无需删除任何内容或使用任何其他库。

例如，

GET / HTTP/1.1
Host: www.website.com
Connection: close
Accept: text/html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: utf-8

您的请求很可能没有任何 Accept-Charset 标头。

Archimed's answer is correct. However, that can be done simply by providing an additional header in the HTTP request:

Accept-charset: utf-8

No need to remove anything or use any other library.

For example,

GET / HTTP/1.1
Host: www.website.com
Connection: close
Accept: text/html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: utf-8

Most probably your request doesn't have any Accept-Charset header.

回复收藏 0 原文

指尖凝香 2024-10-15 18:06:56

从响应内容类型字段中提取字符集。您可以使用以下方法来执行此操作：

private static String extractCharsetFromContentType(String contentType) {
    if (TextUtils.isEmpty(contentType)) return null;

    Pattern p = Pattern.compile(".*charset=([^\\s^;^,]+)");
    Matcher m = p.matcher(contentType);

    if (m.find()) {
        try {
            return m.group(1);
        } catch (Exception e) {
            return null;
        }
    }

    return null;
}

然后使用提取的字符集创建 InputStreamReader：

String charsetName = extractCharsetFromContentType(connection.getContentType());

InputStreamReader inReader = (TextUtils.isEmpty(charsetName) ? new InputStreamReader(inputStream) :
                    new InputStreamReader(inputStream, charsetName));
            BufferedReader reader = new BufferedReader(inReader);

Extract the charset from the response content type field. You can use the following method to do this:

private static String extractCharsetFromContentType(String contentType) {
    if (TextUtils.isEmpty(contentType)) return null;

    Pattern p = Pattern.compile(".*charset=([^\\s^;^,]+)");
    Matcher m = p.matcher(contentType);

    if (m.find()) {
        try {
            return m.group(1);
        } catch (Exception e) {
            return null;
        }
    }

    return null;
}

Then use the extracted charset to create the InputStreamReader:

String charsetName = extractCharsetFromContentType(connection.getContentType());

InputStreamReader inReader = (TextUtils.isEmpty(charsetName) ? new InputStreamReader(inputStream) :
                    new InputStreamReader(inputStream, charsetName));
            BufferedReader reader = new BufferedReader(inReader);

回复收藏 0 原文

~没有更多了~