URLConnection does not pick up the charset

Posted 2024-09-27 15:02:58

I'm using URL.openConnection() to download something from a server. The server says

Content-Type: text/plain; charset=utf-8

But connection.getContentEncoding() returns null. What's going on?

Comments (3)

段念尘 2024-10-04 15:02:58

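URLConnection.getContentEncoding() returns the value of the Content-Encoding header, which describes how the body was transformed (for example gzip compression), not its character set. Your server does not send that header, so you get null.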

Here is the JDK source of URLConnection.getContentEncoding():

/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return  the content encoding of the resource that the URL references,
 *          or <code>null</code> if not known.
 * @see     java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}
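To reproduce this without a real server, here is a minimal sketch, assuming Java 9+ and using the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for the questioner's server. The class and method names are hypothetical. It sends the same Content-Type as in the question and no Content-Encoding, then reports what URLConnection returns for each header.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ContentEncodingDemo {

    /**
     * Hypothetical helper: serves one response from a throwaway local server and
     * returns { getContentEncoding(), getContentType() } as seen by URLConnection.
     */
    public static String[] fetchHeaders() throws Exception {
        // Stand-in for the questioner's server: same Content-Type, no Content-Encoding
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection) URI.create("http://127.0.0.1:" + port + "/")
                    .toURL()
                    .openConnection(Proxy.NO_PROXY); // bypass any configured proxy for localhost
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes(); // drain the body
            }
            // getContentEncoding() reads only the Content-Encoding header, which was never sent
            return new String[] { conn.getContentEncoding(), conn.getContentType() };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String[] headers = fetchHeaders();
        System.out.println("getContentEncoding() = " + headers[0]);
        System.out.println("getContentType()     = " + headers[1]);
    }
}
```

The first line prints null, while the charset shows up only inside the getContentType() value.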

Instead, call connection.getContentType() to get the Content-Type header and extract the charset parameter from it yourself. Here is sample code showing how:

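String contentType = connection.getContentType(); // null if the server sent no Content-Type
String charset = null;

if (contentType != null) {
    // e.g. "text/plain; charset=utf-8" splits into "text/plain" and " charset=utf-8"
    for (String value : contentType.split(";")) {
        value = value.trim();

        // Parameter names are case-insensitive, so match "charset=" ignoring case
        if (value.regionMatches(true, 0, "charset=", 0, "charset=".length())) {
            charset = value.substring("charset=".length()).trim();
            // The value may be quoted, e.g. charset="utf-8"
            if (charset.length() >= 2 && charset.startsWith("\"") && charset.endsWith("\"")) {
                charset = charset.substring(1, charset.length() - 1);
            }
            break;
        }
    }
}

if (charset == null || charset.isEmpty()) {
    charset = "UTF-8"; // Assumption: no charset declared, so fall back to UTF-8
}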
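The parsing logic above can be packaged as a reusable helper that also decodes the body. This is a minimal sketch, assuming Java 9+ (for InputStream.readAllBytes()). The class and method names are hypothetical. Unlike the snippet, it returns a java.nio.charset.Charset and falls back when the server names a charset the JVM does not support. It still splits naively on ';', so a quoted parameter value that itself contains ';' is not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public final class ContentTypeCharset {

    /**
     * Hypothetical helper: extracts the charset parameter from a Content-Type value
     * such as "text/plain; charset=utf-8". Returns the fallback if the header is null,
     * has no charset parameter, or names a charset this JVM does not support.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself ("text/plain"), so start at 1
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0 || !param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                continue;
            }
            String name = param.substring(eq + 1).trim();
            // Parameter values may be quoted: charset="utf-8"
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return fallback;
            }
        }
        return fallback;
    }

    /** Hypothetical helper: reads the whole stream and decodes it with the given charset. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        }
    }
}
```

With the connection from the question, usage would look like: String text = ContentTypeCharset.readAll(connection.getInputStream(), ContentTypeCharset.charsetOf(connection.getContentType(), StandardCharsets.UTF_8));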
金兰素衣 2024-10-04 15:02:58

This is documented behaviour: getContentEncoding() is specified to return the contents of the Content-Encoding HTTP header, which your server does not set. You can call getContentType() and parse the resulting String yourself, or switch to a more capable HTTP client library such as Apache HttpClient.
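As a swapped-in alternative to the Apache library (this is not the Apache API), Java 11+ ships java.net.http.HttpClient. Its HttpResponse.BodyHandlers.ofString() decodes the body using the charset from the Content-Type response header and falls back to UTF-8 when there is none or it is unsupported. The sketch below assumes Java 11+ and uses the JDK's built-in HttpServer as a hypothetical stand-in server. The class and method names are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HttpClientCharsetDemo {

    /** Hypothetical helper: GETs the URI and lets HttpClient pick the charset. */
    public static String fetchText(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .proxy(HttpClient.Builder.NO_PROXY) // talk to localhost directly
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // ofString() decodes with the Content-Type charset, or UTF-8 if none/unsupported
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Hypothetical stand-in server that sends ISO-8859-1 bytes and declares them. */
    public static HttpServer startLatin1Server() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=ISO-8859-1");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startLatin1Server();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println(fetchText(uri));
        } finally {
            server.stop(0);
        }
    }
}
```

The stand-in server deliberately uses ISO-8859-1: with a UTF-8 server the output would look correct even if the declared charset were ignored, because UTF-8 is also the fallback.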

七颜 2024-10-04 15:02:58

As an addition to @Buhake Sindi's answer: if you are using Guava, you can let it parse the header instead of doing it by hand:

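// Guava types: com.google.common.net.MediaType, com.google.common.base.Optional (not java.util.Optional)
MediaType mediaType = MediaType.parse(httpConnection.getContentType()); // throws if the header is null or malformed
Optional<Charset> typeCharset = mediaType.charset();
Charset charset = typeCharset.or(StandardCharsets.UTF_8); // falling back to UTF-8 is an assumption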