Charset problem when downloading a UTF-8 web page on Android

Posted on 2024-12-02 05:35:15

I have a problem downloading and parsing a UTF-8 web page... I use the following function to fetch the page's HTML:

static String getString(String url, ProgressDialog loading) {
    String s = "", html = "";
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.connect();
        DataInputStream dis = new DataInputStream(conn.getInputStream());
        loading.setTitle("Descargando...");
        loading.setMax( 32000 );
        while ((s = dis.readLine()) != null) {
            html += s;
            loading.setProgress(html.length());
        }
    } catch (Exception e) {
        Log.e("CC", "Error al descargar: " + e.getMessage());

    } finally {
        if (conn != null)
            conn.disconnect();
    }
    return html;
}

And the web page has:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

But Spanish characters such as ¡ ¿ á é í ó ú appear garbled in the app. I tried using readUTF(), but I ran into length problems...

Any ideas? Thank you!

Answers (2)

飘过的浮云 2024-12-09 05:35:15

You need to use a Reader where you specify the charset used to read the input stream. In this particular case you need an InputStreamReader.

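Reader reader = null;
StringBuilder builder = new StringBuilder();

try {
    // ... (open `connection`, the HttpURLConnection, as in the question)
    // Decode the response bytes as UTF-8 instead of reading raw bytes line by line.
    reader = new InputStreamReader(connection.getInputStream(), "UTF-8");
    char[] buffer = new char[8192];

    for (int length = 0; (length = reader.read(buffer)) > 0;) {
        builder.append(buffer, 0, length);
        // Report the total number of characters read so far, not just this chunk's size.
        loading.setProgress(builder.length());
    }
} finally {
    if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}

String html = builder.toString();
// ...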
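To see why the question's DataInputStream.readLine() loop garbles the Spanish characters while an InputStreamReader does not, here is a small self-contained sketch that feeds the same UTF-8 bytes through both approaches. The class CharsetDemo and its method names are made up for illustration; it uses try-with-resources and StandardCharsets, which on Android assume API level 19 or higher.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class CharsetDemo {

    // Mimics the question's approach: DataInputStream.readLine() turns each byte into one char.
    @SuppressWarnings("deprecation")
    static String readWithDataInputStream(byte[] bytes) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = dis.readLine()) != null) {
            sb.append(line); // note: line terminators are dropped, as in the question
        }
        return sb.toString();
    }

    // The answer's approach: an InputStreamReader decodes the bytes as UTF-8.
    static String readWithReader(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            for (int length; (length = reader.read(buffer)) > 0;) {
                sb.append(buffer, 0, length);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "¿Qué tal? ¡Adiós!" written with Unicode escapes so the source encoding does not matter.
        String original = "\u00BFQu\u00E9 tal? \u00A1Adi\u00F3s!";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(readWithDataInputStream(utf8)); // garbled
        System.out.println(readWithReader(utf8));          // decoded correctly
    }
}
```

readLine() maps every byte to a single char, so the two UTF-8 bytes of "é" (0xC3 0xA9) come out as "Ã©", which is the same as reading the bytes as ISO-8859-1.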

Unrelated to the concrete problem: have you considered using an HTML parser like Jsoup? It takes these nasty details into account. It's then as simple as

String html = Jsoup.connect(url).get().html();
// ...

However, it doesn't really allow you to attach a progress monitor.
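If you keep the manual download so the progress dialog still works, you give up the charset handling a parser would do for you. As a rough, hypothetical sketch (not Jsoup's code; ContentTypeCharset and charsetFrom are invented names), you could use the charset the server declares in the Content-Type response header, available through URLConnection.getContentType(), instead of hard-coding "UTF-8". This sketch does not look at the page's own &lt;meta&gt; tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class ContentTypeCharset {

    // Picks the charset from a Content-Type value such as "text/html; charset=UTF-8",
    // falling back to the given default when the header is missing or unusable.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String trimmed = param.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = trimmed.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFrom("text/html", StandardCharsets.UTF_8));
    }
}
```

With a connection `conn`, this would be used as `new InputStreamReader(conn.getInputStream(), charsetFrom(conn.getContentType(), StandardCharsets.UTF_8))`.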

我很坚强 2024-12-09 05:35:15

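I'm pretty sure you don't want to use a DataInputStream: its readLine() method is deprecated precisely because it does not properly convert bytes to characters.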

This answer might be helpful though: Read/convert an InputStream to a String
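A minimal sketch of one common way to turn an InputStream into a String, assuming you know the charset up front (StreamToString and readFully are hypothetical names): copy the raw bytes into a ByteArrayOutputStream, then decode them in a single step. It sticks to a plain read loop rather than Java 9's InputStream.readAllBytes(), which older Android versions lack.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class StreamToString {

    // Collects all bytes first and decodes them once, so a multi-byte UTF-8
    // sequence is never split between two reads. The caller closes the stream.
    static String readFully(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        for (int length; (length = in.read(buffer)) != -1;) {
            out.write(buffer, 0, length);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<p>\u00A1Hola, se\u00F1or!</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFully(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```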
