Charset problem when downloading a UTF-8 web page on Android
I have a problem downloading and parsing a UTF-8 web page. I use the following function to get the page's HTML:
static String getString(String url, ProgressDialog loading) {
    String s = "", html = "";
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.connect();
        DataInputStream dis = new DataInputStream(conn.getInputStream());
        loading.setTitle("Descargando...");
        loading.setMax(32000);
        while ((s = dis.readLine()) != null) {
            html += s;
            loading.setProgress(html.length());
        }
    } catch (Exception e) {
        Log.e("CC", "Error al descargar: " + e.getMessage());
    } finally {
        if (conn != null)
            conn.disconnect();
    }
    return html;
}
And the web page has:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
But Spanish characters such as ¡ ¿ á é í ó ú appear wrong in the app. I tried to use readUTF(), but I ran into length problems...
Any ideas? Thank you!
Comments (2)
You need to use a Reader where you specify the charset used to read the input stream. In this particular case you need an InputStreamReader.
Unrelated to the concrete problem, did you consider using an HTML parser like Jsoup? It takes these nasty details into account, and fetching a page with it takes a single line. It doesn't, however, really allow for attaching a progress monitor.
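As a rough illustration of the Reader approach, here is a minimal sketch that decodes a stream as UTF-8 through an InputStreamReader. The class and method names (Utf8PageReader, readUtf8) are made up for this example, and it assumes the server really sends UTF-8. In the question's function the stream would come from conn.getInputStream(), and the progress dialog could be updated inside the loop. StandardCharsets needs Java 7 or Android API 19; on older platforms the charset name "UTF-8" can be passed to InputStreamReader as a string instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class Utf8PageReader {

    // Hypothetical helper: reads the whole stream and decodes it as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                html.append(buffer, 0, n);
                // A progress callback could be invoked here with html.length().
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file independent of its own encoding.
        String text = "<p>\u00a1Hola! \u00bfQu\u00e9 tal? \u00e1 \u00e9 \u00ed \u00f3 \u00fa</p>";
        byte[] page = text.getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(page));
        System.out.println(decoded.equals(text)); // true
    }
}
```

Reading into a char buffer instead of line by line also keeps the page's line breaks, which the readLine() loop in the question drops.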
I'm pretty sure you don't want to use a DataInputStream.
This answer might be helpful though: Read/convert an InputStream to a String
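To see why DataInputStream is the wrong tool here, the sketch below (class and method names are hypothetical) feeds the UTF-8 bytes of a single á through the deprecated DataInputStream.readLine(), which turns every byte into its own char, and compares that with decoding the same bytes as UTF-8. readUTF() is probably not a fit either: it expects the 2-byte length prefix written by DataOutputStream.writeUTF(), which a web page does not have, and that would explain the length problems mentioned in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ReadLineDemo {

    // Reads one line the way the question's code does: one char per byte.
    @SuppressWarnings("deprecation")
    static String readWithDataInputStream(byte[] bytes) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return dis.readLine();
        }
    }

    // Decodes the same bytes with an explicit charset.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // U+00E1 (a with acute accent) is the two bytes 0xC3 0xA1 in UTF-8.
        byte[] utf8 = "\u00e1".getBytes(StandardCharsets.UTF_8);

        System.out.println(readWithDataInputStream(utf8).length()); // 2 (garbled)
        System.out.println(decodeAsUtf8(utf8).length());            // 1 (correct)
    }
}
```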