Special characters displayed as a question-mark hash
I'm developing applications for Android devices and recently ran into a problem.
I need to get information out of an HTML file online, so I built a chain of InputStream and BufferedReader to scan the file. I split the resulting string to extract the information and tried displaying it with a Toast.
Everything works the way I want it to, but every time a special character should be displayed, a question-mark hash is shown instead.
I think it might be a charset problem, because the page declares:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
How do I get this right?
EDIT:
HttpClient httpClient = new DefaultHttpClient();
HttpPost post = new HttpPost(url);
((AbstractHttpClient) httpClient).getCredentialsProvider().setCredentials(
        new AuthScope(null, -1),
        new UsernamePasswordCredentials("user", "password"));
HttpResponse response;
response = httpClient.execute(post);
BufferedReader reader = new BufferedReader(
        new InputStreamReader(
                response.getEntity().getContent()
        )
);
String line = null;
while ((line = reader.readLine()) != null) {
    Toast.makeText(this, line, Toast.LENGTH_LONG).show();
}
Comments (4)
InputStreamReader can take a Charset as a second parameter, which indicates, I presume, the character encoding of the stream it is going to read. Standard-compliant Java implementations are not required to support the windows-1252 encoding, but I believe it is quite similar to ISO-8859-1 (the two differ only in the byte range 0x80 to 0x9F), which you can try as a first workaround to see if it works. There is also another possibly interesting constructor in the InputStreamReader class that takes a CharsetDecoder as a second parameter (you can create one by invoking Charset.newDecoder). You could use it to decode the stream in the encoding you prefer, or perhaps in the system's default encoding, which you can obtain by invoking Charset.defaultCharset.
See the JavaDoc API documentation for InputStreamReader, Charset (http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html) and CharsetDecoder for details. I'm not an expert and I know only a little about encodings and their issues, but I thought it worth pointing out that these classes are available.
You can also check which encoding an InputStreamReader is using by invoking its getEncoding method.
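A minimal sketch of the Charset-based constructor described above, in case it helps. The class and method names (CharsetReaderSketch, pickCharset, readAll) are made up for illustration, and a byte array stands in for the HTTP response stream from the question. It assumes falling back to ISO-8859-1 is acceptable when windows-1252 is not available on the device.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetReaderSketch {

    // Prefer windows-1252 when the runtime provides it; otherwise fall back to
    // ISO-8859-1, which every Java platform is required to support.
    static Charset pickCharset() {
        return Charset.isSupported("windows-1252")
                ? Charset.forName("windows-1252")
                : Charset.forName("ISO-8859-1");
    }

    // Reads the whole stream as text, decoding bytes with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Wrapped only to keep this sketch short; real code may prefer to
            // propagate the IOException.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In the question's code, the equivalent change would be to pass the chosen Charset as the second argument of the InputStreamReader wrapped around response.getEntity().getContent().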
My guess is that you've used the InputStreamReader constructor that takes a stream but no character encoding, so it will try to use the platform default. You should be using the encoding specified in the response. When you're using HTTP, the one in the Content-Type header is likely to be okay, although it's a shame that the HTML can specify it separately :(
Now, whether Android includes the Windows-1252 encoding is a different matter...
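One way to pick up the encoding from the response is to parse the charset parameter out of the Content-Type header value. The sketch below is an illustration only: the class and method names are hypothetical, it handles just the simple "type/subtype; charset=name" shape, and it assumes the caller has already obtained the header value as a String and supplies a fallback charset.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=windows-1252", or the fallback when the parameter
    // is missing, malformed, or names a charset this runtime does not support.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = p.substring("charset=".length()).trim();
            // The value may be quoted, e.g. charset="utf-8".
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal or empty charset name: fall through to the fallback.
            }
        }
        return fallback;
    }
}
```

The returned Charset can then be passed to InputStreamReader. As noted above, the HTML itself may declare a different encoding in a meta tag, which this sketch does not look at.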
Oh, and please use UTF-8, regardless of whether this problem gets solved some other way.
http://www.w3.org/TR/html4/charset.html
http://en.wikipedia.org/wiki/UTF-8
Just in case someone else has the same problem I had...
I was getting the same question-mark-in-a-black-diamond for text I pulled from a JSON file I'd loaded from res/raw. No matter what combination of stream readers I tried, the characters would still appear. My first attempt to ensure I was using UTF-8 was to check the file properties in Eclipse, and sure enough the encoding was set to "MacRoman", whatever that is. I changed it to UTF-8, built, ran, failed, cleaned, built, ran, failed, scratched my head, and came back to SO.
I read that I had to save the file after changing the encoding, so I tried that, and still no luck. I finally scrolled down in the JSON file in the Eclipse editor to where the special characters were, and interestingly the special characters (é and an em dash) were showing as black diamonds there as well! I deleted and re-entered them and everything worked fine.
Bottom line: encoding matters, and when creating a resource file (XML, JSON, CSV or whatever) make sure you select the proper encoding (usually UTF-8) BEFORE you start entering text.
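For anyone wondering where the black diamond comes from, here is a small sketch. It assumes the symbol is U+FFFD, the Unicode replacement character, which Java's String decoding substitutes for bytes that are invalid in the chosen charset. ISO-8859-1 is used as a stand-in for a single-byte encoding like MacRoman, because MacRoman is not guaranteed to be available on every runtime. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

public class ReplacementCharDemo {

    // Decodes raw bytes with the named charset. Malformed input is replaced
    // with the charset's default replacement string rather than throwing.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] singleByte = {(byte) 0xE9};            // "é" saved as ISO-8859-1
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9}; // "é" saved as UTF-8

        // Wrong charset: 0xE9 on its own is not valid UTF-8, so it is
        // replaced with U+FFFD.
        System.out.println("\uFFFD".equals(decode(singleByte, "UTF-8")));

        // Matching charsets: both decode to the same "é".
        System.out.println("\u00E9".equals(decode(singleByte, "ISO-8859-1")));
        System.out.println("\u00E9".equals(decode(utf8Bytes, "UTF-8")));
    }
}
```

This is consistent with what happened above: once the bytes have been read or saved under the wrong encoding, the original character is gone, which is why retyping the characters after switching to UTF-8 fixed it.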