Java BufferedReader Arabic text file problem
Problem: Arabic words in the text files I read with Java show up as a series of question marks: ??????
Here is the code:
File[] fileList = mainFolder.listFiles();
BufferedReader bufferReader = null;
Reader reader = null;
try {
    for (File f : fileList) {
        reader = new InputStreamReader(new FileInputStream(f.getPath()), "UTF8");
        bufferReader = new BufferedReader(reader);
        String line = null;
        while ((line = bufferReader.readLine()) != null) {
            System.out.println(new String(line.getBytes(), "UTF-8"));
        }
    }
}
catch (Exception exc) {
    exc.printStackTrace();
}
finally {
    // Close the BufferedReader
    try {
        if (bufferReader != null)
            bufferReader.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
As you can see, I have specified the UTF-8 encoding in several places, yet I still get question marks. Do you have any idea how I can fix this?
Thanks
Comments (2)
Instead of trying to print out the line directly, print out the Unicode value of each character, for example by looping over the line and printing each char cast to int.
Then look up the relevant characters in the Unicode code charts.
If you find it's printing 63, then those really are question marks... which would suggest that your text file isn't truly UTF-8 to start with.
If, on the other hand, it prints "?" for some characters but reports a value other than 63, that would suggest it's a console display issue and you're reading the data correctly.
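The check described above could be sketched roughly as follows. The class name CodePointDump, the helper codePoints and the sample string are made up for illustration; in the real program you would pass in each line returned by readLine().

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {

    // Returns the decimal Unicode code point of every character in the line.
    static List<Integer> codePoints(String line) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            values.add(cp);
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample: four Arabic letters followed by a literal question mark.
        String line = "\u0633\u0644\u0627\u0645?";
        for (int cp : codePoints(line)) {
            // Decimal value plus the U+XXXX form used in the Unicode code charts.
            System.out.println(cp + " (U+" + String.format("%04X", cp) + ")");
        }
    }
}
```

A value of 63 means the String really contains a question mark; values in the 1536 to 1791 range (U+0600 to U+06FF) are from the Arabic block, which would mean the data was read correctly and only the display is at fault.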
Replace
    System.out.println(new String(line.getBytes(), "UTF-8"));
by
    System.out.println(line);
String#getBytes() without the charset argument uses the platform default encoding to get the bytes from the string, and that encoding may not be UTF-8. You're already decoding the bytes as UTF-8 with InputStreamReader, so you don't need to massage them back and forth afterwards.
Further, ensure that your display console (where you're reading those lines) supports UTF-8. In Eclipse, for example, you can do that via Window > Preferences > General > Workspace > Text File Encoding > Other > UTF-8.
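To illustrate the point, here is a minimal sketch rather than a drop-in fix. The class name Utf8ReadSketch and the helpers readLines and roundTrip are invented for this example, and ISO-8859-1 merely stands in for a non-UTF-8 platform default; the actual default on the asker's machine is unknown.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadSketch {

    // Decodes the file's bytes as UTF-8 exactly once and returns its lines.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // already a correct String: no getBytes() round trip
            }
        }
        return lines;
    }

    // Mimics new String(line.getBytes(), "UTF-8"), with an explicit charset
    // standing in for the platform default (an assumption for this demo).
    static String roundTrip(String line, Charset assumedDefault) {
        return new String(line.getBytes(assumedDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Encode console output as UTF-8; the terminal itself must also display UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");

        // ISO-8859-1 cannot encode Arabic letters, so each one is replaced by '?'.
        String arabic = "\u0633\u0644\u0627\u0645";
        out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1));

        // Hypothetical usage: file paths are passed as command-line arguments.
        for (String path : args) {
            for (String line : readLines(new File(path))) {
                out.println(line);
            }
        }
    }
}
```

With ISO-8859-1 as the stand-in default, roundTrip turns the four Arabic letters into "????", which matches the symptom in the question. The try-with-resources block also closes the reader for every file, whereas the original finally block only closes the last one opened.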