Java BufferedReader: problem reading an Arabic text file

Posted 2024-10-09 00:46:39

Problem: Arabic words in the text files I read with Java are displayed as a series of question marks: ??????

Here is the code:

        File[] fileList = mainFolder.listFiles();
        BufferedReader bufferReader = null;
        Reader reader = null;

        try {
            for (File f : fileList) {
                reader = new InputStreamReader(new FileInputStream(f.getPath()), "UTF8");
                bufferReader = new BufferedReader(reader);
                String line = null;

                while ((line = bufferReader.readLine()) != null) {
                    System.out.println(new String(line.getBytes(), "UTF-8"));
                }
            }
        } catch (Exception exc) {
            exc.printStackTrace();
        } finally {
            // Close the BufferedReader
            try {
                if (bufferReader != null)
                    bufferReader.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

As you can see, I have specified the UTF-8 encoding in several places, yet I still get question marks. Do you have any idea how I can fix this?

Thanks

被你宠の有点坏 2024-10-16 00:46:39

Instead of trying to print the line directly, print the Unicode value of each character. For example:

char[] chars = line.toCharArray();
for (int i = 0; i < chars.length; i++)
{
    System.out.println(i + ": " + chars[i] + " - " + (int) chars[i]);
}

Then look up the relevant characters in the Unicode code charts.

If you find it's printing 63, then those really are question marks, which would suggest that your text file isn't truly UTF-8 to start with.

If, on the other hand, it prints "?" for some characters together with a value other than 63, that would suggest it's a console display issue and you're reading the data correctly.
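A variation on the loop above, as a rough sketch: it prints each code point in `U+XXXX` form together with its Unicode name, so the output stays readable even on a console that cannot display Arabic. The class name `CodePointDump` and the helper `describe` are made up for this illustration.

```java
public class CodePointDump {
    // Builds one line per code point: index, U+XXXX value, and Unicode name.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // Character.getName returns null for unassigned code points.
            out.append(String.format("%d: U+%04X %s\n", index++, cp, Character.getName(cp)));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two Arabic letters followed by a real question mark (U+003F, decimal 63).
        String line = "\u0645\u0631?";
        System.out.print(describe(line));
    }
}
```

Only ASCII is written to the console here, so a value of `U+003F` means the string really contains a question mark, while values in the `U+0600` range mean the Arabic text was read correctly.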


只涨不跌 2024-10-16 00:46:39

Replace

System.out.println(new String(line.getBytes(), "UTF-8"));

with

System.out.println(line);
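In context, the reading code could look something like the sketch below. It is not the asker's exact program: the class name `Utf8FolderReader` and the helper `readLines` are invented here, and try-with-resources is used so that every reader is closed (the code in the question only closes the last one).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8FolderReader {
    // Reads all lines of one file, decoding its bytes as UTF-8.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // already decoded; no getBytes() round trip
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory (an assumption for this sketch).
        File mainFolder = new File(args.length > 0 ? args[0] : ".");
        File[] fileList = mainFolder.listFiles();
        if (fileList == null) {
            return; // not a directory, or it could not be listed
        }
        for (File f : fileList) {
            if (!f.isFile()) {
                continue; // skip subdirectories
            }
            for (String line : readLines(f)) {
                System.out.println(line);
            }
        }
    }
}
```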

String#getBytes() without a charset argument uses the platform default encoding to convert the string to bytes, and that encoding is not necessarily UTF-8. You are already decoding the bytes as UTF-8 through InputStreamReader, so there is no need to convert back and forth afterwards.
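To see why that round trip is harmful, here is a small sketch that passes the "platform default" in explicitly, so the effect can be reproduced on any machine. `RoundTripDemo` and `roundTrip` are hypothetical names, and US-ASCII merely stands in for a default encoding that cannot represent Arabic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Mimics new String(line.getBytes(), "UTF-8"), with the default charset made explicit.
    static String roundTrip(String text, Charset platformDefault) {
        byte[] bytes = text.getBytes(platformDefault); // unmappable characters become '?'
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters

        // With a UTF-8 default the round trip is harmless but pointless.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // true

        // With a default that cannot encode Arabic, every letter is lost.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII)); // ?????

        // The default actually in effect on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

In the second case the string really does contain question marks (code 63), so no console setting can bring the original text back.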

Further, ensure that your display console (where you're reading those lines) supports UTF-8. In Eclipse, for example, you can do that via Window > Preferences > General > Workspace > Text File Encoding > Other > UTF-8.
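On the Java side, System.out itself encodes text with a default encoding that may not be UTF-8. One possible workaround, sketched here under the assumption that the console interprets incoming bytes as UTF-8 and has a font with Arabic glyphs, is to wrap the stream in a PrintStream with an explicit encoding. `Utf8Console` and `utf8Stream` are illustrative names only.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps any output stream so that text written to it is encoded as UTF-8.
    static PrintStream utf8Stream(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8"); // true = autoflush on println
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8Stream(System.out);
        // Five Arabic letters; whether they render depends on the console, not on Java.
        out.println("\u0645\u0631\u062D\u0628\u0627");
    }
}
```

If the console is not set to UTF-8, this output will still look wrong, which is why the console setting above matters as well.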