UTF-8 CJK characters not displayed in Java

Posted 2024-11-06 07:11:55

I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question:

I have a file which contains some CJK characters, and which has been saved as UTF-8. I have various Asian language packs installed and the characters are rendered properly by other applications, so I know that much works.

In my Java app, I read the file as follows:

// Create objects
fis = new FileInputStream(new File("xyz.sgf"));
InputStreamReader is = new InputStreamReader(fis, Charset.forName("UTF-8"));
BufferedReader br = new BufferedReader(is);

// Read and display file contents
StringBuffer sb = new StringBuffer();
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
System.out.println(sb);

The output shows the CJK characters as '???'. A call to is.getEncoding() confirms that it is definitely using UTF-8. What step am I missing to make the characters appear properly? If it makes a difference, I'm looking at the output using the Eclipse console.


Comments (4)

乜一 2024-11-13 07:11:55
System.out.println(sb);

The problem is the above line. This will encode character data using the default system encoding and emit the data to STDOUT. On many systems, this is a lossy process.

If you change the defaults, the encoding used by System.out and the encoding used by the console must match.
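To see the lossy step in isolation, here is a minimal sketch. The class name LossyEncodingDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 merely stands in for whatever single-byte default your system might use. It encodes a string to bytes and decodes it again, which is roughly what happens between System.out and a console that share one charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {

    // Encode text with the given charset and decode it again, mimicking
    // a PrintStream and a console that both use that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become '?'
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Hangul sample written as escapes so the source file encoding
        // cannot interfere with the experiment.
        String sample = "\uAC00\uB2E4";

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 round trip intact: "
                + roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
        System.out.println("ISO-8859-1 round trip: "
                + roundTrip(sample, StandardCharsets.ISO_8859_1));
    }
}
```

The last line prints two question marks: the characters were read correctly, and the damage happens only when they are encoded for output.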

The only supported mechanism to change the default system encoding is via the operating system. (Some will advise using the file.encoding system property, but this is not supported and may have unintended side-effects.) Alternatively, you can call System.setOut with your own custom PrintStream that uses an explicit encoding:

PrintStream stdout = new PrintStream(System.out, autoFlush, encoding);

You can change the Eclipse console encoding via the Run configuration.
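Putting the two pieces together, a rough sketch might look like the following. It assumes the console (for example the Eclipse console, once its encoding is changed in the Run configuration) is set to UTF-8; the class name Utf8Stdout and the helper utf8 are hypothetical names chosen for this example.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Stdout {

    // Wrap an existing stream so that characters written to the wrapper
    // are encoded as UTF-8 instead of the platform default.
    static PrintStream utf8(PrintStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.setOut(utf8(System.out));
        // Hiragana sample written as escapes so the source file encoding
        // does not matter. This is readable only if the console decodes
        // the bytes as UTF-8.
        System.out.println("\u3053\u3093\u306B\u3061\u306F");
    }
}
```

If the console is still using a different encoding, this produces mojibake rather than question marks, which is why both sides have to agree.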

You can find a number of posts about the subject on my blog - via my profile.

我纯我任性 2024-11-13 07:11:55

The following program prints CJK characters to the console using TextPad. To see the Korean Hangul and Japanese Hiragana I had to tell Java to change the print stream's encoding to EUC_KR and set the properties of TextPad's tool output window:

  • font is Arial Unicode MS
  • script is Hangul

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Hangul {

    public static void main(String[] args) throws UnsupportedEncodingException {

        // Make System.out encode its output as EUC_KR (Korean);
        // the tool output window must be set to the matching script
        PrintStream out = new PrintStream(System.out, true, "EUC_KR");
        System.setOut(out);

        // Print a Hangul and Hiragana sample to the console
        String goHello = "가다 こんにちは";
        System.out.println(goHello);
    }
}

Tool Output is:

가다 こんにちは

春风十里 2024-11-13 07:11:55

Yeah, you need to change the encoding of the Eclipse console, as explained in this how-to-display-chinese-character-in-eclipse-console article.

说好的呢 2024-11-13 07:11:55

Depending on your platform, it is highly likely that your console (or Windows CMD) does not support or use the UTF-8 character set, so all unmappable characters end up converted to question marks.

On Windows, for example, CMD almost always uses a legacy code page rather than UTF-8, typically a single-byte one such as CP437, CP850, or Windows-1252 on Western-locale systems.
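If you want to check up front whether a given character set can represent your text at all, something along these lines may help. It is only a sketch: ConsoleCharsetCheck and canDisplay are illustrative names, and ISO-8859-1 is used as a stand-in for a single-byte console code page because it is guaranteed to exist on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleCharsetCheck {

    // True if every character of text can be represented in the charset.
    static boolean canDisplay(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Two Chinese characters written as escapes so the source file
        // encoding does not matter.
        String sample = "\u4E2D\u6587";

        Charset[] candidates = {
            Charset.defaultCharset(),     // what System.out uses by default
            StandardCharsets.ISO_8859_1,  // stand-in for a single-byte code page
            StandardCharsets.UTF_8
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " can encode sample: "
                    + canDisplay(sample, cs));
        }
    }
}
```

A result of false for the default charset would explain the question marks: the text cannot survive the trip to the console regardless of how correctly the file was read.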
