Why does System.out.println() print French characters differently on a French OS?

Posted 2024-12-01 00:37:25


Hi, this is a simple question, but I don't know the answer myself...
The output of the following code running on a French OS is:

public class FrenchTest {
    public static void main(String[] args) {
        String[] lines = {"Le résultat est", "Nom de l'hôte"};

        for (String line : lines) {
            System.out.println("NOW : " + line);
        }
    }
}
//////////////
c:\share>java FrenchTest
NOW : Le rÃ©sultat est
NOW : Nom de l'hÃ´te

c:\share>CHCP 65001

c:\share>java FrenchTest
NOW : Le rÃ©sultat est
NOW : Nom de l'hÃ´te

How come? Where does the encoding go wrong in this case? It works fine on an English-language OS. Thanks!


尤怨 2024-12-08 00:37:25


If you change the code page (chcp 65001) and then tell Java to output in UTF-8 (-Dfile.encoding=UTF-8), it should work. Note that you will need to choose a Unicode (TrueType) font in the console; I have Consolas and Lucida Console installed on my machine.

Note that, as shown below, I get the last character repeated on my machine using Java 1.6.0_23. I can't really explain this :(

msandiford@foo /cygdrive/c/foo
$ javac FrenchTest.java

msandiford@foo /cygdrive/c/foo
$ java -Dfile.encoding=UTF-8 FrenchTest
NOW : Le résultat estt
NOW : Nom de l'hôtee

msandiford@foo /cygdrive/c/foo
$ java -version
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)
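Instead of relying on -Dfile.encoding=UTF-8 (whose effect on System.out has varied between Java releases), a program can wrap standard output in a PrintStream with an explicit charset. This is a minimal sketch, assuming Java 10+ (for the PrintStream(OutputStream, boolean, Charset) constructor) and a console already switched with chcp 65001; the class and method names are hypothetical.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Out {
    // Hypothetical helper: wraps any byte stream in an auto-flushing
    // PrintStream that always encodes text as UTF-8, whatever the
    // platform default charset happens to be.
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // FileDescriptor.out is the process's raw standard output handle.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        // Unicode escapes keep the literals independent of the source file encoding.
        out.println("NOW : Le r\u00E9sultat est");
        out.println("NOW : Nom de l'h\u00F4te");
    }
}
```

This only fixes the Java side: the console still has to decode UTF-8 (code page 65001) and use a font that contains the glyphs.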


谷夏 2024-12-08 00:37:25


There are two potential problems here:

1. Compile-time transcoding problem: the encoding your compiler uses to read your source file must match the one your editor used to save it.
2. Runtime transcoding problem: the encoding the console uses to read the data must match the one System.out encodes it in.
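Problem 1 is easy to reproduce in isolation. The Ã© and Ã´ in the question's output are the typical signature of UTF-8 bytes being decoded as windows-1252; one way that happens is a UTF-8 source file compiled with a windows-1252 default. The sketch below only simulates that mismatch in memory (class and method names are hypothetical); it does not invoke javac.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CompileMismatch {
    // Simulates javac reading a UTF-8 source file with the windows-1252
    // charset: encode the text as UTF-8 bytes, then decode them as windows-1252.
    static String readUtf8AsCp1252(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Escapes are used so that this file itself compiles the same way in any
        // ASCII-compatible encoding.
        System.out.println(readUtf8AsCp1252("Le r\u00E9sultat est")); // returns "Le rÃ©sultat est"
        System.out.println(readUtf8AsCp1252("Nom de l'h\u00F4te"));    // returns "Nom de l'hÃ´te"
    }
}
```

On the compiler side, the usual fix is to state the source encoding explicitly, for example javac -encoding UTF-8 FrenchTest.java.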

You can sidestep compilation issues by using Unicode escapes:

"Le r\u00E9sultat est"
"Nom de l'h\u00F4te"
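Applied to the question's program, the escaped version could look like this. The loop that lists each line's non-ASCII code points is a diagnostic addition, not part of the original answer: if it reports U+00E9 and U+00F4, the compile step is fine and any remaining garbling happens at run time.

```java
public class FrenchTest {
    // \u00E9 is e-acute and \u00F4 is o-circumflex; javac turns these escapes
    // into the same characters whatever encoding it reads the source file with.
    static final String[] LINES = {"Le r\u00E9sultat est", "Nom de l'h\u00F4te"};

    public static void main(String[] args) {
        for (String line : LINES) {
            System.out.println("NOW : " + line);
            // Diagnostic: print the non-ASCII code points actually stored in the string.
            line.codePoints()
                .filter(cp -> cp > 0x7F)
                .forEach(cp -> System.out.printf("  U+%04X%n", cp));
        }
    }
}
```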

By default, problem 2 always occurs on Windows. For compatibility with old DOS programs, cmd.exe uses the OEM system encoding by default (typically code page 850 on French Windows). This is not the default "ANSI" encoding (windows-1252 on the same system) used by the parts of Windows still stuck in pre-Unicode encodings.
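Problem 2 can be simulated the same way. This sketch assumes a French Windows setup where the program encodes its output as windows-1252 while cmd.exe decodes the bytes with OEM code page 850 (Java's charset name for it is IBM850); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class ConsoleMismatch {
    // Simulates the runtime mismatch: the program encodes text with the ANSI
    // code page (windows-1252) and the console decodes the bytes as OEM 850.
    static String cp1252SeenAsCp850(String text) {
        byte[] ansiBytes = text.getBytes(Charset.forName("windows-1252"));
        return new String(ansiBytes, Charset.forName("IBM850"));
    }

    public static void main(String[] args) {
        // e-acute is byte 0xE9 in windows-1252, and 0xE9 is U-acute in code page 850.
        System.out.println(cp1252SeenAsCp850("Le r\u00E9sultat est")); // returns "Le rÚsultat est"
    }
}
```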

You can fix this either by switching the console encoding to windows-1252:

>chcp 1252

...or by changing the encoding used to emit data so that it matches the console encoding. The easiest way to do this is to use java.io.Console: unlike System.out, System.console() detects and uses the console encoding. Using Console can cause issues when running code in IDEs (System.console() can return null there), but there are workarounds.
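A minimal sketch of the Console approach, falling back to System.out when System.console() returns null (as it can inside an IDE). The helper name is hypothetical, and the fallback simply uses the platform default encoding, so it does not fix problem 2 by itself.

```java
import java.io.Console;
import java.io.PrintWriter;

public class ConsoleOut {
    // Returns a writer that encodes in the console's own charset when a
    // console is attached; otherwise wraps System.out (platform default).
    static PrintWriter writer() {
        Console console = System.console();
        return console != null ? console.writer() : new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = writer();
        out.println("NOW : Le r\u00E9sultat est");
        out.println("NOW : Nom de l'h\u00F4te");
        out.flush();
    }
}
```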

I have been unable to get UTF-8 to work with code page 65001.

In short, you need to overcome decisions made to preserve backwards compatibility.
