Printing Unicode from the Scala interpreter
When using the scala interpreter (i.e. running the command 'scala' on the command line), I am not able to print Unicode characters correctly. Of course a-z, A-Z, etc. are printed correctly, but, for example, € or ƒ is printed as a ?.
print(8364.toChar)
results in ? instead of €.
Probably I'm doing something wrong. My terminal supports UTF-8 characters, and even when I pipe the output to a separate file and open it in a text editor, ? is displayed.
This is all happening on Mac OS X (Snow Leopard, 10.6.2) with Scala 2.8 (nightly build) and Java 1.6.0_17.
Answers (3)
I found the cause of the problem, and a solution to make it work as it should.
As I already suspected after posting my question, reading Calum's answer, and remembering encoding issues I had on the Mac with another (Java) project, the cause of the problem is the default encoding used by Mac OS X. When you start the scala interpreter, it uses the platform's default encoding. On Mac OS X this is MacRoman; on Windows it is probably CP1252. You can check this in the scala interpreter by reading the file.encoding system property.
According to the scala help text, Java properties can be passed with the -D option. However, this did not work for me, so I ended up setting the property through an environment variable instead. After restarting scala, the same check reports the new encoding, and printing special characters now works as expected.
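A minimal sketch of that check for a Scala 2.10+ REPL or script (string interpolation and StandardCharsets are newer than the 2.8 nightly and Java 6 in the question). It assumes the environment variable is JAVA_OPTS set to -Dfile.encoding=UTF-8, for example export JAVA_OPTS="-Dfile.encoding=UTF-8" before starting scala, which is also what the Windows answer in this thread uses. It prints the encoding the JVM picked up, whether that charset can represent €, and how an unmappable character turns into ?. On Java 18 and later the default charset is UTF-8 on every platform (JEP 400), so there the check reports UTF-8 even without the variable.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Encoding the JVM chose at startup. On the Mac OS X / Java 6 setup in the
// question this was MacRoman; with JAVA_OPTS="-Dfile.encoding=UTF-8"
// (assumed to be the variable meant above) it should read UTF-8.
val fileEncoding: String = System.getProperty("file.encoding")
val defaultCharset: Charset = Charset.defaultCharset()
println(s"file.encoding  = $fileEncoding")
println(s"defaultCharset = ${defaultCharset.name()}")

// The euro sign, U+20AC (decimal 8364).
val euro: Char = 0x20AC.toChar
println(s"default charset can encode the euro sign: ${defaultCharset.newEncoder().canEncode(euro)}")

// When a charset cannot represent a character, String.getBytes substitutes
// the charset's replacement byte, which is '?' (63) for US-ASCII.
val asciiBytes: Array[Byte] = euro.toString.getBytes(StandardCharsets.US_ASCII)
println(asciiBytes.toList) // List(63)

print(euro)
```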
So it is not a bug in Scala but an issue with default encodings. In my opinion, it would be better if UTF-8 were used by default on all platforms. While searching for whether this is being considered, I came across a discussion of this issue on the Scala mailing list. The first message proposes using UTF-8 by default on Mac OS X when file.encoding reports MacRoman, since UTF-8 is the default charset on Mac OS X (which makes me wonder why file.encoding defaults to MacRoman at all; perhaps this is inherited from Mac OS before version 10 was released?). I don't think this proposal will be part of Scala 2.8, since Martin Odersky wrote that it is probably best to keep things as they are in Java (i.e. honor the file.encoding property).
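If changing file.encoding is not an option, a different technique (not the fix described in this answer) is to leave the default alone and write through a stream that always encodes as UTF-8. A minimal sketch, assuming the terminal itself expects UTF-8; utf8Stream is a hypothetical helper name, while PrintStream and Console.withOut are standard Java and Scala APIs:

```scala
import java.io.{OutputStream, PrintStream}

// Hypothetical helper: wrap any byte stream in a PrintStream that always
// encodes text as UTF-8, whatever file.encoding says. autoFlush = true.
def utf8Stream(target: OutputStream): PrintStream =
  new PrintStream(target, true, "UTF-8")

// Bytes written through System.out pass through unchanged, so only the
// wrapper's encoding decides how characters become bytes.
val utf8Out: PrintStream = utf8Stream(System.out)

// Route Scala's println through the UTF-8 stream for this block only.
Console.withOut(utf8Out) {
  println("\u20AC \u0192 \u2603") // euro sign, f with hook, snowman
}
```

This keeps file.encoding untouched, in line with honoring the Java default, but every place that prints must opt in to the wrapper.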
OK, at least part, if not all, of your problem here is that 128 is not the Unicode code point for the euro sign. 128 (or 0x80, since hex seems to be the norm) is U+0080 <control>, i.e. it is not a printable character, so it's not surprising your terminal has trouble printing it. The euro sign's code point is 0x20AC (8364 in decimal), and that appears to work for me (I'm on Linux, on a 2.8 nightly):
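The difference between 128 and 0x20AC can be checked directly with java.lang.Character. A minimal sketch; Character.getName needs Java 7 or later, which is newer than the Java 1.6 in the question:

```scala
// 128 (0x80) is U+0080, a C1 control character with no visible glyph.
val c128: Char = 128.toChar
println(Character.isISOControl(c128)) // true

// The euro sign is U+20AC, i.e. decimal 8364.
val euro: Char = 0x20AC.toChar
println(euro == 8364.toChar)           // true
println(Character.getName(euro.toInt)) // EURO SIGN
print(euro)
```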
Another fun test is to print the Unicode snowman character:
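A minimal sketch of the snowman test, U+2603. That character exists in neither Latin-1 nor MacRoman, so it only appears correctly when both the JVM's output encoding and the terminal use UTF-8. Character.getName again assumes Java 7+:

```scala
// U+2603 SNOWMAN as a single Char (it lies in the Basic Multilingual Plane).
val snowman: Char = 0x2603.toChar
println(Character.getName(snowman.toInt)) // SNOWMAN
println(snowman)

// Equivalent form via the code point API, which also handles code points
// above U+FFFF that do not fit in a single Char.
val snowmanString: String = new String(Character.toChars(0x2603))
println(snowmanString == snowman.toString) // true
```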
(You may have been thinking of 128 as € because € is apparently an extended character at that position in one of the Windows code pages; Windows-1252 places it at 0x80.)
I got the other character you mentioned to work too:
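The ƒ from the question is U+0192 LATIN SMALL LETTER F WITH HOOK. A minimal sketch in the same style, which also prints the two bytes it becomes in UTF-8 (getName assumes Java 7+):

```scala
// U+0192, the "f with hook" (florin sign) from the question.
val fHook: Char = 0x0192.toChar
println(Character.getName(fHook.toInt)) // LATIN SMALL LETTER F WITH HOOK

// Its UTF-8 encoding, as hex bytes.
val utf8Hex: String =
  fHook.toString.getBytes("UTF-8").map(b => f"${b & 0xFF}%02X").mkString(" ")
println(utf8Hex) // C6 92

print(fHook)
```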
On Windows, run the following in the command prompt (cmd):
set JAVA_OPTS="-Dfile.encoding=UTF-8"
chcp 65001
The second command switches the console to code page 65001, which is UTF-8.
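Both settings are needed because they cover the two ends of the pipe. With -Dfile.encoding=UTF-8 the JVM writes € as the three UTF-8 bytes E2 82 AC, and chcp 65001 makes the console decode those bytes as UTF-8. A minimal sketch of what a console still decoding with a legacy code page would show; Windows-1252 is used here only for illustration, since cmd's actual default depends on the system locale:

```scala
import java.nio.charset.{Charset, StandardCharsets}

val euro: String = "\u20AC"

// Bytes the JVM emits for the euro sign once file.encoding is UTF-8.
val utf8Bytes: Array[Byte] = euro.getBytes(StandardCharsets.UTF_8)
println(utf8Bytes.map(b => f"${b & 0xFF}%02X").mkString(" ")) // E2 82 AC

// The same bytes decoded as Windows-1252 instead of UTF-8: three wrong
// characters (a-circumflex, low-9 quote, not sign) instead of one euro sign.
val asCp1252: String = new String(utf8Bytes, Charset.forName("windows-1252"))
println(asCp1252 == "\u00E2\u201A\u00AC") // true
```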
If you don't want to type "chcp 65001" every time, you can change or add a value in the Windows Registry:
regedit
(see https://superuser.com/a/482117/454417)
I use Windows 10 and Scala 2.11.8.