How do I make print() output UTF-8 in Python 3.0?
I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
Output is (changing angle brackets to square brackets for readability):
sys.stdout encoding is "cp1252"
Traceback (most recent call last):
  File "TestPrintEncoding.py", line 22, in [module]
    print(str1)
  File "C:\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]
Note that ü = '\xfc' = 252 gives no problem since it's upper ASCII (it fits in the 8-bit range that cp1252 covers). But ā = '\u0101' is beyond 8 bits.
Does anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.
(Note that the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!)
Comments (5)
The problem of displaying Unicode characters in Python on Windows is known. There is no official solution yet. The right thing to do is to use the WinAPI function WriteConsoleW. It is nontrivial to build a working solution as there are other related issues. However, I have developed a package which tries to fix Python regarding this issue. See https://github.com/Drekin/win-unicode-console. You can also read a deeper explanation of the problem there. The package is also on PyPI (https://pypi.python.org/pypi/win_unicode_console) and can be installed using pip.
The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling it in a correct manner internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.
See this thread for a full explanation:
http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
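To make the distinction in this answer concrete, here is a minimal sketch (the helper name can_encode is made up for illustration) showing that the string itself is fine inside Python and that only the conversion to the console's codec fails. It also shows one possible workaround, assuming the console codec cannot be changed: escaping whatever the codec lacks instead of raising.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# 'lüelā' from the question, written with escapes so this source stays ASCII.
str1 = 'l\u00fcel\u0101'

# Python holds the string correctly; only the conversion to bytes can fail.
utf8_ok = can_encode(str1, 'utf-8')      # UTF-8 covers every character here
cp1252_ok = can_encode(str1, 'cp1252')   # cp1252 has ü (0xFC) but no ā (U+0101)
print(utf8_ok, cp1252_ok)

# Workaround sketch: replace unencodable characters with backslash escapes,
# then decode again so the result is a str that a cp1252 console can print.
fallback = str1.encode('cp1252', errors='backslashreplace').decode('cp1252')
```

The fallback string keeps ü as-is and turns ā into the literal text \u0101, which is lossy but at least does not crash print().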
You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.
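As a rough way to check what this variable does, the sketch below starts a child interpreter with PYTHONIOENCODING set and reads back what the child reports. It assumes a reasonably recent Python 3 (subprocess.run is not available in 3.0), and the variable names are illustrative only.

```python
import codecs
import os
import subprocess
import sys

# Child program: report the stdout encoding, then print the test string.
child_code = (
    "import sys\n"
    "print(sys.stdout.encoding)\n"
    "print('l\\u00fcel\\u0101')\n"
)

# Copy the current environment and set the variable for the child only.
env = dict(os.environ, PYTHONIOENCODING='utf_8')

result = subprocess.run(
    [sys.executable, '-c', child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# The child encoded its output as UTF-8, whatever the parent's code page is.
lines = result.stdout.decode('utf-8').splitlines()
reported_encoding = codecs.lookup(lines[0]).name  # normalise 'utf_8' to 'utf-8'
printed_text = lines[1]
```

Note that this only changes the bytes Python writes. Whether a given console window then renders those bytes correctly is a separate matter that depends on its code page and font.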
Check out the question and answer here, I think they have some valuable clues. Specifically, note the setdefaultencoding in the sys module, but also the fact that you probably shouldn't use it.
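For what it's worth, sys.setdefaultencoding is no longer available in Python 3, so a different route is needed there. One option, sketched below under the assumption that rewrapping the stream is acceptable for the application, is to put a new text layer with a fixed UTF-8 encoding over the underlying binary stream. The helper name utf8_writer is made up for illustration, and the demo writes to an in-memory buffer so the resulting bytes can be inspected.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8', line_buffering=True)


# Demonstrate on an in-memory buffer instead of the real console.
buffer = io.BytesIO()
out = utf8_writer(buffer)
print('l\u00fcel\u0101', file=out)   # 'lüelā' from the question
out.flush()

# UTF-8 bytes: b'l\xc3\xbcel\xc4\x81' followed by the platform line ending.
written = buffer.getvalue()

# To apply the same idea to the real console stream, one would write:
#     sys.stdout = utf8_writer(sys.stdout.buffer)
# On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8')
# changes the encoding in place without replacing the object.
```

This stops the UnicodeEncodeError, but a console whose code page is not UTF-8 may still show the bytes as the wrong characters.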
Here's a dirty hack:
However, everything breaks it:
simply muting the first line already breaks it:
checking for the OS type breaks it:
it doesn't even work under an if block:
But one can print with cmd's echo:
and here's a simple way to make this cross-platform:
but the trailing empty line from Windows' echo can't be suppressed.