How to make Python 3 print() output UTF-8
How can I make Python 3 (3.1) print("Some text") write to stdout in UTF-8, or how can I output raw bytes?
Test.py
import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)
Output (on a CP1257 console; I replaced the non-ASCII characters with their byte values, written in the form [x00]):
utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
print is just too smart... :D There's no point passing encoded text to print (it always shows only the representation of the bytes, not the real bytes), and it seems impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.
For example, print(chr(255)) throws an error:
Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>
By the way, print(TestText == TestText2.decode("utf8")) returns False, although the printed output looks the same.
How does Python 3 determine sys.stdout.encoding, and how can I change it?
I made a printRAW() function which works fine (actually it encodes the output as UTF-8, so it's not really raw...):
def printRAW(*Text):
    RAWOut = open(1, 'w', encoding='utf8', closefd=False)
    print(*Text, file=RAWOut)
    RAWOut.flush()
    RAWOut.close()
printRAW("Cool", TestText)
Output (now it prints in UTF-8):
Cool Test - āĀēĒčČ..šŠūŪžŽ
printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)
Now I'm looking for a better solution, if there is one...
Comments (5)
I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.
But iljau's solution worked: reopen stdout with a different encoding.
You can set the console encoding to UTF-8 with:
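One documented way to do this from outside the script is the PYTHONIOENCODING environment variable, which the interpreter reads at startup to choose the encoding of stdin, stdout and stderr. The sketch below is only an illustration under that assumption (it may not be what this answer originally showed): it starts a child interpreter with the variable set and captures what the child writes. The names `child_code`, `env` and `raw` are made up for the example.

```python
import os
import subprocess
import sys

# Child script: report the stdout encoding, then print a non-ASCII character.
child_code = "import sys; print(sys.stdout.encoding); print(chr(252))"

# Assumption: the encoding is forced from outside, before the interpreter starts.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,  # capture the child's raw output bytes
    check=True,
)
raw = result.stdout  # bytes exactly as the child wrote them
```

In a shell, the same thing would be done by setting PYTHONIOENCODING=utf-8 before running Test.py.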
Clarification:
To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:
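A minimal sketch of that idea, assuming the text stream exposes the underlying binary stream as `.buffer` (as `sys.stdout` normally does). The helper name `write_utf8` is hypothetical and only there for illustration:

```python
import sys

def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to the stream's binary buffer."""
    if stream is None:
        stream = sys.stdout
    stream.flush()  # push out pending text first so output stays in order
    stream.buffer.write(text.encode("utf-8"))
    stream.buffer.flush()
```

Calling `write_utf8("Test - āĀēĒčČ..šŠūŪžŽ\n")` bypasses sys.stdout.encoding entirely; bytes that are already encoded, such as TestText2 from the question, could be passed straight to `sys.stdout.buffer.write()`.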
This is the best I can dope out from the manual, and it's a bit of a dirty hack:
It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.
If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen.
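The hack described here presumably resembles the question's own printRAW(): a second text wrapper opened on file descriptor 1. The sketch below assumes that reading and adds the flush calls the warning asks for; `open_utf8_writer` and `print_utf8` are hypothetical names introduced for illustration.

```python
import sys

def open_utf8_writer(fd=1):
    """Wrap file descriptor fd (1 is stdout) in a UTF-8 text stream.

    closefd=False leaves the descriptor open when the wrapper is closed,
    so sys.stdout keeps working.
    """
    return open(fd, "w", encoding="utf-8", closefd=False)

def print_utf8(*args, writer, **kwargs):
    """print() through writer, flushing both streams to keep output ordered."""
    sys.stdout.flush()  # drain sys.stdout before writing through the wrapper
    print(*args, file=writer, **kwargs)
    writer.flush()      # drain the wrapper before sys.stdout is used again
```

Usage would be along the lines of `utf8stdout = open_utf8_writer()` followed by `print_utf8("Cool", TestText, writer=utf8stdout)`.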
As per this answer, you can manually reconfigure the encoding of stdout as of Python 3.7.
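A short sketch of what that can look like, using the `reconfigure()` method that `io.TextIOWrapper` gained in Python 3.7. The wrapper function `force_utf8` is a hypothetical helper, and the `hasattr` check is an assumption to cover streams that have been replaced by objects without that method:

```python
import sys

def force_utf8(stream):
    """Switch an existing text stream to UTF-8 in place (Python 3.7+).

    Returns True if the stream was reconfigured, False if it has no
    reconfigure() method (for example, a replaced or captured stdout).
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False
```

Calling `force_utf8(sys.stdout)` once at program start would make later print() calls emit UTF-8 without reopening the stream.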