How to make Python 3 print() output UTF-8

Posted on 2024-09-16 04:33:01

How can I make Python 3 (3.1) print("Some text") write to stdout in UTF-8, or how can I output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [xNN]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point passing encoded text to print (it only ever shows the representation of the bytes, not the real bytes), and it seems impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.

For example: print(chr(255)) throws an error:

Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way, print(TestText == TestText2.decode("utf8")) returns False, although the printed output looks the same.


How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

 def printRAW(*Text):
     RAWOut = open(1, 'w', encoding='utf8', closefd=False)
     print(*Text, file=RAWOut)
     RAWOut.flush()
     RAWOut.close()

 printRAW("Cool", TestText)

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]), without errors :)

Now I'm looking for a better solution, if there is one...

Comments (5)

阳光下的泡沫是彩色的 2024-09-23 04:33:02

I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.

But iljau's solution worked: Reopen stdout with a different encoding.

import sys
sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)
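A minimal sketch of what closefd=False does in that call, using a pipe in place of file descriptor 1 so the bytes can be read back and inspected. The helper name open_utf8_writer and the pipe setup are illustrative assumptions, not part of the answer above; newline="\n" is added only so the bytes are the same on every platform.

```python
import os

def open_utf8_writer(fd):
    """Wrap an existing file descriptor in a UTF-8 text writer.

    closefd=False means closing the wrapper leaves the descriptor open,
    which matters when the descriptor is the process's real stdout.
    """
    return open(fd, "w", encoding="utf-8", closefd=False, newline="\n")

# A pipe stands in for fd 1 here (assumption made for the demonstration).
read_fd, write_fd = os.pipe()
writer = open_utf8_writer(write_fd)
print("ü", file=writer)
writer.close()                    # flushes the text, keeps write_fd open
os.close(write_fd)                # succeeds only because write_fd is still open
received = os.read(read_fd, 100)  # the raw bytes that went through the pipe
os.close(read_fd)
```

With the real stdout, the same call with fd 1 gives a second text layer over the same descriptor, so closing that wrapper does not close the process's stdout.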
神妖 2024-09-23 04:33:02

You can make stdout encode its output as UTF-8 with:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
百变从容 2024-09-23 04:33:01

Clarification:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8... it is a Unicode string in Python 3.x.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

import sys
sys.stdout.buffer.write(TestText2)
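As a rough sketch of how that could be wrapped up, the helper below encodes a string itself and writes the bytes to a stream's buffer, flushing the text layer first so earlier print() output is not overtaken. The name write_utf8_line and the in-memory fake console are assumptions made so the result can be checked without a real terminal.

```python
import io
import sys

def write_utf8_line(text, stream=None):
    """Encode text as UTF-8 and write it, plus a newline, to stream.buffer."""
    stream = sys.stdout if stream is None else stream
    stream.flush()                        # push out pending text output first
    data = (text + "\n").encode("utf-8")
    stream.buffer.write(data)             # bytes bypass the stream's own encoding
    stream.buffer.flush()
    return data

# In-memory stand-in for a console whose encoding is cp1257 (assumption).
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="cp1257")
write_utf8_line("Test - āĀēĒčČ..šŠūŪžŽ", fake_console)
```

Calling write_utf8_line("some text") with no stream writes to the real sys.stdout, provided it exposes a buffer attribute; some replacement streams (for example in IDEs) may not.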
ヅ她的身影、若隐若现 2024-09-23 04:33:01

This is the best I can dope out from the manual, and it's a bit of a dirty hack:

utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print(whatever, file=utf8stdout)

It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.

If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen.
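The flushing caveat can be illustrated with two text wrappers over one shared byte stream. This is only a sketch: an io.BytesIO stands in for file descriptor 1, and the names default_out and utf8_out are made up for the example. Each wrapper may hold its text in its own buffer, so flushing after every write is what keeps the bytes in the order they were written.

```python
import io

raw = io.BytesIO()  # stands in for the shared stdout descriptor (assumption)
default_out = io.TextIOWrapper(raw, encoding="cp1257", newline="\n")
utf8_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

def write_in_order(stream, text):
    """Write text and flush at once, so no other wrapper can get ahead of it."""
    print(text, file=stream)
    stream.flush()

write_in_order(default_out, "first")
write_in_order(utf8_out, "second - ü")
ordered = raw.getvalue()
```

Without the flush() calls, whichever wrapper happens to flush first decides the order, which is the "bad things" case described above.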

倒带 2024-09-23 04:33:01

As per this answer

You can manually reconfigure the encoding of stdout as of Python 3.7:

import sys
sys.stdout.reconfigure(encoding='utf-8')
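A hedged sketch that combines this with a fallback for interpreters older than 3.7: it uses reconfigure() when the stream has it and otherwise re-wraps the underlying byte buffer. The function name force_utf8 is hypothetical, and the fallback assumes the stream is an io.TextIOWrapper, which is not guaranteed for every replacement of sys.stdout.

```python
import io

def force_utf8(stream):
    """Return a text stream that encodes as UTF-8, reusing `stream` if possible."""
    if hasattr(stream, "reconfigure"):       # io.TextIOWrapper on Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    # Fallback (assumes a TextIOWrapper): take over its byte buffer.
    stream.flush()
    return io.TextIOWrapper(stream.detach(), encoding="utf-8", line_buffering=True)

# In-memory stand-in for a cp1257 console, so the bytes can be inspected.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1257", newline="\n")
console = force_utf8(console)
print(chr(255), file=console)   # chr(255) is not encodable in cp1257
console.flush()
```

For the real stream the call would be sys.stdout = force_utf8(sys.stdout); the assignment matters on the fallback path, because detach() leaves the old wrapper unusable.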