How to set sys.stdout encoding in Python 3?

Posted 2024-10-06 09:07:08

Setting the default output encoding in Python 2 is a well-known idiom:

import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3: sys.stdout.write() expects a str, but encoding produces bytes, so an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.

What is the correct way to do this in Python 3?
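The failure described above can be reproduced without touching the real terminal. This sketch uses an io.TextIOWrapper over an io.BytesIO as a stand-in for sys.stdout (an assumption made only so the example is self-contained): wrapping the text stream raises TypeError, while wrapping its binary .buffer works.

```python
import codecs
import io

# Stand-in for Python 3's sys.stdout: a text stream layered over a binary buffer.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# The Python 2 idiom: the codec writer encodes str to bytes, then hands those
# bytes to a text stream whose write() only accepts str.
py2_style = codecs.getwriter("utf-8")(fake_stdout)
try:
    py2_style.write("日本語")
    failure = None
except TypeError as exc:
    failure = exc  # the text stream rejects bytes

# Wrapping the underlying binary buffer instead avoids the type mismatch.
works = codecs.getwriter("utf-8")(fake_stdout.buffer)
works.write("日本語")
encoded = fake_stdout.buffer.getvalue()
```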

Answers (7)

陌伤浅笑 2024-10-13 09:07:08

Since Python 3.7 you can change the encoding of standard streams with reconfigure():

import sys
sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
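A minimal sketch of both points, assuming Python 3.7+. force_utf8_stdout is a hypothetical helper name; the isinstance check is there because reconfigure() is a method of io.TextIOWrapper, and some environments (IDEs, test runners) replace sys.stdout with other objects. The second half applies the same call to in-memory streams so the effect of encoding and errors is visible as bytes.

```python
import io
import sys


def force_utf8_stdout(errors: str = "strict") -> None:
    """Switch the existing sys.stdout to UTF-8 in place (Python 3.7+)."""
    # reconfigure() exists on io.TextIOWrapper; sys.stdout may have been
    # replaced by an object without it, so check before calling.
    if isinstance(sys.stdout, io.TextIOWrapper):
        sys.stdout.reconfigure(encoding="utf-8", errors=errors)


# The same call on an in-memory stream that starts out as ASCII.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")
stream.reconfigure(encoding="utf-8")
stream.write("日本語")
stream.flush()
utf8_bytes = raw.getvalue()

# errors= decides what happens to characters the encoding cannot represent;
# "replace" substitutes "?" instead of raising UnicodeEncodeError.
raw_ascii = io.BytesIO()
lossy = io.TextIOWrapper(raw_ascii, encoding="utf-8")
lossy.reconfigure(encoding="ascii", errors="replace")
lossy.write("Ûnicöde")
lossy.flush()
ascii_bytes = raw_ascii.getvalue()
```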

迷迭香的记忆 2024-10-13 09:07:08

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach(), streams can be made binary by default. This function sets stdin and stdout to binary:

import sys

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
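To see what detach() does without giving up the real stdout, this sketch applies the same idiom to an in-memory stand-in (an assumption for the example). detach() hands back the binary buffer, the codec writer encodes into it, and the original text wrapper becomes unusable afterwards.

```python
import codecs
import io

raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

# detach() returns the underlying binary buffer and leaves text_stream unusable.
binary = text_stream.detach()
utf8_writer = codecs.getwriter("utf-8")(binary)
utf8_writer.write("日本語\n")

# Any further use of the detached text stream raises ValueError.
try:
    text_stream.write("x")
    detached_error = None
except ValueError as exc:
    detached_error = exc
```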

萌吟 2024-10-13 09:07:08

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use case this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not having to edit the Python code at all.
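The same effect can be checked from Python by launching a child interpreter with PYTHONIOENCODING set only in its environment, mirroring the shell one-liner. This is a sketch; the child program is made up for the demonstration and only reports what it sees.

```python
import os
import subprocess
import sys

# Child program: report stdout's encoding and error handler, then print
# non-ASCII text. The \u escapes keep the command line itself ASCII-only.
CHILD = "import sys; print(sys.stdout.encoding, sys.stdout.errors); print('\\u65e5\\u672c\\u8a9e')"

# Set PYTHONIOENCODING only for the child, as the shell prefix does.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    env=env,
    capture_output=True,
    check=True,
)
```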

笑饮青盏花 2024-10-13 09:07:08

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".
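The trick relies on open() accepting an integer file descriptor in place of a path. This sketch shows that on a pipe instead of fd 1 so the written bytes can be read back; it also passes closefd=False, which is an addition not present in the answer's code.

```python
import os

read_fd, write_fd = os.pipe()

# open() wraps an existing descriptor; closefd=False leaves it open when the
# wrapper is closed, so any other owner of the descriptor can keep using it.
with open(write_fd, mode="w", encoding="utf-8", buffering=1, closefd=False) as out:
    out.write("日本語\n")

os.close(write_fd)  # close the write end ourselves so the reader sees EOF
with open(read_fd, mode="rb") as inp:
    data = inp.read()
```

The answer's one-liner keeps the default closefd=True, so the new file object will close file descriptor 1 when it is closed or garbage-collected; closefd=False avoids that if other code still writes to that descriptor.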

您的好友蓝忘机已上羡 2024-10-13 09:07:08

Setting the default output encoding in Python 2 is a well-known idiom

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
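A minimal sketch of that split, assuming a plain CGI script: the application encodes the page itself to match the charset it declares, and only bytes reach sys.stdout.buffer. render_page and send are hypothetical helper names, and the optional out parameter exists only so the example can run against an in-memory buffer.

```python
import io
import sys
from typing import BinaryIO, Optional


def render_page(body: str, charset: str = "utf-8") -> bytes:
    """Build a complete CGI response as bytes (headers plus encoded body)."""
    # Header lines are ASCII; the body is encoded to match the declared charset.
    head = f"Content-Type: text/html; charset={charset}\r\n\r\n".encode("ascii")
    return head + body.encode(charset)


def send(response: bytes, out: Optional[BinaryIO] = None) -> None:
    """Write raw bytes to stdout's binary buffer, bypassing the text layer."""
    if out is None:
        out = sys.stdout.buffer
    out.write(response)
    out.flush()


# Demonstration against an in-memory buffer; a real CGI script would call send()
# without `out`, and could pass image bytes just as easily as encoded HTML.
captured = io.BytesIO()
send(render_page("<p>日本語</p>"), captured)
```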

(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

甜心 2024-10-13 09:07:08

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

import io
import sys

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And then, of course, write to default_out instead of sys.stdout.)
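A sketch of that approach using an in-memory stand-in for sys.stdout (an assumption so the result can be inspected as bytes). The original text stream stays attached and usable, unlike with detach(), and both wrappers end up writing into the same binary buffer.

```python
import io

raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

# A second text layer over the same binary buffer; fake_stdout is not detached.
default_out = io.TextIOWrapper(fake_stdout.buffer, encoding="utf-8", write_through=True)

default_out.write("日本語, ")
fake_stdout.write("still attached")
fake_stdout.flush()
```

Because the two wrappers keep separate internal buffers, text written through one can appear out of order relative to the other unless it is flushed; write_through=True is used here for that reason.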

沉溺在你眼里的海 2024-10-13 09:07:08

sys.stdout is in text mode in Python 3, so you can write str (Unicode text) to it directly and the Python 2 idiom is no longer needed.

Whereas this fails in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("Ûnicöde")
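Ûnicöde7

(The trailing 7 is not part of the written text: it is the interactive interpreter echoing the return value of write(), the number of characters written.)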

Now, if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in your Python build.
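To check what Python has actually decided about stdout, a small diagnostic like the following can help. This is a sketch: stdout_encoding_report is a hypothetical helper, and getattr is used because sys.stdout may have been replaced by an object without these attributes.

```python
import locale
import sys


def stdout_encoding_report() -> dict:
    """Collect the settings that decide how str is encoded on stdout."""
    return {
        # The encoding and error handler used by sys.stdout's text layer.
        "stdout.encoding": getattr(sys.stdout, "encoding", None),
        "stdout.errors": getattr(sys.stdout, "errors", None),
        # The preferred encoding derived from the locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Whether UTF-8 mode (-X utf8 or PYTHONUTF8=1) is active.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in stdout_encoding_report().items():
        print(f"{key}: {value}")
```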
