How to set sys.stdout encoding in Python 3?
Setting the default output encoding in Python 2 is a well-known idiom:
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.
However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
What is the correct way to do this in Python 3?
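To make the failure concrete, here is a minimal sketch of the Python 2 idiom applied to a text stream under Python 3. It uses an in-memory io.TextIOWrapper as a stand-in for sys.stdout (that stand-in and the variable names are assumptions for illustration, not part of the question), so it can be run without touching the real standard output.

```python
import codecs
import io

# Stand-in for sys.stdout: a text-mode wrapper over an in-memory byte buffer.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# The Python 2 idiom: wrap the text stream in a UTF-8 codec writer.
wrapped = codecs.getwriter("utf-8")(text_stream)

try:
    # The writer encodes the str to bytes, then hands those bytes to a
    # stream whose write() only accepts str.
    wrapped.write("caf\u00e9")
    failure = None
except TypeError as exc:
    failure = exc

print(type(failure).__name__)
```

The codec writer does its job (str to bytes); the error comes from the text layer underneath, which is why the answers below either reconfigure that layer or go around it.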
Comments (7)
Since Python 3.7 you can change the encoding of the standard streams with reconfigure(). You can also modify how encoding errors are handled by adding an errors parameter.
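A minimal sketch of reconfigure(), assuming Python 3.7 or later. The helper encode_via_stream and the in-memory stream are hypothetical stand-ins so the resulting bytes can be inspected; on the real stream the equivalent call would be sys.stdout.reconfigure(encoding="utf-8"), which only works if sys.stdout is still an io.TextIOWrapper.

```python
import io


def encode_via_stream(text, **reconfigure_options):
    """Write text through an ASCII text stream after reconfiguring it."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout
    stream.reconfigure(**reconfigure_options)         # Python 3.7+
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Switch the encoding so the euro sign can be written at all.
as_utf8 = encode_via_stream("\u20ac", encoding="utf-8")

# Keep ASCII but change the error handler instead of raising.
escaped = encode_via_stream("\u20ac", errors="backslashreplace")

print(as_utf8, escaped)
```

The second call shows the errors parameter on its own: the stream stays ASCII, and unencodable characters are written as backslash escapes rather than raising UnicodeEncodeError.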
Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout. Therefore, the corresponding idiom for Python 3.1 and later is to hand the detached binary stream, sys.stdout.detach(), to the codec writer instead of sys.stdout itself.
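The detach() idiom might look roughly like the sketch below. It again uses an in-memory text stream named fake_stdout as an assumed stand-in; with the real stream the assignment would read sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()).

```python
import codecs
import io

raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

# detach() separates the text layer from its binary buffer and returns
# the buffer; the old text wrapper is unusable afterwards.
binary_stream = fake_stdout.detach()

# The codec writer now sits directly on a bytes-accepting stream.
utf8_out = codecs.getwriter("utf-8")(binary_stream)
utf8_out.write("\u20ac")

result = raw.getvalue()
print(result)
```

Because the writer produces bytes and the detached buffer accepts bytes, the type mismatch from the question no longer occurs.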
I found this thread while searching for solutions to the same error. An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use this is less trouble than swapping sys.stdout after Python is initialized, with the advantage of not having to go and edit the Python code.
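As an illustration of the environment-variable approach, the sketch below launches a child interpreter with PYTHONIOENCODING set and captures what it writes. Driving this from Python via subprocess is an assumption made for the sake of a runnable example (it needs Python 3.7+ for capture_output); in practice the variable would usually be set in the shell or service configuration that starts the script.

```python
import os
import subprocess
import sys

# Copy the current environment and force the child's stdio encoding.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child reports its stdout encoding, then prints a euro sign.
child_code = "import sys; print(sys.stdout.encoding); print('\\u20ac')"

child = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    check=True,
)

child_output = child.stdout
print(child_output)
```

The child script itself contains no encoding logic, which is the advantage the answer describes.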
Other answers seem to recommend using codecs, but open works for me. This works even when I run it with PYTHONIOENCODING="ascii".
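One way the open-based approach could look is sketched here. To keep the example checkable it opens the write end of an os.pipe() by file descriptor; that pipe is an assumption standing in for descriptor 1. Applied to the real standard output, the same call would be along the lines of sys.stdout = open(1, "w", encoding="utf-8", closefd=False).

```python
import os

# A pipe stands in for file descriptor 1 so the bytes can be read back.
read_fd, write_fd = os.pipe()

# open() accepts an integer file descriptor; closefd=False leaves the
# descriptor open when the text wrapper is closed.
with open(write_fd, "w", encoding="utf-8", closefd=False) as out:
    print("\u65e5\u672c\u8a9e", file=out)

os.close(write_fd)
data = os.read(read_fd, 100)
os.close(read_fd)

print(data)
```

The encoding passed to open() is explicit, so it does not depend on what PYTHONIOENCODING or the locale selected for the original stream.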
Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.
It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.
CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
(To add to the confusion, wsgiref's CGIHandler has been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)
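A sketch of the bytes-only approach for a CGI-style response follows. The function write_cgi_response and its parameters are hypothetical names introduced for illustration; the example writes to an io.BytesIO so the output can be inspected, and in a real script the first argument would be sys.stdout.buffer.

```python
import io


def write_cgi_response(binary_out, body, charset="utf-8"):
    """Encode the body to match the declared charset and send only bytes."""
    payload = body.encode(charset)
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    binary_out.write(header.encode("ascii"))
    binary_out.write(payload)
    binary_out.flush()


sink = io.BytesIO()  # stand-in for sys.stdout.buffer
write_cgi_response(sink, "<p>\u20ac</p>")
response = sink.getvalue()

print(response)
```

Encoding happens once, in application code, and the declared charset and the actual bytes come from the same parameter, so they cannot drift apart.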
Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits. Instead, a separate default_out object worked fine for me (and, of course, writing to default_out instead of stdout).
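The code that built default_out is not preserved above; one plausible reading, offered only as an assumption, is a second io.TextIOWrapper layered over the existing binary buffer. The sketch uses an in-memory fake_stdout as a stand-in; on the real stream the buffer would be sys.stdout.buffer.

```python
import io

raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

# A second text layer over the same binary buffer; nothing is detached,
# so the original stream remains usable.
default_out = io.TextIOWrapper(fake_stdout.buffer, encoding="utf-8")

print("\u20ac", file=default_out)
default_out.flush()  # flush before the other wrapper writes to the buffer

fake_stdout.write("plain ascii still works\n")
fake_stdout.flush()

shared_bytes = raw.getvalue()
print(shared_bytes)
```

With two wrappers sharing one buffer, explicit flushes keep the output in the intended order.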
sys.stdout is in text mode in Python 3. Hence you write unicode to it directly, and the idiom for Python 2 is no longer needed. Writing unicode to it directly would fail in Python 2; however, it works just dandy in Python 3.
Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of the Python.
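Both halves of this answer can be sketched with in-memory text streams. The helper try_write is a hypothetical name; the UTF-8 stream models a correctly detected terminal encoding, and the ASCII stream models the case where Python has picked an encoding that cannot represent the text.

```python
import io


def try_write(encoding):
    """Write a non-ASCII str to a text stream with the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write("\u00fbnic\u00f6de")
        stream.flush()
    except UnicodeEncodeError:
        return None  # the stream's encoding cannot represent the text
    return raw.getvalue()


utf8_bytes = try_write("utf-8")   # text mode encodes the str for you
ascii_bytes = try_write("ascii")  # wrong or unknown encoding: write fails

print(utf8_bytes, ascii_bytes)
```

The str is passed unchanged in both cases; only the encoding attached to the stream decides whether the write succeeds.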