Writing unicode strings via sys.stdout in Python
Assume for a moment that one cannot use print (and thus enjoy the benefit of automatic encoding detection). That leaves us with sys.stdout. However, sys.stdout does not do any sensible encoding on its own.
Now one reads the Python wiki page PrintFails and tries out the following code:
$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout);
However, this does not work either (at least on Mac). To see why:
>>> import locale, sys
>>> locale.getpreferredencoding()
'mac-roman'
>>> sys.stdout.encoding
'UTF-8'
(UTF-8 is what one's terminal understands).
So one changes the above code to:
$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout);
And now unicode strings are properly sent to sys.stdout and hence printed properly on the terminal (sys.stdout is attached to the terminal).
Is this the correct way to write unicode strings to sys.stdout, or should I be doing something else?
EDIT: at times (say, when piping the output to less) sys.stdout.encoding will be None. In this case, the above code will fail.
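One way to guard against the None case is to choose the encoding with a fallback before building the writer. What follows is only a minimal sketch, not a definitive fix: the helper names choose_encoding and make_writer, the PipedStream stand-in, and the final "utf-8" default are assumptions made for illustration, and the writer is exercised on an in-memory byte stream rather than on the real sys.stdout.

```python
import codecs
import io
import locale


def choose_encoding(stream, default="utf-8"):
    """Return stream.encoding, else the locale's preferred encoding, else default."""
    # stream.encoding may be missing or None (for example when output is piped).
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        encoding = locale.getpreferredencoding()
    return encoding or default


def make_writer(byte_stream, encoding):
    """Wrap a byte stream so that unicode strings are encoded on write."""
    return codecs.getwriter(encoding)(byte_stream)


class PipedStream(object):
    """Hypothetical stand-in for a stdout whose encoding is None."""
    encoding = None


# Demonstrate the writer on an in-memory byte stream.
buf = io.BytesIO()
writer = make_writer(buf, "utf-8")
writer.write(u"\u0411\n")
encoded = buf.getvalue()

# With no encoding on the stream, the locale's preference is used instead.
fallback = choose_encoding(PipedStream())
```

Whether the locale's preferred encoding is a good fallback is debatable, since the question shows it can disagree with what the terminal understands; it is used here only because there is nothing better to ask once the stream reports None.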
Comments (5)
... will do the job, but it can't be set from within Python itself.
What we can do is check whether it is unset and tell the user to set it before calling the script:
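The check described above might look roughly like the sketch below. It assumes that the setting in question is the PYTHONIOENCODING environment variable (which another comment here names) and that "unset" shows up as sys.stdout.encoding being None; the names encoding_hint, main and NoEncoding are hypothetical.

```python
import sys


def encoding_hint(stream):
    """Return a message for the user if the stream has no encoding, else None."""
    if getattr(stream, "encoding", None) is None:
        return ("stdout has no encoding; set PYTHONIOENCODING before running "
                "the script, for example: export PYTHONIOENCODING=UTF-8")
    return None


def main(stream=None):
    """Return an exit status: 1 if the user has to set the encoding first."""
    if stream is None:
        stream = sys.stdout
    message = encoding_hint(stream)
    if message is not None:
        sys.stderr.write(message + "\n")
        return 1
    return 0


class NoEncoding(object):
    """Hypothetical stand-in for a piped stdout that reports no encoding."""
    encoding = None


hint = encoding_hint(NoEncoding())
```

A script would typically finish with sys.exit(main()) so that the message and the non-zero status reach the caller.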
Best idea is to check whether you are directly connected to a terminal. If you are, use the terminal's encoding. Otherwise, use the system's preferred encoding.
It's also very important to always allow the user to specify whichever encoding she wants. Usually I make it a command-line option (like -e ENCODING), and parse it with the optparse module.
Another good thing is to not overwrite sys.stdout with an automatic encoder. Create your encoder and use it, but leave sys.stdout alone. You could import 3rd party libraries that write encoded bytestrings directly to sys.stdout.
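Put together, the three suggestions above could be sketched as follows. This is an illustration under assumptions rather than the commenter's actual code: pick_encoding, parse_args and FakeTerminal are hypothetical names, and a fake in-memory terminal stands in for sys.stdout so that the example does not depend on where it is run.

```python
import codecs
import io
import locale
import optparse


def pick_encoding(stream, user_choice=None):
    """User's choice first, then the terminal's encoding, then the locale's."""
    if user_choice:
        return user_choice
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    if is_tty and getattr(stream, "encoding", None):
        return stream.encoding
    return locale.getpreferredencoding()


def parse_args(argv):
    """Parse the -e ENCODING option with optparse."""
    parser = optparse.OptionParser()
    parser.add_option("-e", dest="encoding", metavar="ENCODING", default=None,
                      help="encoding to use for output")
    options, _args = parser.parse_args(argv)
    return options


class FakeTerminal(io.BytesIO):
    """Hypothetical stand-in for a terminal that reports UTF-8."""
    encoding = "UTF-8"

    def isatty(self):
        return True


options = parse_args(["-e", "latin-1"])
terminal = FakeTerminal()

# A separate encoder object; the underlying stream itself is left untouched,
# so other code can still write byte strings to it directly.
out = codecs.getwriter(pick_encoding(terminal, options.encoding))(terminal)
out.write(u"\u00e9\n")
```

Keeping the encoder in its own variable (out here) instead of assigning it to sys.stdout is what lets libraries that already write encoded bytes keep working.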
There is an optional environment variable "PYTHONIOENCODING" which may be set to a desired default encoding. It would be one way of grabbing the user-desired encoding in a way consistent with all of Python. It is buried in the Python manual here.
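The interpreter applies PYTHONIOENCODING to its standard streams by itself, so nothing has to be parsed for that. If a script wants to reuse the same choice for its own encoders, it could read the variable roughly as in this sketch; the value has the form encodingname:errorhandler with both parts optional, and the helper name requested_io_encoding is made up for the example.

```python
import os


def requested_io_encoding(environ=None):
    """Return (encoding, errors) from PYTHONIOENCODING; either may be None."""
    if environ is None:
        environ = os.environ
    value = environ.get("PYTHONIOENCODING", "")
    # The documented syntax is "encodingname:errorhandler", both parts optional.
    encoding, _sep, errors = value.partition(":")
    return (encoding or None, errors or None)


# Plain dictionaries stand in for os.environ here.
example = requested_io_encoding({"PYTHONIOENCODING": "utf-8:replace"})
unset = requested_io_encoding({})
```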
This is what I am doing in my application:
sys.stdout.write(s.encode('utf-8'))
This is the exact opposite fix for reading UTF-8 names from argv:
This is very ugly (IMHO) as it forces you to work with UTF-8, which is the norm on Linux/Mac, but not on Windows... Works for me anyway :)
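For illustration, the explicit-UTF-8 approach and its mirror image for input can be sketched as below. The helper names write_utf8 and decode_utf8 are hypothetical, and the argv side is an assumption about what the missing snippet did (decoding byte strings as UTF-8). Under Python 3, bytes have to go to sys.stdout.buffer rather than to sys.stdout itself, which the sketch accounts for.

```python
import io
import sys


def write_utf8(text, byte_stream=None):
    """Encode text as UTF-8 and write the bytes to byte_stream."""
    if byte_stream is None:
        # Python 3 exposes the byte layer as sys.stdout.buffer; Python 2's
        # sys.stdout accepts byte strings directly.
        byte_stream = getattr(sys.stdout, "buffer", sys.stdout)
    byte_stream.write(text.encode("utf-8"))


def decode_utf8(raw):
    """The opposite direction: turn UTF-8 bytes (e.g. a Python 2 argv item) into text."""
    return raw.decode("utf-8")


# Exercise both directions on in-memory data.
buf = io.BytesIO()
write_utf8(u"\u0411\n", buf)
name = decode_utf8(b"\xd0\x91")
```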
It's not clear to me why you wouldn't be able to use print; but assuming so, yes, the approach looks right to me.