Writing unicode strings via sys.stdout in Python

Posted 2024-08-05 12:32:19

Assume for a moment that one cannot use print (and thus enjoy the benefit of automatic encoding detection). So that leaves us with sys.stdout. However, sys.stdout is so dumb as to not do any sensible encoding.

Now one reads the Python wiki page PrintFails and goes to try out the following code:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)'

However, this does not work either (at least on Mac). To see why:

>>> import sys, locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> sys.stdout.encoding
'UTF-8'

(UTF-8 is what one's terminal understands).

So one changes the above code to:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout)'

And now unicode strings are properly sent to sys.stdout and hence printed properly on the terminal (sys.stdout is attached to the terminal).

Is this the correct way to write unicode strings in sys.stdout or should I be doing something else?

EDIT: At times (say, when piping the output to less), sys.stdout.encoding will be None. In this case, the above code will fail.

Comments (5)

凉城 2024-08-12 12:32:19
export PYTHONIOENCODING=utf-8

will do the job, but you can't set it from within Python itself...

What we can do is check whether it is set and, if it isn't, tell the user to set it before calling the script:

import sys

if __name__ == '__main__':
    # Python 2: sys.stdout.encoding is None when stdout is piped and PYTHONIOENCODING is unset
    if sys.stdout.encoding is None:
        print >> sys.stderr, "Please set PYTHONIOENCODING before writing to stdout, e.g.: export PYTHONIOENCODING=UTF-8"
        sys.exit(1)
温柔女人霸气范 2024-08-12 12:32:19

Best idea is to check if you are directly connected to a terminal. If you are, use the terminal's encoding. Otherwise, use system preferred encoding.

import locale
import sys

if sys.stdout.isatty():
    default_encoding = sys.stdout.encoding  # encoding of the attached terminal
else:
    default_encoding = locale.getpreferredencoding()  # stdout is piped or redirected

It's also very important to always allow the user to specify whichever encoding she wants. Usually I make it a command-line option (like -e ENCODING), and parse it with the optparse module.

Another good thing is to not overwrite sys.stdout with an automatic encoder. Create your encoder and use it, but leave sys.stdout alone. You could import 3rd party libraries that write encoded bytestrings directly to sys.stdout.
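
The points above (prefer an explicit user choice, otherwise use the terminal's encoding on a tty and the locale default elsewhere, and build a separate encoder instead of replacing sys.stdout) could be put together roughly as follows. This is only a sketch: the helper names choose_encoding, make_writer and parse_encoding_option are made up for illustration, and the getattr(sys.stdout, "buffer", sys.stdout) idiom in the trailing comment is an assumption that lets the same code target Python 3's byte layer as well as Python 2's sys.stdout.

```python
import codecs
import io
import locale
import optparse
import sys


def parse_encoding_option(argv):
    """Return the value of -e/--encoding from argv, or None if it was not given."""
    parser = optparse.OptionParser()
    parser.add_option("-e", "--encoding", dest="encoding", default=None,
                      help="output encoding to use instead of the detected one")
    options, _args = parser.parse_args(argv)
    return options.encoding


def choose_encoding(stream, override=None):
    """Pick the override if given, else the tty's encoding, else the locale default."""
    if override:
        return override
    if stream.isatty() and getattr(stream, "encoding", None):
        return stream.encoding
    return locale.getpreferredencoding()


def make_writer(byte_stream, encoding):
    """Wrap a byte stream in an encoding writer; sys.stdout itself is never replaced."""
    return codecs.getwriter(encoding)(byte_stream)


# Demonstration against an in-memory byte stream (hypothetical stand-in for stdout).
demo = io.BytesIO()
writer = make_writer(demo, choose_encoding(sys.stdout, parse_encoding_option(["-e", "utf-8"])))
writer.write(u"\u0411\n")

# For real output, one possible target is the byte layer of stdout:
#   make_writer(getattr(sys.stdout, "buffer", sys.stdout), encoding)
```

Keeping the writer as a separate object means third-party code that writes already-encoded bytestrings to sys.stdout is unaffected.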

巡山小妖精 2024-08-12 12:32:19

There is an optional environment variable, PYTHONIOENCODING, which may be set to a desired default encoding. It is one way of obtaining the user's preferred encoding that is consistent across all of Python. It is buried in the Python manual.
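
To see the variable's effect when stdout is not a terminal, a launcher can set it for a child interpreter and ask the child what it ended up with. The following is a minimal sketch; child_stdout_encoding is a hypothetical helper name, and the child is simply whatever interpreter sys.executable points at.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return its sys.stdout.encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    # The child's stdout is a pipe here, so it is not attached to a terminal.
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; sys.stdout.write(str(sys.stdout.encoding))"],
        env=env,
    )
    return output.decode("ascii")


reported = child_stdout_encoding("utf-8")
```

Because the variable is read at interpreter startup, it has to be set in the environment before the process begins, which is why the sketch sets it for a child rather than for the running process.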

橘和柠 2024-08-12 12:32:19

This is what I am doing in my application:

sys.stdout.write(s.encode('utf-8'))

This is the exact mirror image of the fix for reading UTF-8 names from argv:

import sys

for name in sys.argv[1:]:
    name = name.decode('utf-8')  # Python 2: argv entries are byte strings

This is very ugly (IMHO) as it forces you to work with UTF-8... which is the norm on Linux/Mac, but not on Windows... Works for me anyway :)
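
The same encode-on-write and decode-on-read pair can be wrapped in two small helpers. This is a sketch under stated assumptions, not part of the answer: write_utf8 and decode_arg are hypothetical names, and the getattr(sys.stdout, "buffer", sys.stdout) lookup is an assumption that lets the code reach the byte layer on Python 3 while still working with Python 2's sys.stdout.

```python
import io
import sys


def write_utf8(text, byte_stream=None):
    """Encode text as UTF-8 and write the bytes to byte_stream (stdout's byte layer by default)."""
    if byte_stream is None:
        # Python 3 keeps the byte layer in sys.stdout.buffer; Python 2's sys.stdout takes bytes directly.
        byte_stream = getattr(sys.stdout, "buffer", sys.stdout)
    byte_stream.write(text.encode("utf-8"))


def decode_arg(arg, encoding="utf-8"):
    """Return arg as text: decode byte strings (Python 2 argv), pass text through unchanged."""
    if isinstance(arg, bytes):
        return arg.decode(encoding)
    return arg


# Demonstration against an in-memory byte stream instead of the real stdout.
buffer = io.BytesIO()
write_utf8(u"\u0411\n", buffer)
name = decode_arg(b"\xd0\x91")
```

As in the answer, the encoding is hard-coded to UTF-8 on the write side, so the same caveat about Windows consoles applies.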

长发绾君心 2024-08-12 12:32:19

It's not clear to me why you wouldn't be able to use print; but assuming so, yes, the approach looks right to me.
