Setting the correct encoding when piping stdout in Python

Posted 2024-07-13 07:59:53

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far are to modify your site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?

Comments (12)

懒的傷心 2024-07-20 07:59:54

I had a similar issue last week. It was easy to fix in my IDE (PyCharm).

Here was my fix:

Starting from PyCharm menu bar: File -> Settings... -> Editor -> File Encodings, then set: "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8 and she now works like a charm.

Hope this helps!

风月客 2024-07-20 07:59:54

An arguably sanitized version of Craig McQueen's answer.

import sys, codecs
class EncodedOut:
    def __init__(self, enc):
        self.enc = enc
        self.stdout = sys.stdout
    def __enter__(self):
        if sys.stdout.encoding is None:
            w = codecs.getwriter(self.enc)
            sys.stdout = w(sys.stdout)
    def __exit__(self, exc_ty, exc_val, tb):
        sys.stdout = self.stdout

Usage:

with EncodedOut('utf-8'):
    print u'ÅÄÖåäö'
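
The class above targets Python 2, where a piped sys.stdout reports its encoding as None. In Python 3 the standard streams are io.TextIOWrapper objects, and wrapping them with codecs.getwriter() no longer works because the writer hands bytes to a text stream. A rough Python 3 counterpart, assuming Python 3.7+ (for TextIOWrapper.reconfigure()), could switch the encoding temporarily and restore it on exit. The name encoded_out is hypothetical, and the stream is passed in explicitly so the sketch can be tried on an in-memory stream:

```python
import contextlib
import io


@contextlib.contextmanager
def encoded_out(stream, encoding, errors="strict"):
    """Temporarily switch a text stream (e.g. sys.stdout) to another encoding."""
    old_encoding, old_errors = stream.encoding, stream.errors
    stream.flush()  # push out text already written with the old encoding
    stream.reconfigure(encoding=encoding, errors=errors)
    try:
        yield stream
    finally:
        stream.flush()
        stream.reconfigure(encoding=old_encoding, errors=old_errors)


# Demo: an ASCII-only in-memory stream standing in for a piped stdout.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
with encoded_out(out, "utf-8"):
    print("ÅÄÖåäö", file=out)
out.flush()
```

In a real program this would be `with encoded_out(sys.stdout, 'utf-8'):`. Keep in mind that IDEs and test runners sometimes replace sys.stdout with objects that have no reconfigure() method.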
[旋木] 2024-07-20 07:59:54

I just thought I'd mention something here which I had to spend a long time experimenting with before I finally realised what was going on. This may be so obvious to everyone here that they haven't bothered mentioning it. But it would've helped me if they had, so on that principle...!

NB: I am using Jython specifically, v 2.7, so just possibly this may not apply to CPython...

NB2: the first two lines of my .py file here are:

# -*- coding: utf-8 -*-
from __future__ import print_function

The "%" (AKA "interpolation operator") string construction mechanism causes ADDITIONAL problems too... If the default encoding of the "environment" is ASCII and you try to do something like

print( "bonjour, %s" % "fréd" )  # Call this "print A"

You will have no difficulty running in Eclipse... In a Windows CLI (DOS window) you will find that the encoding is code page 850 (my Windows 7 OS) or something similar, which can handle European accented characters at least, so it'll work.

print( u"bonjour, %s" % "fréd" ) # Call this "print B"

will also work.

If, OTOH, you direct to a file from the CLI, the stdout encoding will be None, which will default to ASCII (on my OS anyway), which will not be able to handle either of the above prints... (dreaded encoding error).

So then you might think of redirecting your stdout by using

sys.stdout = codecs.getwriter('utf8')(sys.stdout)

and try running in the CLI piping to a file... Very oddly, print A above will work... But print B above will throw the encoding error! The following will however work OK:

print( u"bonjour, " + "fréd" ) # Call this "print C"

The conclusion I have come to (provisionally) is that if a string which is specified to be a Unicode string using the "u" prefix is submitted to the %-handling mechanism it appears to involve the use of the default environment encoding, regardless of whether you have set stdout to redirect!

How people deal with this is a matter of choice. I would welcome a Unicode expert to say why this happens, whether I've got it wrong in some way, what the preferred solution to this, whether it also applies to CPython, whether it happens in Python 3, etc., etc.
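
On the Python 3 part of the question, a hedged explanation: in CPython 2, formatting a unicode string with a byte-string argument such as "fréd" implicitly decodes that argument with sys.getdefaultencoding() (ASCII unless changed), regardless of where stdout points. Jython may behave differently. In Python 3, string literals are already Unicode str (the u prefix is accepted but changes nothing), so prints A, B and C build identical text, and encoding only happens when that text is written to the stream. A small sketch, using an in-memory UTF-8 stream as a stand-in for a redirected stdout:

```python
import io

# Stand-in for a redirected stdout: a UTF-8 text layer over an in-memory buffer.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")

print("bonjour, %s" % "fréd", file=out)   # "print A"
print(u"bonjour, %s" % "fréd", file=out)  # "print B": u"" is plain str in Python 3
print(u"bonjour, " + "fréd", file=out)    # "print C"
out.flush()

# Decode the bytes that reached the "file" to compare the three lines.
lines = buffer.getvalue().decode("utf-8").splitlines()
```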

苄①跕圉湢 2024-07-20 07:59:54

I ran into this problem in a legacy application, and it was difficult to track down everywhere things were printed. I helped myself with this hack:

# encoding_utf8.py
import builtins


def print_utf8(fn):
    # Wrap print() so that each positional argument is converted to str
    # and encoded to UTF-8 bytes before being passed on.
    def print_fn(*args, **kwargs):
        return fn(*(str(arg).encode('utf-8') for arg in args), **kwargs)
    return print_fn


# Replace the built-in print for the whole program.
builtins.print = print_utf8(print)

At the top of my script, test.py:

import encoding_utf8
string = 'Axwell Λ Ingrosso'
print(string)

Note that this makes ALL calls to print output UTF-8-encoded bytes objects, which is why your console shows this:

$ python test.py
b'Axwell \xce\x9b Ingrosso'
小瓶盖 2024-07-20 07:59:54

I could "automate" it with a call to:

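def __fix_io_encoding(last_resort_default='UTF-8'):
    import sys
    # Only act if a standard stream has no encoding (e.g. Python 2 writing to a pipe).
    if any(s.encoding is None for s in (sys.stdin, sys.stdout, sys.stderr)):
        import os
        defEnc = None
        try:
            import locale
            defEnc = locale.getpreferredencoding()
        except Exception:
            pass
        if defEnc is None:
            try:
                defEnc = sys.getfilesystemencoding()
            except Exception:
                pass
        if defEnc is None:
            try:
                defEnc = sys.stdin.encoding
            except Exception:
                pass
        if defEnc is None:
            defEnc = last_resort_default
        # Keep a PYTHONIOENCODING the user already set; otherwise use the guess.
        os.environ['PYTHONIOENCODING'] = os.environ.get('PYTHONIOENCODING', defEnc)
        # Restart the script under the same interpreter so the variable takes effect
        # (re-executing sys.argv[0] directly fails unless it is executable and on PATH).
        os.execve(sys.executable, [sys.executable] + sys.argv, os.environ)
__fix_io_encoding()
del __fix_io_encoding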

Yes, it's possible to get an infinite loop here if this "setenv" fails.

无风消散 2024-07-20 07:59:54

On Windows, I had this problem very often when running Python code from an editor (like Sublime Text), but not when running it from the command line.

In this case, check your editor's parameters. In the case of SublimeText, this Python.sublime-build solved it:

{
  "cmd": ["python", "-u", "$file"],
  "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
  "selector": "source.python",
  "encoding": "utf8",
  "env": {"PYTHONIOENCODING": "utf-8", "LANG": "en_US.UTF-8"}
}
皇甫轩 2024-07-20 07:59:53

Your code works when run directly in a terminal because Python encodes the output using whatever encoding your terminal application is using. If you are piping, you must encode it yourself.

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)

Setting the system default encoding is a bad idea, because some modules and libraries you use can rely on the fact it is ASCII. Don't do it.
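
The converter above is Python 2 code, where iterating sys.stdin yields byte strings. A sketch of the same filter for Python 3, assuming it reads and writes raw bytes through the binary layer (sys.stdin.buffer and sys.stdout.buffer) so that no implicit encoding is involved. The function name transcode_upper is hypothetical, and the demo uses in-memory streams:

```python
import io


def transcode_upper(src, dst, in_enc="iso8859-1", out_enc="utf-8"):
    """Copy lines from binary stream src to binary stream dst, uppercased."""
    for raw_line in src:
        line = raw_line.decode(in_enc)   # Decode what you receive
        line = line.upper()              # Work with Unicode (str) internally
        dst.write(line.encode(out_enc))  # Encode what you send


# Demo with in-memory byte streams; in a real pipe you would pass
# sys.stdin.buffer and sys.stdout.buffer instead.
src = io.BytesIO("fréd\nåäö\n".encode("iso8859-1"))
dst = io.BytesIO()
transcode_upper(src, dst)
```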

淡淡の花香 2024-07-20 07:59:53

First, regarding this solution:

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

It's not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.

A better solution is to change sys.stdout at the start of your program so that it encodes with a selected encoding. Here is one solution I found on the question "Python: How is sys.stdout.encoding chosen?", in particular in a comment by "toka":

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
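
That one-liner is for Python 2. In Python 3, sys.stdout is already a text stream, so the codec writer has to wrap its underlying binary buffer (sys.stdout.buffer); wrapping sys.stdout itself makes the writer pass bytes to a text stream's write() and raises TypeError. A minimal sketch, with a hypothetical helper utf8_writer and an in-memory ASCII stream standing in for sys.stdout:

```python
import codecs
import io


def utf8_writer(text_stream):
    """Return a UTF-8 codecs writer over a text stream's binary buffer."""
    text_stream.flush()  # avoid interleaving with text still buffered above it
    return codecs.getwriter("utf-8")(text_stream.buffer)


raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout
writer = utf8_writer(fake_stdout)
print("åäö", file=writer)  # encoded to UTF-8 and written straight to raw
```

In a real program this would become `sys.stdout = utf8_writer(sys.stdout)`. On Python 3.7+, `sys.stdout.reconfigure(encoding='utf-8')` is usually the simpler option.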
离去的眼神 2024-07-20 07:59:53

You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.

TL;DR of the blog post:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

gives you

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ☺ ☻
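
Because PYTHONIOENCODING is read when the interpreter starts, one way to apply it from Python code is to set it in the environment of a child process. A sketch, assuming the child is another Python interpreter (sys.executable) whose stdout is a pipe; the helper name child_stdout_encoding is hypothetical:

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a terminal
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


enc = child_stdout_encoding("utf-8")
```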
装迷糊 2024-07-20 07:59:53
export PYTHONIOENCODING=utf-8

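does the job, but it can't be set from within the running Python process (the variable is read when the interpreter starts)...

What we can do is check whether stdout has an encoding and, if not, tell the user to set the variable before calling the script: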

import sys

if __name__ == '__main__':
    if sys.stdout.encoding is None:
        sys.stderr.write("Please set PYTHONIOENCODING=UTF-8 when writing to a pipe, e.g.: export PYTHONIOENCODING=UTF-8\n")
        sys.exit(1)

Update, in reply to a comment:
the problem only occurs when stdout is piped.
I tested on Fedora 25 with Python 2.7.13:

python --version
Python 2.7.13

cat b.py

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import sys

print sys.stdout.encoding

running ./b.py

UTF-8

running ./b.py | less

None
九八野马 2024-07-20 07:59:53

I'm surprised this answer has not been posted here yet.

Since Python 3.7 you can change the encoding of standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.

https://stackoverflow.com/a/52372390/15675011
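
As a sketch of the errors parameter (Python 3.7+), the example below reconfigures an in-memory ASCII stream, standing in for sys.stdout, to use the standard "backslashreplace" error handler. Characters the encoding cannot represent are then written as escape sequences instead of raising UnicodeEncodeError:

```python
import io

raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for an ASCII-only stdout

# Keep the current encoding; only change how unencodable characters are handled.
out.reconfigure(errors="backslashreplace")
out.write("fréd")
out.flush()
```

After the flush, the buffer holds `fr\xe9d` as literal ASCII text rather than a non-ASCII byte.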

听,心雨的声音 2024-07-20 07:59:53

Since Python 3.7, we can use Python UTF-8 Mode, by using command line option -X utf8:

 python -X utf8 testzh.py

The script testzh.py contains

print("Content-type: text/html; charset=UTF-8\n") 
print("地球你好!")

To set Python up as a CGI script handler in Internet Information Services (IIS) on Windows 10, we set the Executable to:

"C:\Program Files\Python39\python.exe" -X utf8 %s


With this setting, the Chinese ideograms are displayed as expected in the Microsoft Edge browser; without -X utf8, an error occurs.


Please see https://docs.python.org/3/library/os.html#utf8-mode
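
To check whether UTF-8 mode is actually active, a script can inspect sys.flags.utf8_mode (1 when enabled); setting the environment variable PYTHONUTF8=1 is the documented equivalent of -X utf8. The sketch below launches a child interpreter with the option and reports both values. It removes PYTHONIOENCODING from the child's environment because, when set, that variable takes precedence over UTF-8 mode for the standard streams:

```python
import os
import subprocess
import sys

env = dict(os.environ)
env.pop("PYTHONIOENCODING", None)  # would otherwise override UTF-8 mode for stdio

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c",
     "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"],
    env=env,
    stdout=subprocess.PIPE,  # child's stdout is a pipe, as in a CGI setup
    universal_newlines=True,
    check=True,
)
utf8_mode, stdout_encoding = result.stdout.split()
```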
