Linux/Python: encoding a unicode string for print
I have a fairly large python 2.6 application with lots of print statements sprinkled about. I'm using unicode strings throughout, and it usually works great. However, if I redirect the output of the application (like "myapp.py >output.txt"), then I occasionally get errors such as this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)
I guess the same issue comes up if someone has set their LOCALE to ASCII. Now, I understand perfectly well the reason for this error. There are characters in my Unicode strings that are not possible to encode in ASCII. Fair enough. But I'd like my python program to make a best effort to try to print something understandable, maybe skipping the suspicious characters or replacing them with their Unicode ids.
This problem must be common... What is the best practice for handling this problem? I'd prefer a solution that allows me to keep using plain old "print", but I can modify all occurrences if necessary.
PS: I have now solved this problem. The solution was neither of the answers given. I used the method given at http://wiki.python.org/moin/PrintFails , as given by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.
Comments (3)
If you're dumping to an ASCII terminal, encode manually using unicode.encode, and specify that errors should be ignored.
If you want to store unicode files, try this:
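The snippet that went with this answer is not reproduced here, so the following is only a minimal sketch of both suggestions, not the answerer's own code. The sample string, the temporary output path, and the use of io.open are assumptions for illustration. It is written to run on Python 2.6+ as well as Python 3.

```python
import io
import os
import tempfile

# u'\xa1' is the character from the traceback in the question.
text = u'\xa1Hola!'

# Encode by hand for an ASCII-only destination, dropping what cannot be encoded.
ascii_ignored = text.encode('ascii', 'ignore')

# Alternative error handler: keep a visible escape instead of dropping the character.
ascii_escaped = text.encode('ascii', 'backslashreplace')

# Store the full text in a UTF-8 file; io.open encodes on write.
# The path is a hypothetical location chosen for this example.
out_path = os.path.join(tempfile.mkdtemp(), 'out.txt')
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(text)
```

With 'ignore' the unencodable character silently disappears. 'replace', 'backslashreplace' and 'xmlcharrefreplace' are the other standard error handlers, and they leave a marker in the output instead.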
I have now solved this problem. The solution was neither of the answers given. I used the method given at http://wiki.python.org/moin/PrintFails , as given by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.
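A rough sketch of what such a wrapper can look like, built on codecs.getwriter. This is an illustration of the general idea, not the exact code from the wiki page or from the poster. The helper name wrap_stream and the in-memory stream standing in for a redirected stdout are assumptions.

```python
import codecs
import io


def wrap_stream(byte_stream, encoding='utf-8', errors='replace'):
    """Return a writer that encodes unicode text before handing it to byte_stream."""
    # codecs.getwriter returns a StreamWriter class for the given encoding.
    return codecs.getwriter(encoding)(byte_stream, errors)


# An in-memory byte stream stands in for a redirected stdout here.
fake_stdout = io.BytesIO()
writer = wrap_stream(fake_stdout, encoding='ascii', errors='replace')
writer.write(u'\xa1Hola!\n')
captured = fake_stdout.getvalue()

# In a Python 2 program the replacement itself might look like this
# (assumption: fall back to UTF-8 when stdout reports no encoding):
#     import sys
#     sys.stdout = wrap_stream(sys.stdout, sys.stdout.encoding or 'utf-8')
```

Once sys.stdout is replaced this way, existing print statements keep working unchanged, because print only calls write on whatever object sys.stdout refers to.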
Either route all your print statements through a function that performs the unicode -> utf8 conversion or, as a last resort, change the Python default encoding from ascii to utf-8 inside your site.py. In general, printing unicode strings unfiltered to sys.stdout is a bad idea, since Python will trigger an implicit conversion of the unicode strings to the configured default encoding, which is ascii.
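The first option could be sketched as follows. The function name safe_print, its parameters, and the byte-stream target are hypothetical and chosen only for this example. The idea is that every call site encodes explicitly instead of relying on the implicit conversion.

```python
import io


def safe_print(text, stream, encoding='utf-8', errors='replace'):
    """Encode unicode text explicitly and write it, plus a newline, to a byte stream."""
    stream.write(text.encode(encoding, errors) + b'\n')


# Demonstration on in-memory byte streams instead of the real stdout.
utf8_out = io.BytesIO()
safe_print(u'caf\xe9', utf8_out)

ascii_out = io.BytesIO()
safe_print(u'caf\xe9', ascii_out, encoding='ascii', errors='backslashreplace')
```

The drawback compared with replacing sys.stdout is that every existing print statement has to be rewritten to call the helper.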