Linux/Python: encoding a unicode string for print

Posted 2024-10-18 20:54:03

I have a fairly large python 2.6 application with lots of print statements sprinkled about. I'm using unicode strings throughout, and it usually works great. However, if I redirect the output of the application (like "myapp.py >output.txt"), then I occasionally get errors such as this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

I guess the same issue comes up if someone has set their LOCALE to ASCII. Now, I understand perfectly well the reason for this error. There are characters in my Unicode strings that are not possible to encode in ASCII. Fair enough. But I'd like my python program to make a best effort to try to print something understandable, maybe skipping the suspicious characters or replacing them with their Unicode ids.

This problem must be common... What is the best practice for handling this problem? I'd prefer a solution that allows me to keep using plain old "print", but I can modify all occurrences if necessary.

PS: I have now solved this problem. The solution was neither of the answers given. I used the method given at http://wiki.python.org/moin/PrintFails , as given by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.

仙女 2024-10-25 20:54:03

If you're dumping to an ASCII terminal, encode manually using unicode.encode, and specify that errors should be ignored.

u = u'\xa0'
u.encode('ascii')            # This fails with UnicodeEncodeError
u.encode('ascii', 'ignore')  # This drops every character that cannot be encoded (returns '' here)
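
The question also asks about replacing unencodable characters instead of dropping them. The standard error handlers accepted by encode cover that case. Here is a minimal sketch comparing them; the sample string is only an illustration, and the byte literals are written so the snippet also runs on Python 3:

```python
# Compare the standard error handlers when encoding to ASCII.
text = u'\xa1Hola'  # illustrative sample: inverted exclamation mark + ASCII text

ignored = text.encode('ascii', 'ignore')             # drops the character
replaced = text.encode('ascii', 'replace')           # substitutes '?'
escaped = text.encode('ascii', 'backslashreplace')   # keeps the code point as \xa1
xml_refs = text.encode('ascii', 'xmlcharrefreplace') # keeps it as &#161;

for label, value in [('ignore', ignored), ('replace', replaced),
                     ('backslashreplace', escaped),
                     ('xmlcharrefreplace', xml_refs)]:
    print('%-18s %r' % (label, value))
```

Of these, 'backslashreplace' is probably the closest match to "replacing them with their Unicode ids", since the original code point stays readable in the output.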

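If you want to write unicode text to a file, try this:

u = u'\xa0'
print >>open('out', 'w'), u                  # This fails: the file object falls back to the 'ascii' codec
print >>open('out', 'w'), u.encode('utf-8')  # This is ok: the string is already UTF-8 encoded bytes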
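
An alternative to encoding every string by hand is to open the file with an explicit encoding, so the file object does the conversion. This is a sketch rather than part of the answer above: it uses io.open (available from Python 2.6), and the temporary path and sample text are assumptions made for illustration.

```python
import io
import os
import tempfile

text = u'caf\xe9\xa0'  # illustrative sample with two non-ASCII characters
path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # hypothetical output file

# The file object encodes unicode text to UTF-8 on every write.
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(text)

# Read the raw bytes back to see what was actually stored.
with io.open(path, 'rb') as handle:
    raw = handle.read()

print(repr(raw))
```

The file then contains the UTF-8 bytes, and the calling code keeps working with unicode strings only.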

岁月如刀 2024-10-25 20:54:03

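I have now solved this problem. The solution was neither of the answers given. I used the method described at http://wiki.python.org/moin/PrintFails , as suggested by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.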
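
The post does not show the wrapper itself, so the following is only a sketch of one common way to build it, using codecs.getwriter. The helper name wrap_stream, the 'ascii' target encoding, and the 'backslashreplace' handler are assumptions chosen for illustration, and an in-memory stream stands in for the real stdout.

```python
import codecs
import io

def wrap_stream(byte_stream, encoding='utf-8', errors='replace'):
    """Return a writer that encodes unicode text before writing it
    to byte_stream (hypothetical helper, not from the original post)."""
    return codecs.getwriter(encoding)(byte_stream, errors)

# Stand-in for a redirected stdout, so the effect can be inspected.
fake_stdout = io.BytesIO()
writer = wrap_stream(fake_stdout, encoding='ascii', errors='backslashreplace')
writer.write(u'\xa1Hola!\n')

print(repr(fake_stdout.getvalue()))

# Installing it for real would look roughly like this:
#   Python 2:  sys.stdout = wrap_stream(sys.stdout)
#   Python 3:  sys.stdout = wrap_stream(sys.stdout.buffer)
```

With the wrapper installed, existing print statements that pass unicode strings keep working unchanged. One caveat on Python 2: printing a byte string that contains non-ASCII bytes through such a writer can still fail, because the writer first tries to decode it as ASCII.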

三寸金莲 2024-10-25 20:54:03

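Either wrap all your print statements in a function that performs the unicode -> utf8 conversion, or, as a last resort, change Python's default encoding from ascii to utf-8 in site.py. In general, printing unfiltered unicode strings to sys.stdout is a bad idea: when sys.stdout has no encoding of its own, as happens when the output is redirected, Python 2 implicitly converts the unicode string using the configured default encoding (ascii).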
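
The first option can be sketched as a small helper that replaces the print statement. The name safe_print, the 'utf-8' fallback, and the 'replace' error handler are assumptions made for illustration; the answer does not specify them.

```python
import io
import sys

def safe_print(text, stream=None, fallback='utf-8', errors='replace'):
    """Encode unicode text explicitly and write it as bytes
    (hypothetical helper, not from the original answer)."""
    stream = sys.stdout if stream is None else stream
    # A redirected stdout on Python 2 reports no encoding, so fall back.
    encoding = getattr(stream, 'encoding', None) or fallback
    data = (text + u'\n').encode(encoding, errors)
    # Python 3 text streams expose the underlying byte stream as .buffer.
    target = getattr(stream, 'buffer', stream)
    target.write(data)

# An in-memory stream stands in for a redirected stdout.
captured = io.BytesIO()
safe_print(u'\xa1Hola!', stream=captured)
print(repr(captured.getvalue()))
```

Compared with changing the default encoding in site.py, a helper like this affects only the calls that go through it, at the cost of having to modify every print statement.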
