UTF-8 error with Python and gettext
I use UTF-8 in my editor, so all strings displayed here are UTF-8 in file.
I have a python script like this:
# -*- coding: utf-8 -*-
...
parser = optparse.OptionParser(
    description=_('automates the dice rolling in the classic game "risk"'),
    usage=_("usage: %prog attacking defending"))
Then I used xgettext to extract all the strings and got a .pot file that boils down to:
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr ""
After that, I used msginit to create a de.po file, which I filled in like this:
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr "automatisiert das Würfeln bei \"Risiko\""
Running the script, I get the following error:
File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)
How can I fix that?
That error means you've called encode on a bytestring, so it tries to decode it to Unicode using the system default encoding (ascii on Python 2), then re-encode it with whatever you've specified.
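A minimal sketch of that two-step behaviour, written in Python 3 so that both steps are explicit. The helper `py2_style_encode` is hypothetical: it only imitates what Python 2 appears to do implicitly when `.encode()` is called on a byte string, and the byte string stands in for the translated msgstr.

```python
# Python 3 sketch: imitate Python 2's implicit decode-then-encode on a byte string.

# A UTF-8 byte string, standing in for the translated msgstr from the question.
translated = 'automatisiert das Würfeln bei "Risiko"'.encode("utf-8")

# "ü" is stored as two bytes in UTF-8; 0xc3 is the first one, as in the traceback.
assert "ü".encode("utf-8") == b"\xc3\xbc"


def py2_style_encode(data: bytes, encoding: str, errors: str = "strict") -> bytes:
    """Hypothetical helper: decode with ASCII (strict), then encode.

    The `errors` argument only reaches the encode step, so "replace"
    cannot prevent the failure in the ASCII decode step.
    """
    return data.decode("ascii").encode(encoding, errors)


try:
    py2_style_encode(translated, "utf-8", "replace")
except UnicodeDecodeError as exc:
    error_message = str(exc)  # keep the message; `exc` is cleared after the block
    print(error_message)
```

This is consistent with the traceback failing even though optparse passes "replace": the error comes from the ASCII decode, not from the encode.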
Generally, the way to resolve it is to call s.decode('utf-8') (or whatever encoding the strings are in) before trying to use the strings. It might also work if you just use unicode literals, u'automates...' (that depends on how strings are substituted from .po files, which I don't know about).
This sort of confusing behaviour is improved in Python 3, which won't try to convert bytes to unicode unless you specifically tell it to.
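The same idea applied to the question's parser, as a Python 3 sketch: the byte string stands in for whatever `_()` returned, and it is decoded once before optparse sees it. The variable names and `prog="auto_dice.py"` are illustrative assumptions; in Python 3 `gettext` already returns `str`, so the explicit decode only matters when the data really arrives as `bytes`.

```python
import optparse

# Stand-in for a translated description that arrives as UTF-8 bytes.
raw_description = 'automatisiert das Würfeln bei "Risiko"'.encode("utf-8")

# Decode once, at the boundary, so optparse only ever handles text.
description = raw_description.decode("utf-8")

parser = optparse.OptionParser(
    prog="auto_dice.py",
    description=description,
    usage="usage: %prog attacking defending",
)
help_text = parser.format_help()
print(help_text)

# Python 3 refuses to mix bytes and text instead of silently guessing ASCII.
try:
    raw_description + " (extra text)"
except TypeError as exc:
    mix_error = str(exc)
```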
My suspicion is that the problem is caused by _("string") returning a byte string and not a Unicode string. The obvious workaround is this:
But that feels wrong.
ugettext or install(True) may help.
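For comparison, a Python 3 sketch of the class-based API, where `ugettext` no longer exists and `gettext()` always returns `str`. The domain `auto_dice` and the `locale` directory are assumptions based on the script name; `fallback=True` lets the example run without a compiled .mo file, in which case messages come back untranslated.

```python
import gettext

# Look for locale/de/LC_MESSAGES/auto_dice.mo; fall back to NullTranslations if absent.
translations = gettext.translation(
    "auto_dice", localedir="locale", languages=["de"], fallback=True
)
# translations.install() would bind _ in builtins; here it is bound explicitly.
_ = translations.gettext

description = _('automates the dice rolling in the classic game "risk"')
print(type(description).__name__, description)
```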
The Python gettext docs give these examples:
or:
I am trying to reproduce your problem, and even if I use install(unicode=1), I get back a byte string (str type). Either I am using gettext incorrectly, or I am missing a character encoding declaration in my .po/.mo file.
I will update when I know more.
(One other possibility is to use escapes or Unicode code points in the .po file, but that doesn't sound like fun.)
(Or you could look at your system's .po files to see how they handle non-ASCII characters.)
I'm not familiar with this, but it appears to be a known bug in 2.6 that's been fixed in 2.7:
http://bugs.python.org/issue2931
If it's not feasible for you to use 2.7, try this workaround:
http://mail.python.org/pipermail/python-dev/2006-May/065458.html