UTF-8 error with Python and gettext

Posted on 2024-10-30 08:38:45

I use UTF-8 in my editor, so all strings displayed here are UTF-8 in file.

I have a python script like this:

# -*- coding: utf-8 -*-
...
parser = optparse.OptionParser(
  description=_('automates the dice rolling in the classic game "risk"'), 
  usage=_("usage: %prog attacking defending"))

Then I used xgettext to get everything out and got a .pot file which can be boiled down to:

"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr ""

After that, I used msginit to get a de.po which I filled in like this:

"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr "automatisiert das Würfeln bei \"Risiko\""

Running the script, I get the following error:

  File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)

How can I fix that?

反目相谮 2024-11-06 08:38:45

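The error means that encode was called on a byte string, so Python first tries to decode it to Unicode using the system default encoding (ascii on Python 2) and then re-encodes it with whatever encoding you specified.

In general, the way to fix it is to call s.decode('utf-8') (or whatever encoding the string is in) before trying to use the string. It might also work if you only use unicode literals: u'automates...' (that depends on how the strings are substituted from the .po file, which I don't know).

This confusing behaviour was improved in Python 3, which never tries to convert bytes to unicode unless you explicitly tell it to.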
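A minimal sketch of that mechanism, written so that it also runs on Python 3, where the hidden ASCII decode has to be spelled out explicitly. The variable names are illustrative only, and the byte string stands in for what the traceback suggests `_()` returned (UTF-8 bytes of the German translation):

```python
# UTF-8 bytes of the translation ("\u00fc" is the letter u-umlaut).
raw = u'automatisiert das W\u00fcrfeln bei "Risiko"'.encode("utf-8")

# Python 2 performs this decode implicitly when .encode() is called
# on a byte string; here it is written out so the failure is visible.
try:
    raw.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError as exc:
    implicit_ok = False
    # The first byte that ASCII cannot represent.
    failing_byte = raw[exc.start:exc.start + 1]

# The fix described above: decode with the real encoding first,
# after which encoding to any target charset is safe.
text = raw.decode("utf-8")
encoded = text.encode("ascii", "replace")
```

Here `failing_byte` is `0xc3`, the first byte of the UTF-8 sequence for the umlaut, which matches the byte reported in the traceback.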

谁的年少不轻狂 2024-11-06 08:38:45

My suspicion is that the problem is caused by _("string") returning a byte string and not a Unicode string.

The obvious workaround is this:

parser = optparse.OptionParser(
        description=_('automates the dice rolling in the classic game "risk"').decode('utf-8'),
        usage=_("usage: %prog attacking defending").decode('utf-8'))

But that feels wrong.
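If the decode does turn out to be necessary, one way to keep it out of every call site is to wrap the translation function once. This is only a sketch: `to_text`, `make_text_gettext` and `fake_translate` are hypothetical names, and `fake_translate` merely stands in for a `_` that returns UTF-8 byte strings.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def make_text_gettext(translate, encoding="utf-8"):
    """Wrap a gettext-style callable so that it always returns text."""
    def wrapped(message):
        return to_text(translate(message), encoding)
    return wrapped


def fake_translate(message):
    # Stand-in for a byte-returning _(): always yields the German
    # translation as UTF-8 bytes ("\u00fc" is the letter u-umlaut).
    return u'automatisiert das W\u00fcrfeln bei "Risiko"'.encode("utf-8")


_ = make_text_gettext(fake_translate)
description = _('automates the dice rolling in the classic game "risk"')
```

With this, the OptionParser call can keep using plain `_(...)` and the decoding policy lives in one place.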

ugettext or NullTranslations.install(True) may help.

The Python gettext docs give these examples:

import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.ugettext

or:

import gettext
gettext.install('myapplication', '/usr/share/locale', unicode=1)

I am trying to reproduce your problem, and even if I use install(unicode=1), I get back a byte string (str type).

Either I am using gettext incorrectly, or I am missing a character coding declaration in my .po/.mo file.

I will update when I know more.

xlt = _('automates the dice rolling in the classic game "risk"')
print type(xlt)
if isinstance(xlt, str):
    print 'gettext returned a str (wrong)'
    print xlt
    print xlt.decode('utf-8').encode('utf-8')
elif isinstance(xlt, unicode):
    print 'gettext returned a unicode (right)'
    print xlt.encode('utf-8')

(One other possibility is to use escapes or Unicode code points in the .po file, but that doesn't sound like fun.)

(Or you could look at your system's .po files to see how they handle non-ASCII characters.)
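On the question of the charset declaration in the catalog, the following sketch builds a tiny .mo file in memory and loads it with `gettext.GNUTranslations`. It assumes Python 3, where `ugettext` no longer exists and `gettext()` returns text decoded with the charset named in the catalog header. `build_mo` is a hypothetical helper written for this example (normally `msgfmt` produces the .mo file), and the hash table is left empty.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Pack a {msgid: msgstr} dict into the bytes of a little-endian .mo file."""
    ids = b""
    strs = b""
    spans = []
    for msgid in sorted(messages):  # the empty msgid (header) sorts first
        id_bytes = msgid.encode("utf-8")
        str_bytes = messages[msgid].encode("utf-8")
        spans.append((len(id_bytes), len(ids), len(str_bytes), len(strs)))
        ids += id_bytes + b"\0"
        strs += str_bytes + b"\0"

    count = len(spans)
    header_size = 7 * 4
    ids_start = header_size + 16 * count  # two tables of (length, offset) pairs
    strs_start = ids_start + len(ids)

    id_table = []
    str_table = []
    for id_len, id_off, str_len, str_off in spans:
        id_table += [id_len, ids_start + id_off]
        str_table += [str_len, strs_start + str_off]

    # magic, version, string count, msgid table offset,
    # msgstr table offset, hash table size, hash table offset
    header = struct.pack("<7I", 0x950412DE, 0, count,
                         header_size, header_size + 8 * count, 0, 0)
    tables = struct.pack("<%dI" % (4 * count), *(id_table + str_table))
    return header + tables + ids + strs


MSGID = 'automates the dice rolling in the classic game "risk"'
catalog = build_mo({
    # Catalog header entry; the charset here drives decoding.
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Content-Transfer-Encoding: 8bit\n",
    MSGID: u'automatisiert das W\u00fcrfeln bei "Risiko"',
})

translations = gettext.GNUTranslations(io.BytesIO(catalog))
translated = translations.gettext(MSGID)
```

`translated` is a `str` containing the umlaut, so no manual `.decode('utf-8')` is needed once the catalog declares its charset.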