Python unicode issues (2.6)
I'm currently working on an IRC bot for a multilingual channel, and I'm encountering some issues with unicode which are proving nearly impossible to solve.
No matter what configuration of unicode encoding I seem to try, the list function which the below code sits within just flat out does nothing (c.notice is a class function which sends a NOTICE command to the irc server) or when it does do something, spits out something which obviously isn't encoded.
The command should be sending 天子, but instead it seems hellbent on sending 天å with a previous configuration of the same commands. The one I have specified below is of the 'send nothing' variety. I haven't worked with unicode before this, and thus I am quite stuck. I'm also positive that I'm doing this completely wrong as a consequence.
(compileCMD just takes a list and spits out a single string of all the elements within the list)
uk = self.compileCMD(self.faq.keys(),0)
ukeys = unicode(uk,"utf-8").encode("utf-8")
c.notice(nick, u"Current list of faq entries: %s" % (uk))
Comments (3)
A few points:
- unicode(uk,"utf-8").encode("utf-8"): Decoding UTF-8 and then re-encoding as UTF-8 doesn't change anything.
- ukeys = unicode(uk,"utf-8").encode("utf-8"): The ukeys variable that contains the re-encoded data is not used later on.

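To see the first point in isolation, here is a minimal sketch. The byte string is a hypothetical stand-in for whatever compileCMD returns (assumed here to be UTF-8 bytes for the two characters from the question), and it uses .decode()/.encode() instead of unicode(...) so the same lines run on Python 2.6+ and Python 3:

```python
# Hypothetical stand-in for uk: the UTF-8 bytes of U+5929 U+5B50.
raw = b"\xe5\xa4\xa9\xe5\xad\x90"

# Decode the bytes to text, then encode straight back to UTF-8.
# On Python 2 this is the same operation as unicode(raw, "utf-8").encode("utf-8").
round_tripped = raw.decode("utf-8").encode("utf-8")

# The round trip hands back exactly the bytes it started with.
print(round_tripped == raw)
```

So the line only has an effect if uk is not valid UTF-8 to begin with, in which case the decode step raises an error rather than fixing anything.
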
Turns out the issue was with the client I was using to test the output - it wasn't handling unicode properly itself!
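
For anyone hitting the same symptom, a rough sketch of how a client can garble correctly encoded text. The choice of Latin-1 is an assumption (the post doesn't say which encoding the client actually used), but the accented "å" in the question's output is consistent with UTF-8 bytes being read as a single-byte encoding:

```python
# The two characters the bot was supposed to send, written with escapes
# so the snippet needs no source-encoding declaration.
text = u"\u5929\u5b50"

# What a correctly behaving bot puts on the wire.
wire = text.encode("utf-8")

# A client that assumes a single-byte encoding (Latin-1 here, as an
# assumption) turns every byte into its own character.
garbled = wire.decode("latin-1")

# A client that decodes as UTF-8 gets the original text back.
correct = wire.decode("utf-8")

print(len(text), len(garbled))  # 2 characters become 6
print(correct == text)
```

The bytes are identical in both cases; only the client's interpretation differs, which is why no change on the bot side made the output look right.
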

Change this:
into this:
and try again. Make sure that uk is already a UTF-8 encoded string (not unicode).
I assume that the c.notice method takes an encoded string as argument, since it's got to send an encoded string over the wire. If the channel is multilingual, it's a safe bet that it expects it to be encoded as UTF-8. Also, drop the useless ukeys = unicode(uk,"utf-8").encode("utf-8") line.
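
One way to read the advice above, as a sketch rather than the answerer's exact code: keep text and bytes from mixing implicitly. On Python 2, formatting a u"..." template with a byte string that contains non-ASCII bytes triggers an implicit ASCII decode and raises UnicodeDecodeError, which may be why the original call appeared to do nothing. The uk value below is a hypothetical stand-in for compileCMD's output, assumed to be UTF-8 bytes, and the snippet is written to run on Python 2.6+ and 3:

```python
# Hypothetical stand-in for compileCMD's result: UTF-8 encoded bytes.
uk = b"\xe5\xa4\xa9\xe5\xad\x90"

prefix = u"Current list of faq entries: "

# Option A: stay in bytes the whole way, as the answer suggests.
message_a = prefix.encode("utf-8") + uk

# Option B: decode once, format as text, encode once before sending.
message_b = (u"Current list of faq entries: %s" % uk.decode("utf-8")).encode("utf-8")

# Both routes give the same bytes, which is what would be handed to c.notice
# (assuming, as the answer does, that c.notice wants an encoded string).
print(message_a == message_b)
```

Either route works as long as the decode and encode steps are explicit and happen exactly once each.
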