Python unicode problem (2.6)

Posted 2024-08-27 11:38:03

I'm currently working on an IRC bot for a multilingual channel, and I'm running into some Unicode issues that are proving nearly impossible to solve.

No matter what combination of unicode encoding calls I try, the list function containing the code below either flat out does nothing (c.notice is a class method that sends a NOTICE command to the IRC server) or, when it does do something, spits out text that obviously isn't encoded correctly.

The command should be sending 天子, but instead it seems hellbent on sending å¤©å­ with a previous configuration of the same commands. The one I have specified below is of the 'send nothing' variety. I haven't worked with unicode before this, and thus I am quite stuck. I'm also positive that I'm doing this completely wrong as a consequence.

(compileCMD just takes a list and spits out a single string of all the elements within the list)

uk = self.compileCMD(self.faq.keys(),0)
ukeys = unicode(uk,"utf-8").encode("utf-8")
c.notice(nick, u"Current list of faq entries: %s" % (uk))

Comments (3)

喜爱皱眉﹌ 2024-09-03 11:38:03

A few points:

  • The bytes "å¤©å­" are the UTF-8 encoding of "天子", so are you sure it's wrong that this is sent? Does the program/... that should process the data use UTF-8, or does it just interpret the input as a different encoding like Latin-1?
  • unicode(uk,"utf-8").encode("utf-8"): Decoding UTF-8 and then reencoding as UTF-8 doesn't change anything.
  • ukeys = unicode(uk,"utf-8").encode("utf-8"): The ukeys variable that contains the reencoded data is not used later on.
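
The first two points can be checked with a short sketch. It assumes only that the receiving side reads the bytes as Latin-1, which is a guess and not something the question confirms; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: how UTF-8 bytes look when a client misreads them as Latin-1.
# Written to run on Python 2.6+ and Python 3.3+.

text = u"\u5929\u5b50"             # the two characters 天子 as a unicode string
wire = text.encode("utf-8")        # the six bytes that go over the wire

# A client that wrongly assumes Latin-1 turns each byte into one character,
# which produces the familiar mojibake (starting with å¤©å).
misread = wire.decode("latin-1")

# Decoding UTF-8 and re-encoding as UTF-8 gives back the same bytes.
round_trip = wire.decode("utf-8").encode("utf-8")

# The misread text is recoverable, so the bytes themselves were correct.
recovered = misread.encode("latin-1").decode("utf-8")
```

If `recovered` equals the original text, the sender is emitting valid UTF-8 and the display side is the part to investigate.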

满意归宿 2024-09-03 11:38:03

Turns out the issue was with the client I was using to test the output - it wasn't handling unicode properly itself!

半枫 2024-09-03 11:38:03

Change this:

u"Current list of faq entries: %s" % (uk)

into this:

"Current list of faq entries: %s" % (uk)

and try again. Make sure that uk is already a UTF-8 encoded string (not unicode).

I assume that the c.notice method takes an encoded string as argument, since it's got to send an encoded string over the wire. If the channel is multilingual, it's a safe bet that it expects it to be encoded as UTF-8. Also, drop the useless ukeys = unicode(uk,"utf-8").encode("utf-8") line.
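
For background: in Python 2, formatting a byte string into a unicode template makes Python decode the byte string as ASCII first, which raises UnicodeDecodeError on non-ASCII bytes. If that exception is swallowed somewhere, it might explain the "send nothing" behaviour, though that is a guess. An alternative to the byte-string formatting above is to decode everything to text, format, and encode once just before sending. The sketch below shows that pattern; `to_text` and `build_notice` are hypothetical helpers, and it assumes, as above, that c.notice wants UTF-8 encoded bytes.

```python
# -*- coding: utf-8 -*-
# Sketch: normalise to text, format as text, encode once at the boundary.
# Helper names are made up for illustration. Runs on Python 2.6+ and 3.3+.

def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding it if it is bytes."""
    if isinstance(value, bytes):   # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

def build_notice(keys):
    """Build the NOTICE payload as UTF-8 bytes from bytes or unicode keys."""
    message = u"Current list of faq entries: %s" % to_text(keys)
    return message.encode("utf-8")

# Both spellings of the same key list produce identical bytes.
from_bytes = build_notice(b"\xe5\xa4\xa9\xe5\xad\x90")
from_text = build_notice(u"\u5929\u5b50")
```

In the bot, the result of `build_notice(uk)` would then be passed to c.notice in place of the formatted string, provided c.notice really does accept encoded bytes.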
