Python UTF-8 accent problem

Posted on 2024-11-28 09:28:03

I am having some problems with accents.

I wrote a Python script that gets the word "refeiÃ§Ã£o" from some input (an IMAP fetch). The word is Portuguese, and I need to convert it into a human-readable form. After decoding, it should appear as "refeição", but I am not getting that result...

>>> print a 
refeiÃ§Ã£o
>>> ENCODING = locale.getpreferredencoding()
>>> print ENCODING
UTF-8
>>> print a.encode(ENCODING)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
>>> a.decode('utf-8')
u'refei\xe7\xe3o'
>>> print a.decode('utf-8')
refeiÃ§Ã£o

Update:

root@ticuna:/etc/scripts# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Also, these words are inserted into a MySQL database, and the "unreadable" characters show up there the same way they do in the terminal.
The table collation is utf8_general_ci.

冰葑 2024-12-05 09:28:03

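It looks like your terminal window is displaying text in the single-byte ISO-8859-1 character set ("latin-1"), but your Python interpreter thinks the terminal is using UTF-8. From u'refei\xe7\xe3o' we can see that Python has the correct internal representation of the Portuguese letters. Evidently, the print command then converts that internal representation to UTF-8 and sends it to your terminal, and the result is garbage when the terminal interprets that UTF-8 as ISO-8859-1.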
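To see that effect in isolation, here is a minimal sketch in Python 3 (the session in the question is Python 2, so the syntax differs). The variable names are illustrative only. It encodes the word as UTF-8 and then reads those bytes as ISO-8859-1, which is what the terminal appears to be doing:

```python
# Python 3 sketch: reproduce the garbled output, then reverse it.
word = "refeição"

# What print sends to the terminal when Python believes it is UTF-8.
utf8_bytes = word.encode("utf-8")

# What a terminal set to ISO-8859-1 shows for those same bytes.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # refeiÃ§Ã£o

# Reversing the wrong decoding step recovers the original word.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)  # refeição
```

The reverse step only works because ISO-8859-1 assigns a character to every byte value, so nothing is lost on the way. It illustrates the mismatch; it is not a substitute for fixing the terminal or the locale.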

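The fix is to make your locale settings match what the terminal is actually doing, either by changing the locale or by making sure your terminal uses UTF-8.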
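One way to check for such a mismatch is to print what Python believes the encodings are and compare that with the terminal emulator's own character-set setting. A small sketch in Python 3 using only the standard library; `describe_encodings` is a hypothetical helper name, not something from the question:

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python will use, for comparison with the terminal's setting."""
    return {
        # Encoding derived from the locale (LANG, LC_CTYPE, ...).
        "preferred": locale.getpreferredencoding(),
        # Encoding print uses for standard output; may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

If both values report UTF-8 and accented output is still garbled, the terminal emulator itself is probably set to a different character set.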


太阳男子 2024-12-05 09:28:03

As a workaround, I am removing all accents.

Here is the code that I used:

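import unicodedata

def remove_accents(s):
    # s is a UTF-8 byte string; NFD splits each accented letter into a base letter plus combining marks (category 'Mn'), which are dropped
    return ''.join(c for c in unicodedata.normalize('NFD', s.decode('utf-8')) if unicodedata.category(c) != 'Mn')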

Based on this answer:
What is the best way to remove accents in a Python unicode string?
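For readers on Python 3, where `str` is already Unicode, the same idea might look like the sketch below. The name `strip_accents` and the choice to accept both bytes and text are assumptions made for illustration, not part of the original workaround:

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks removed, e.g. 'ç' becomes 'c'."""
    # Accept UTF-8 bytes (as in the original snippet) or an already decoded str.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # NFD decomposes each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category 'Mn' (nonspacing mark) covers the cedilla and tilde used here.
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


print(strip_accents("refeição"))                  # refeicao
print(strip_accents("refeição".encode("utf-8")))  # refeicao
```

Note that this discards information: the stored word is no longer the Portuguese spelling, so it sidesteps the display problem instead of solving it.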


About the author

烟花易冷人易散
