UnicodeDecodeError: 'ascii' codec can't decode

Posted 2024-11-18 02:48:06 | Word count: 384 | Views: 3 | Comments: 0


I'm reading a file that contains Romanian words in Python with file.readline(). Many characters come out wrong because of the encoding.

Example:

>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8

I've tried encode() with utf-8, cp500, etc., but it doesn't work.

Which character encoding should I use?

Thanks in advance.

Edit: The aim is to store the words from the file in a dictionary and, when printing one, to get aberație rather than 'abera\xc8\x9bie'.

Answer by 幻梦, 2024-11-25 02:48:06

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'

These bytes are the UTF-8 encoding of the string "aberație". Decode them to get a unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberaÈ›ie
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație
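
For readers on Python 3, the same decode step can be sketched as below. This is only an illustration under the assumption of Python 3, where bytes and str are separate types; the names raw, word and error_message are made up for the example and are not from the question.

```python
# Python 3 sketch (assumption: the session above is Python 2, this is its Python 3 analogue).
raw = b'abera\xc8\x9bie'          # the UTF-8 bytes shown in the question

word = raw.decode('utf-8')        # bytes -> str (Unicode text)

# The two bytes \xc8\x9b become the single character U+021B,
# so the text is one unit shorter than the byte string.
byte_count = len(raw)
char_count = len(word)

# Decoding the same bytes as ASCII fails, because 0xc8 is outside the ASCII range.
try:
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
```

The failing branch is the kind of error named in the title: it appears when non-ASCII bytes are decoded, explicitly or implicitly, with the ascii codec.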

To store the unicode string in a file, you have to encode it back to bytes in an encoding of your choice:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
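
Applied to the stated aim (reading words from a file into a dictionary), one possible approach is to let the file object do the decoding. The sketch below assumes the file is UTF-8 encoded; the file name words.txt, its contents, and the choice of dictionary values are hypothetical placeholders. io.open is used because it accepts an encoding argument in both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for the asker's word list (assumed UTF-8).
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'abera\u021bie\n\u0219coal\u0103\n')   # text is encoded to bytes on write

# With an explicit encoding, each line is decoded while reading,
# so the loop sees Unicode text rather than raw bytes.
words = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()
        if word:
            words[word] = len(word)   # arbitrary value for illustration: character count
```

Keeping text decoded inside the program and encoding only when writing out avoids calling encode() on strings that are already bytes.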

About the author: 寂寞笑我太脆弱
