UnicodeDecodeError: 'ascii' codec can't decode
I'm reading a file that contains Romanian words in Python with file.readline().
I'm having problems with many characters because of encoding.
Example:
>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8
I've tried encode() with utf-8, cp500, etc., but it doesn't work.
I can't work out which character encoding I have to use.
Thanks in advance.
Edit: The aim is to store the words from the file in a dictionary and, when printing them, to get aberație and not 'abera\xc8\x9bie'.
Comments (1)
What are you trying to do?
This is a set of bytes: 'abera\xc8\x9bie'.
It's a set of bytes that is the UTF-8 encoding of the string "aberație". You decode the bytes, for example with a.decode('utf-8'), to get your unicode string. If you want to store the unicode string in a file, then you have to encode it to a particular byte format of your choosing, for example with encode('utf-8').
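The decode/encode round trip described above can be sketched roughly as follows. This is only an illustration, not the original poster's code: the file name words.txt, the temporary directory, and the choice of word length as the dictionary value are all assumptions made for the example. It uses io.open so that the same lines should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# The byte string from the question: the UTF-8 encoding of "aberație".
raw = b'abera\xc8\x9bie'

# Decode bytes -> unicode text. 'ț' is 2 bytes in UTF-8 but 1 character.
word = raw.decode('utf-8')

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')

# Writing: io.open encodes the unicode text to UTF-8 bytes on the way out.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(word + u'\n')

# Reading: io.open decodes each line, so the dictionary keys are
# unicode text rather than raw bytes.
words = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        w = line.strip()
        words[w] = len(w)  # assumed value: the word's length in characters

# Encoding by hand gives back exactly the original bytes.
encoded = word.encode('utf-8')
```

On a terminal that uses UTF-8, printing one of the dictionary keys should show aberație. Printing the whole dictionary shows the repr of each key, which on Python 2 is the escaped form, so print the individual strings when you want the readable output.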