Decoding doesn't reverse unicode encoding in Django/Python

Posted 2024-08-28 18:47:15

Ok, I have a hardcoded string I declare like this

name = u"Par Catégorie"

I have a # -*- coding: utf-8 -*- magic header, so I am guessing it's converted to UTF-8.

Down the road it's output to XML through

xml_output.toprettyxml(indent='....', encoding='utf-8')

And I get a

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

Most of my data is in French and is output correctly in CDATA nodes, but that one hardcoded string keeps ... I don't see why an ASCII codec is called.

What's wrong?

空心↖ 2024-09-04 18:47:15

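The coding header in your source file tells Python which encoding your source is in. Python uses that encoding to decode the unicode string literal (u"Par Catégorie") into a unicode object. The unicode object itself has no encoding; it is raw unicode data. (Internally, Python will use one of two encodings, depending on how it was configured, but Python code shouldn't worry about that.)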
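A minimal sketch of that distinction, written for Python 3, where str plays the role of Python 2's unicode type and bytes plays the role of the old bytestring. The variable names are illustrative only. It shows that the same text object can be serialized under different encodings, and that an encoding only comes into play at that point.

```python
# Python 3 sketch: `str` here corresponds to Python 2's `unicode`.
text = "Par Catégorie"          # text: a sequence of code points, no encoding attached

# An encoding only appears when the text is turned into bytes.
utf8_bytes = text.encode("utf-8")      # é becomes two bytes: 0xc3 0xa9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: 0xe9

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching encoding recovers the same text object.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

The byte strings differ, but both decode back to the identical text, which is why the header's encoding describes the source file rather than the resulting object.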

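The UnicodeDecodeError you get means that somewhere you are mixing unicode strings and bytestrings (normal strings). When you mix them (concatenation, string interpolation, and so on), Python tries to convert the bytestring into a unicode string by decoding it with the default encoding, ASCII. If the bytestring contains non-ASCII data, this fails with the error you see. The operation may be happening somewhere inside a library, but it still means you are mixing inputs of different types.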
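The implicit step described above can be made explicit. The following is a sketch in Python 3, which no longer performs the conversion on its own, so the hypothetical helper py2_style_concat emulates it by hand; Python 2 did this internally rather than through any such function.

```python
def py2_style_concat(text, raw):
    """Emulate Python 2's rule for `unicode + str` (hypothetical helper).

    Python 2 decoded the bytestring with the default codec, ASCII,
    before concatenating. Here that decode is spelled out.
    """
    return text + raw.decode("ascii")


prefix = "Name: "
ascii_bytes = "Par Categorie".encode("utf-8")  # ASCII only: decodes fine
utf8_bytes = "Par Catégorie".encode("utf-8")   # contains 0xc3 0xa9 for é

# Works, which is why the bug can stay hidden for a long time.
ok = py2_style_concat(prefix, ascii_bytes)

# Fails as soon as a non-ASCII byte shows up.
try:
    py2_style_concat(prefix, utf8_bytes)
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(ok)
print(error)
```

The exception raised here comes from the 'ascii' codec and points at byte 0xc3, the same kind of failure as in the question, even though no code ever asked for ASCII by name.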

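Unfortunately, the fact that this works fine as long as the bytestrings contain only ASCII data means this type of error is all too frequent, even in library code. Python 3.x solves the problem by getting rid of the implicit conversion between unicode strings (str in 3.x) and bytestrings (the bytes type in 3.x).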
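As a rough illustration of the Python 3 behaviour, and of one common way to avoid the mix in the first place, the sketch below decodes bytes explicitly at the boundary and only encodes again when the XML is serialized. The helper to_text and the element name root are assumptions made for the example; the indent and encoding arguments mirror the call in the question.

```python
from xml.dom.minidom import Document


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding bytes explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "Par Catégorie".encode("utf-8")  # bytes arriving from somewhere else

# Python 3 refuses to mix the two types instead of guessing ASCII.
try:
    "Name: " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decode once, at the boundary, and work with text from then on.
label = "Name: " + to_text(raw)

# Build the document from text only; encode when producing the output.
doc = Document()
root = doc.createElement("root")  # element name chosen for illustration
doc.appendChild(root)
root.appendChild(doc.createCDATASection(to_text(raw)))
xml_bytes = doc.toprettyxml(indent="....", encoding="utf-8")

print(mixed_ok, label)
print(xml_bytes.decode("utf-8"))
```

The same discipline applies in Python 2: if every value handed to the XML tree is already a unicode object, there is nothing left for the ASCII codec to decode implicitly.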