UnicodeEncodeError when writing data to an XML file

Posted on 2024-08-31 21:42:15

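My goal is to write an XML file with a few tags whose values are in a regional language. I'm using Python for this, and IDLE (Python GUI) for programming.

When I try to write these words to the XML file, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

For now, I'm not using any XML writer library; instead, I open a file "test.xml" and write the data to it. The error occurs on this line: f.write(data). If I replace the write statement with a print statement, the data prints correctly in the Python shell.

I'm reading the data from an Excel file that is not in UTF-8, UTF-16, or UTF-32 encoding; it's in some other format. cp1252 reads the data correctly.

Any help in writing this data to an XML file would be highly appreciated.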

梦途 2024-09-07 21:42:15

You should .decode your incoming cp1252 to get Unicode strings, and .encode them in utf-8 (by far the preferred encoding for XML) at the time you write, i.e.

f.write(unicodedata.encode('utf-8'))

where unicodedata is obtained by .decode('cp1252') on the incoming bytestrings.
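As a minimal sketch of that decode-then-encode step: the sample bytes, the `<value>` tag, and the temporary path below are made up for illustration. The answer's wording (bytestrings, unicode) suggests Python 2; the sketch is written for Python 3, where the same two calls apply to `bytes` and `str`.

```python
import os
import tempfile

# Hypothetical input: bytes as they might arrive from a cp1252-encoded source.
raw = b"caf\xe9 \x80 5"           # "café € 5" in cp1252

text = raw.decode("cp1252")       # bytes -> Unicode text
encoded = text.encode("utf-8")    # Unicode text -> UTF-8 bytes

# Write to a throwaway directory; the file name mirrors the question's test.xml.
path = os.path.join(tempfile.mkdtemp(), "test.xml")
with open(path, "wb") as f:       # binary mode: we supply the bytes ourselves
    f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write(b"<value>" + encoded + b"</value>\n")
```

Opening the file in binary mode keeps the encoding step explicit: nothing is converted implicitly, so no ASCII default can get in the way.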

It's possible to put lipstick on it by using the codecs module of the standard Python library to open the input and output files each with their proper encodings in lieu of plain open, but what I show is the underlying mechanism (and it's often, though not invariably, clearer and more explicit to apply it directly, rather than indirectly via codecs -- a matter of style and taste).
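For comparison, here is a rough sketch of the file-level approach. It uses `io.open` rather than `codecs.open` (a substitution on my part: `io.open` takes an `encoding` argument in the same spirit and is the built-in `open` in Python 3). The file names and sample content are assumptions for illustration only.

```python
import io
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.txt")   # hypothetical cp1252 input file
dst = os.path.join(workdir, "test.xml")    # UTF-8 output file

# Create a sample cp1252 file to read from (a stand-in for the real data).
with io.open(src, "wb") as f:
    f.write(b"na\xefve\n")                 # "naïve" in cp1252

# The file objects decode on read and encode on write, so the loop body
# only ever sees Unicode text.
with io.open(src, "r", encoding="cp1252") as fin, \
     io.open(dst, "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(u"<value>%s</value>\n" % line.strip())
```

The conversion is the same as calling .decode and .encode by hand; it just happens inside the file objects.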

What does matter is the general principle: translate your input strings to unicode as soon as you can right after you obtain them, use unicode throughout your processing, and translate them back to byte strings as late as you can, just before you output them. This gives you the simplest, most straightforward life!-)
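One way to picture that principle is as three small stages, sketched here for Python 3. The function names, the `<items>`/`<item>` tags, and the sample cells are hypothetical; `escape` from `xml.sax.saxutils` is added only because the values end up inside XML.

```python
from xml.sax.saxutils import escape


def read_values(raw_cells, encoding="cp1252"):
    """Input boundary: decode bytes to Unicode text as early as possible."""
    return [cell.decode(encoding) for cell in raw_cells]


def build_xml(values):
    """Processing: operate only on Unicode text."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<items>"]
    lines += ["  <item>%s</item>" % escape(v.strip()) for v in values]
    lines.append("</items>")
    return "\n".join(lines) + "\n"


def to_bytes(document):
    """Output boundary: encode to bytes as late as possible."""
    return document.encode("utf-8")


# Hypothetical cp1252 cell contents, standing in for the Excel data.
raw_cells = [b" Z\xfcrich ", b"R&D caf\xe9"]
xml_bytes = to_bytes(build_xml(read_values(raw_cells)))
```

Only `read_values` and `to_bytes` know about encodings, so changing the input or output encoding touches one line each.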
