UnicodeEncodeError when writing data to an XML file
My aim is to write an XML file with a few tags whose values are in the regional language. I'm using Python to do this, with IDLE (Python GUI) for programming.
When I try to write the words to an XML file, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
For now, I'm not using any XML writer library; instead, I'm opening a file "test.xml" and writing the data into it. The error is raised by this line: f.write(data)
If I replace the write statement above with a print statement, the data prints properly in the Python shell.
I'm reading the data from an Excel file that is not encoded in UTF-8, UTF-16, or UTF-32. It's in some other encoding; reading it as cp1252 gives the correct data.
Any help in getting this data written to an XML file would be highly appreciated.
Comments (1)
You should .decode your incoming cp1252 to get Unicode strings, and .encode them in utf-8 (by far the preferred encoding for XML) at the time you write, i.e.
f.write(unicodedata.encode('utf-8'))
where unicodedata is obtained by .decode('cp1252') on the incoming bytestrings.
It's possible to put lipstick on it by using the codecs module of the standard Python library to open the input and output files, each with its proper encoding, in lieu of plain open, but what I show is the underlying mechanism (and it's often, though not invariably, clearer and more explicit to apply it directly rather than indirectly via codecs; that's a matter of style and taste).
What does matter is the general principle: translate your input strings to unicode as soon as you can, right after you obtain them; use unicode throughout your processing; and translate them back to byte strings as late as you can, just before you output them. This gives you the simplest, most straightforward life!-)
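The decode early, encode late principle can be sketched as follows. This is only an illustration under stated assumptions, not the asker's actual code: the byte strings in raw_cells stand in for whatever the Excel reader returns, build_xml is a hypothetical helper, and the output goes to a temporary directory. The sketch is written to run on Python 3, where .decode exists only on bytes and the file is opened in binary mode to accept the encoded result. The second write uses io.open with an explicit encoding, one way to let the file object do the encoding, similar in spirit to the codecs approach mentioned above.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
from xml.sax.saxutils import escape


def build_xml(words):
    """Hypothetical helper: return an XML document as a Unicode string."""
    lines = [u'<?xml version="1.0" encoding="utf-8"?>', u"<words>"]
    for word in words:
        # escape() replaces &, < and > so the values are safe inside a tag.
        lines.append(u"  <word>%s</word>" % escape(word))
    lines.append(u"</words>")
    return u"\n".join(lines) + u"\n"


# Assumed input: byte strings as they might arrive from a cp1252 source.
raw_cells = [b"caf\xe9", b"na\xefve", b"\x80 5 & up"]

# 1. Decode at the boundary: bytes -> Unicode, as early as possible.
words = [cell.decode("cp1252") for cell in raw_cells]

# 2. Work with Unicode throughout the processing.
xml_text = build_xml(words)

# 3. Encode as late as possible, right at the write (binary mode).
out_dir = tempfile.mkdtemp()
path_direct = os.path.join(out_dir, "test.xml")
with open(path_direct, "wb") as f:
    f.write(xml_text.encode("utf-8"))

# Alternative: open the file with an explicit encoding and write Unicode.
path_io = os.path.join(out_dir, "test_io.xml")
with io.open(path_io, "w", encoding="utf-8", newline="\n") as f:
    f.write(xml_text)
```

Both writes should produce the same bytes on disk; which one reads better is, as the answer says, a matter of style and taste.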