Writing XML to a file corrupts the file in Python
I'm trying to write the contents of an xml.dom.minidom object to a file. The simple approach is to use the writexml() method:
import codecs
from xml.dom import minidom

def write_xml_native():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Using native writexml() method to write
    xmldoc.writexml(f, encoding="utf=8")
    f.close()
The problem is that it corrupts the non-Latin text in the file. The other way is to get the XML as a string and write it to the file explicitly:
def write_xml():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    # Opening file for writing UTF-8, which is XML's default encoding
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8')
    # Writing XML in UTF-8 encoding, as recommended in the documentation
    f.write(xmldoc.toxml("utf-8"))
    f.close()
This results in the following error:
Traceback (most recent call last):
  File "D:\Projects\Semio\semioparser.py", line 45, in <module>
    write_xml()
  File "D:\Projects\Semio\semioparser.py", line 42, in write_xml
    f.write(xmldoc.toxml(encoding="utf-8"))
  File "C:\Python26\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\Python26\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)
How do I write XML text to a file? What am I missing?
EDIT: The error goes away after adding a decode call: f.write(xmldoc.toxml("utf-8").decode("utf-8"))
But the Russian characters are still corrupted.
The text is not corrupted when viewed in the interpreter, only when it is written to the file.
Comments (2)
Hmm, though this should work:
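A minimal sketch of this first approach, not necessarily the exact snippet the answer had in mind. It assumes Python 3 and uses a small in-memory document as a hypothetical stand-in for semio2.xml; SAMPLE, write_xml_text and out_path are illustrative names. The idea is to call toxml() without an encoding argument, so it returns text and the file object performs the only encoding step:

```python
import io
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document standing in for semio2.xml.
SAMPLE = "<root><word>Привет</word></root>"

def write_xml_text(xmldoc, path):
    # toxml() with no encoding argument returns text, not bytes,
    # so the UTF-8 file object is the only place where encoding happens.
    with io.open(path, mode="w", encoding="utf-8") as f:
        f.write(xmldoc.toxml())

xmldoc = minidom.parseString(SAMPLE.encode("utf-8"))
out_path = os.path.join(tempfile.mkdtemp(), "codified.xml")
write_xml_text(xmldoc, out_path)
```

Written this way, the XML declaration carries no encoding attribute, which a conforming parser reads as UTF-8.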
You may alternatively try:
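One possible reading of this alternative, again only a sketch under the assumption of Python 3 and a hypothetical in-memory sample: let toxml() do the encoding and write the resulting bytes to a file opened in binary mode, so nothing is encoded twice. The names SAMPLE, write_xml_bytes and out_path are made up for illustration.

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document standing in for semio2.xml.
SAMPLE = "<root><word>Привет</word></root>"

def write_xml_bytes(xmldoc, path):
    # toxml(encoding=...) returns an already-encoded byte string, so the
    # file is opened in binary mode and no second encoding step runs.
    data = xmldoc.toxml(encoding="utf-8")
    with open(path, mode="wb") as f:
        f.write(data)

xmldoc = minidom.parseString(SAMPLE.encode("utf-8"))
out_path = os.path.join(tempfile.mkdtemp(), "codified3.xml")
write_xml_bytes(xmldoc, out_path)
```

The traceback in the question is consistent with mixing the two routes: already-encoded bytes were handed to a codecs writer that expected text.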
Update: If you construct the XML from a string object, you should encode it before passing it to the minidom parser, like this:
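A short sketch of that step, assuming Python 3 and a hypothetical xml_text value (the original example is not reproduced here). The text is encoded to UTF-8 bytes before it reaches minidom.parseString():

```python
from xml.dom import minidom

# Hypothetical XML built as a text string in the program.
xml_text = "<root><word>Привет</word></root>"

# Encode explicitly; with no XML declaration the parser assumes UTF-8,
# which matches the bytes produced here.
xmldoc = minidom.parseString(xml_text.encode("utf-8"))

word = xmldoc.getElementsByTagName("word")[0].firstChild.data
```

Under Python 3 parseString() also accepts text directly, so the explicit encode matters mostly for Python 2 code like the question's.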
Try this:
This works for me (under Python 3, though).
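For reference, a sketch of how the writexml() route might look under Python 3. This is not the commenter's original snippet; SAMPLE, write_xml_native's signature and out_path are assumptions made for illustration. The file object does the encoding, while the encoding argument of writexml() is only copied into the XML declaration, so a misspelled value such as "utf=8" would end up in the file header as written:

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document standing in for semio2.xml.
SAMPLE = "<root><word>Привет</word></root>"

def write_xml_native(xmldoc, path):
    # The file object encodes the text; writexml()'s encoding argument
    # only fills in the declaration, so the two values should agree.
    with open(path, mode="w", encoding="utf-8") as f:
        xmldoc.writexml(f, encoding="utf-8")

xmldoc = minidom.parseString(SAMPLE.encode("utf-8"))
out_path = os.path.join(tempfile.mkdtemp(), "codified.xml")
write_xml_native(xmldoc, out_path)
```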