Writing XML to a file corrupts the file in Python

Posted on 2024-10-08 08:02:40 | 1537 characters | 3 views | 0 comments

I'm trying to write the contents of an xml.dom.minidom object to a file. The simple idea is to use the writexml method:

import codecs
from xml.dom import minidom

def write_xml_native():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Using native writexml() method to write
    xmldoc.writexml(f, encoding="utf-8")
    f.close()

The problem is that it corrupts the non-Latin text in the file. The other way is to get the document as a string and write it to the file explicitly:

def write_xml():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    # Opening file for writing UTF-8, which is XML's default encoding
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8')
    # Writing XML in UTF-8 encoding, as recommended in the documentation
    f.write(xmldoc.toxml("utf-8"))
    f.close()

This results in the following error:

Traceback (most recent call last):
  File "D:\Projects\Semio\semioparser.py", line 45, in <module>
    write_xml()
  File "D:\Projects\Semio\semioparser.py", line 42, in write_xml
    f.write(xmldoc.toxml(encoding="utf-8"))
  File "C:\Python26\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\Python26\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)

How do I write XML text to a file? What am I missing?

EDIT: The error is fixed by adding a decode call:
f.write(xmldoc.toxml("utf-8").decode("utf-8"))
But the Russian characters are still corrupted.

The text is not corrupted when viewed in the interpreter, only when it is written to the file.

Comments (2)

后来的我们 2024-10-15 08:02:40

Hmm, though this should work:

import codecs
import xml.dom.minidom as minidom

xml = minidom.parse("test.xml")
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)

Alternatively, you may try:

with codecs.open("test.xml", "r", "utf-8") as inp:
    xml = minidom.parseString(inp.read().encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)

Update: If you construct the XML from a string object, you should encode it before passing it to the minidom parser, like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs
import xml.dom.minidom as minidom

xml = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
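
As a rough check of the approach above, the following sketch round-trips a small Cyrillic document through a temporary file and reads the text back. It assumes Python 3, where the built-in open() takes an encoding argument; the helper name write_and_read_back and the temporary path are illustrative and do not come from the original question.

```python
import os
import tempfile
import xml.dom.minidom as minidom


def write_and_read_back(xml_text):
    """Parse xml_text, write it as UTF-8, and return the re-parsed root text.

    Hypothetical helper for illustration only.
    """
    doc = minidom.parseString(xml_text.encode("utf-8"))
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        # Text-mode file with an explicit encoding: the file object encodes,
        # while writexml(encoding=...) only fills in the XML declaration.
        with open(path, "w", encoding="utf-8") as out:
            doc.writexml(out, encoding="utf-8")
        reparsed = minidom.parse(path)
        return reparsed.documentElement.firstChild.data
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(write_and_read_back("<ru>Тест</ru>") == "Тест")
```

If the comparison comes back false on a given setup, the mismatch is most likely between the encoding used by the file object and the one named in the XML declaration.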

找个人就嫁了吧 2024-10-15 08:02:40

Try this:

# Python 3: pass encoding explicitly so the file matches the XML declaration
with open("codified.xml", "w", encoding="utf-8") as f:
    f.write(xmldoc.toxml("utf-8").decode("utf-8"))

This works for me (under Python 3, though).
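
One way to avoid the encode/decode round trip, sketched here under the assumption of Python 3: toxml() called with an encoding returns bytes, so those bytes can be written to a file opened in binary mode and no text-layer codec is involved. The function name save_xml_bytes and the sample document are hypothetical.

```python
import os
import tempfile
import xml.dom.minidom as minidom


def save_xml_bytes(doc, path):
    """Serialize doc to UTF-8 bytes and write them in binary mode.

    Hypothetical helper for illustration only.
    """
    data = doc.toxml(encoding="utf-8")  # bytes, including the XML declaration
    with open(path, "wb") as f:
        f.write(data)
    return data


if __name__ == "__main__":
    doc = minidom.parseString("<ru>Тест</ru>".encode("utf-8"))
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        written = save_xml_bytes(doc, path)
        with open(path, "rb") as f:
            print(f.read() == written)
    finally:
        os.remove(path)
```

Writing bytes directly means the file content cannot depend on the platform's default text encoding, which is a common source of this kind of corruption on Windows.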
