Python xml.dom.minidom generates invalid XML?

Posted on 2024-11-02 17:47:50

I have run into a strange problem with Python's xml.dom.minidom package. I generate a document and populate it with data captured from a terminal. Sometimes that data contains terminal control characters. When I store such a character in a text node and serialize the document with toprettyxml(), everything seems fine, but the generated document is not well-formed XML and cannot be parsed back.

Does anyone know why minidom allows an invalid document to be generated? Is this connected with the "mini" part?

Here is a minimal session that reproduces the problem (system info included):

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.dom import minidom
>>> impl = minidom.getDOMImplementation()
>>> doc = impl.createDocument(None, "results", None)
>>> root = doc.firstChild
>>> outString = "test " + chr(1)  # the control character goes here
>>> root.appendChild(doc.createTextNode(outString))
<DOM Text node "'test \x01'">
>>> doc.toprettyxml(encoding="utf-8")
'<?xml version="1.0" encoding="utf-8"?>\n<results>\n\ttest \x01\n</results>\n'
>>> with open("/tmp/outfile", "w") as f:
...     f.write(doc.toprettyxml(encoding="utf-8"))
... 
>>> doc2 = minidom.parse("/tmp/outfile")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 6
>>> open("/tmp/outfile","r").readlines()
['<?xml version="1.0" encoding="utf-8"?>\n', '<results>\n', '\ttest \x01\n', '</results>\n']
>>> 


Comments (1)

救星 2024-11-09 17:47:51

Looking at the code for _write_data, it only escapes ampersands, double quotes, and angle brackets; control characters are written out unchanged:

def _write_data(writer, data):
    "Writes datachars to writer."
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace("\"", "&quot;").replace(">", "&gt;")
    writer.write(data)

As you surmised, minidom isn't intended as a fully robust implementation (its implementation of namespaces is lacking, for instance).
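
Since minidom does not check text for characters that XML 1.0 forbids, one workaround is to filter the terminal data before it reaches createTextNode. The sketch below is only an illustration, not part of minidom: the helper names strip_illegal_xml_chars and build_results_document are hypothetical, and it is written for Python 3 rather than the Python 2.6 session in the question. It assumes that silently dropping the offending characters is acceptable; if the raw bytes must be preserved, they would need some other encoding (base64, for example), because XML 1.0 does not allow a character such as \x01 even as a character reference.

```python
import re
from xml.dom import minidom

# Matches any character outside the XML 1.0 "Char" production, which allows
# only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with characters that XML 1.0 forbids removed or replaced.

    Hypothetical helper for illustration; not part of the standard library.
    """
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def build_results_document(raw_text):
    """Build the <results> document from the question, sanitizing the text first."""
    doc = minidom.getDOMImplementation().createDocument(None, "results", None)
    doc.documentElement.appendChild(
        doc.createTextNode(strip_illegal_xml_chars(raw_text))
    )
    return doc


if __name__ == "__main__":
    doc = build_results_document("test " + chr(1))
    xml_bytes = doc.toxml(encoding="utf-8")
    # The control character was removed before serialization, so the
    # output can be parsed back without an ExpatError.
    reparsed = minidom.parseString(xml_bytes)
    print(repr(reparsed.documentElement.firstChild.data))
```

Passing a visible replacement such as "?" instead of the empty string keeps a marker where a control character was dropped, which can help when the terminal output is inspected later.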
