Using a DTD from SAX to DOM (Python)

Posted 2024-08-15 02:46:30

I need a DOM tree validated against a DTD (so that I can use getElementById).
Validation and parsing work, but the DOM doesn't behave properly:

from io import BytesIO
from xml.dom import minidom
from xml.dom.pulldom import SAX2DOM

import lxml.sax
from lxml import etree

data_string = b"""\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""

# parser that validates against the DTD while parsing
etree_parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)
# parse it
sax_tree = etree.parse(BytesIO(data_string), etree_parser)
# replay the lxml tree as SAX events into a minidom builder
handler = SAX2DOM()
lxml.sax.saxify(sax_tree, handler)
domObject = handler.document

print(domObject.getElementById("nr_0"))
# prints None

print(minidom.parseString(data_string).getElementById("nr_0"))
# prints <DOM Element: bar at 0x7f36b77dc0e0>

It seems that SAX2DOM won't pass the DTD to the DOM. Did I forget something?
I've read that it is impossible to load the DTD after the DOM is built.

Any ideas?

1 answer

弥枳 2024-08-22 02:46:30

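As far as I know, SAX does not deliver DTD information to the ContentHandler. It goes to separate handlers that you register on the SAX parser (the XMLReader): the DTDHandler, which receives notations and unparsed entities, and the optional DeclHandler extension, which receives declarations such as <!ATTLIST bar id ID #REQUIRED>. lxml.sax.saxify only drives a ContentHandler, so SAX2DOM never learns which attributes are IDs. This means that you cannot do this without serializing and reparsing the document.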

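# sax_tree is the validated lxml tree from the question; serialize the whole
# ElementTree (not just its root element) so the DOCTYPE is written out too
validated_string = etree.tostring(sax_tree)
domDocument = minidom.parseString(validated_string)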
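
Here is the serialize-and-reparse route as a single self-contained Python 3 script. It is a sketch that assumes lxml is installed. validated_minidom is a hypothetical helper name, and the sample document is the one from the question.

```python
from io import BytesIO
from xml.dom import minidom

from lxml import etree

DATA = b"""\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""


def validated_minidom(data: bytes) -> minidom.Document:
    """Hypothetical helper: validate with lxml, then rebuild a minidom Document."""
    parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)
    # Raises etree.XMLSyntaxError if the document does not match its DTD.
    tree = etree.parse(BytesIO(data), parser)
    # Serialize the ElementTree (not tree.getroot()) so the DOCTYPE is kept.
    serialized = etree.tostring(tree)
    # minidom's expat-based builder reads the ATTLIST declarations,
    # so getElementById() knows which attributes are IDs.
    return minidom.parseString(serialized)


dom = validated_minidom(DATA)
bar = dom.getElementById("nr_0")
print(bar.tagName, bar.firstChild.data)  # bar text
```

The cost is one extra serialization and parse. In exchange, the minidom document is built by the same parser path that already worked in the question's last line.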

On the other hand, unless you really need a minidom document, you'd be better off staying with the lxml tree. You can use XPath for the equivalent of getElementById, or have a look at etree.XMLDTDID and etree.parseid, which return the parsed document together with a dictionary of its DTD-declared IDs.
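
Below is a minimal sketch of the lxml-only route. It assumes lxml is installed and reuses the question's sample document and parser settings. etree.parseid and etree.XMLDTDID both build their dictionary from attributes the DTD declares as ID. The XPath query is a plain attribute match that does not consult the DTD.

```python
from io import BytesIO

from lxml import etree

DATA = b"""\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""

# Same validating parser settings as in the question.
parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)

# parseid() returns (ElementTree, dict); the dict maps each DTD-declared
# ID value to its element.
tree, ids = etree.parseid(BytesIO(DATA), parser)
by_id = ids["nr_0"]

# XMLDTDID() does the same for in-memory data and returns the root element.
root, ids_from_text = etree.XMLDTDID(DATA)

# Plain XPath alternative: matches an attribute literally named "id",
# whatever the DTD says about its type.
by_xpath = tree.xpath("//*[@id = $value]", value="nr_0")[0]

print(by_id.tag, by_id.text)  # bar text
```

The lxml documentation warns that the tree must not be modified while its ID dictionary is in use, because the results are undefined.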
