From SAX to DOM with a DTD (Python)
I need a validated DOM tree with a DTD (so that I can use getElementById). Validation and parsing work, but the DOM doesn't behave properly:
from io import BytesIO
from xml.dom import minidom
from xml.dom.pulldom import SAX2DOM

from lxml import etree
import lxml.sax

# Bytes, so the encoding declaration is handled the same on Python 2 and 3
data_string = b"""\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""

# Parser that validates against the DTD while parsing
etree_parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)

# Parse into an lxml tree, then replay it as SAX events into a minidom builder
sax_tree = etree.parse(BytesIO(data_string), etree_parser)
handler = SAX2DOM()
lxml.sax.saxify(sax_tree, handler)
domObject = handler.document

print(domObject.getElementById("nr_0"))
# prints None

print(minidom.parseString(data_string).getElementById("nr_0"))
# prints <DOM Element: bar at 0x7f36b77dc0e0>
It seems that SAX2DOM won't pass the DTD to the DOM. Did I forget something?
I've read that it is impossible to load the DTD after the DOM is built.
Any ideas?
Comments (1)
As far as I know, SAX DTD events are not handled by the ContentHandler but by the DTDHandler, which is a property you set on the SAX parser (XMLReader). This means that you cannot do this without serializing and reparsing the document.
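A minimal sketch of that point, under one loudly stated assumption: to stay standard-library only, xml.sax stands in here for lxml.sax.saxify as the source of SAX events. SAX2DOM is a ContentHandler, so the ATTLIST declaration never reaches it, while reparsing the serialized text lets minidom's own parser read the DTD. With lxml, the text to reparse would presumably come from etree.tostring(sax_tree).

```python
import xml.sax
from xml.dom import minidom
from xml.dom.pulldom import SAX2DOM

DATA = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""

# Content events only: SAX2DOM builds the tree from startElement/characters
# calls and is never told that "id" is declared as an ID attribute.
handler = SAX2DOM()
xml.sax.parseString(DATA, handler)
sax_dom = handler.document
lost = sax_dom.getElementById("nr_0")

# Reparsing the serialized document: minidom's parser sees the DTD itself.
reparsed = minidom.parseString(DATA)
found = reparsed.getElementById("nr_0")

print(lost)
print(found.tagName if found is not None else None)
```

The cost is a second parse of the whole document, which is usually acceptable for small inputs.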
On the other hand, unless you really need a minidom document, you'd be better off just staying with the lxml tree. (You can use XPath for the equivalent of getElementById, or have a look at etree.XMLDTDID and etree.parseid.)
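A rough sketch of staying inside lxml, assuming the same sample document as in the question. The variable names (matches, id_map, bar) are illustrative only. The XPath expression is a plain attribute match, so it does not depend on the DTD at all; etree.XMLDTDID returns the root together with a mapping from DTD-declared ID values to elements. etree.parseid does the same for a file or file-like source and returns a tree instead of a root.

```python
from lxml import etree

DATA = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)*>
<!ELEMENT bar (#PCDATA)>
<!ATTLIST bar id ID #REQUIRED>]><foo><bar id="nr_0">text</bar></foo>
"""

# Validate against the DTD while parsing, as in the question.
parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)
root = etree.fromstring(DATA, parser)

# XPath attribute match: finds the element whether or not "id" is an ID type.
matches = root.xpath("//*[@id=$wanted]", wanted="nr_0")

# DTD-aware lookup: keys are the values of attributes declared as ID.
id_root, id_map = etree.XMLDTDID(DATA)
bar = id_map["nr_0"]

print(matches[0].tag, bar.tag, bar.text)
```

The lxml documentation warns against modifying the tree while relying on the ID dictionary, so this fits read-only lookups best.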