Parsing XML entities with Python xml.sax

Posted on 2024-11-15 05:15:40

I am parsing XML in Python with xml.sax, but my handler never sees the entities. Why are skippedEntity() and resolveEntity() not called in the following code?

import os
import cStringIO
import xml.sax
from xml.sax.handler import ContentHandler,EntityResolver,DTDHandler

#Class to parse and run test XML files
class TestHandler(ContentHandler,EntityResolver,DTDHandler):

    #SAX handler - Entity resolver
    def resolveEntity(self,publicID,systemID):
        print "TestHandler.resolveEntity: %s  %s" % (publicID,systemID)

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity: %s" % (name)

    def unparsedEntityDecl(self,publicID,systemID,ndata):
        print "TestHandler.unparsedEntityDecl: %s  %s" % (publicID,systemID)

    def startElement(self,name,attrs):
        # name = string.lower(name)
        summary = '' + attrs.get('summary','')
        arg = '' + attrs.get('arg','')
        print 'TestHandler.startElement(), %s : %s (%s)' % (name,summary,arg)


def run(xml_string):
    try:
        parser = xml.sax.make_parser()
        stream = cStringIO.StringIO(xml_string)

        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setDTDHandler( curHandler )
        parser.setEntityResolver( curHandler )

        parser.parse(stream)
        stream.close()
    except (xml.sax.SAXParseException), e:
        print "*** PARSER error: %s" % e;

def main():
    try:
        XML = "<!DOCTYPE page[ <!ENTITY num 'foo'> ]><test summary='step: &num;'>Entity: &not;</test>"
        run(XML)
    except Exception, e:
      print 'FATAL ERROR: %s' % (str(e))

if __name__== '__main__':
    main()

When run, all I see is:

 TestHandler.startElement(), step: foo ()
 *** PARSER error: <unknown>:1:36: undefined entity

Why don't I see the resolveEntity print for &num; or the skippedEntity print for &not;?

Comments (2)

乞讨 2024-11-22 05:15:40

I think resolveEntity and skippedEntity are only called for external DTDs. I got this to work by modifying the XML.

XML = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""

The external.dtd contains two simple entity declarations.

<!ENTITY foo "bar">
<!ENTITY bar "foo">

Also, I got rid of resolveEntity.

This outputs:

TestHandler.startElement(), test : step: bar foo ()
TestHandler.skippedEntity: not

Hope this helps.
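
For readers on Python 3, the question's situation can be reproduced roughly as follows. This is a minimal sketch, not the original poster's code: RecordingHandler and parse_internal_subset are hypothetical names, and the handler stores callbacks in lists instead of printing them. With only an internal DTD subset, the declared entity num is expanded inside the attribute without any callback, and the undeclared &not; is reported as a fatal "undefined entity" error rather than through skippedEntity().

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RecordingHandler(ContentHandler):
    """Hypothetical handler that records callbacks instead of printing."""

    def __init__(self):
        super().__init__()
        self.summaries = []  # 'summary' attribute of every element seen
        self.skipped = []    # names passed to skippedEntity()

    def startElement(self, name, attrs):
        self.summaries.append(attrs.get("summary", ""))

    def skippedEntity(self, name):
        self.skipped.append(name)


def parse_internal_subset():
    # Same shape as the question: 'num' is declared internally, 'not' is not.
    doc = (
        b"<!DOCTYPE test [ <!ENTITY num 'foo'> ]>"
        b"<test summary='step: &num;'>Entity: &not;</test>"
    )
    handler = RecordingHandler()
    error = None
    try:
        xml.sax.parseString(doc, handler)
    except xml.sax.SAXParseException as exc:
        error = exc.getMessage()
    return handler, error


if __name__ == "__main__":
    handler, error = parse_internal_subset()
    print("summaries:", handler.summaries)
    print("skipped:", handler.skipped)
    print("error:", error)
```

The parser substitutes internal entities itself, so the handler only ever sees the expanded attribute value.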

吃→可爱长大的 2024-11-22 05:15:40

Here is a modified version of your program that I hope makes sense. It demonstrates a case where all TestHandler methods are called.

import StringIO
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value. 
    def resolveEntity(self, publicID, systemID):
        print "TestHandler.resolveEntity(): %s %s" % (publicID, systemID)
        return systemID

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity(): %s" % (name)

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print "TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID)

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print 'TestHandler.startElement():', summary

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = StringIO.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException, e:
        print "*** PARSER error: %s" % e

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

test.dtd contains:

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

Output:

TestHandler.resolveEntity(): None test.dtd
TestHandler.unparsedEntityDecl(): None bar.gif
TestHandler.startElement(): step: FOO
TestHandler.skippedEntity(): not

Addition

As far as I can tell, skippedEntity is called only when an external DTD is used (at least I can't come up with a counterexample; it would be nice if the documentation were a little clearer).
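
A small Python 3 sketch of this observation, under stated assumptions: SkipRecorder and parse_with_unread_dtd are hypothetical names, and external entity processing is explicitly switched off (the default since Python 3.7.1), so test.dtd is only named in the DOCTYPE and never opened. Because the document declares an external subset that the parser has not read, the undeclared &not; should be reported through skippedEntity() instead of raising an error.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class SkipRecorder(ContentHandler):
    """Hypothetical handler that remembers skipped entity names."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def parse_with_unread_dtd():
    # The DOCTYPE points at an external DTD; the file does not need to exist.
    doc = (
        b'<!DOCTYPE test SYSTEM "test.dtd">'
        b"<test>Entity: &not;</test>"
    )
    parser = xml.sax.make_parser()
    # Assumption: keep external entities off so test.dtd is never fetched.
    parser.setFeature(feature_external_ges, False)
    handler = SkipRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc))
    return handler


if __name__ == "__main__":
    print(parse_with_unread_dtd().skipped)
```

The entity reference sits in element content here on purpose; references inside attribute values are not reported this way.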

Adam said in his answer that resolveEntity is called only for external DTDs. But that is not quite true. resolveEntity is also called when processing a reference to an external entity that is declared in an internal or external DTD subset. For example:

<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>

where the content of bar.txt could be, say, FOO. In this case the entity cannot be referenced in an attribute value, because XML forbids references to external entities there.
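
The external-entity case can be sketched in Python 3 as well. The names InMemoryResolver and parse_external_entity are hypothetical, and the content of bar.txt is served from a dictionary rather than from disk, purely for illustration. Python 3.7.1 and later leave external general entities disabled by default for security reasons, so the sketch turns the feature on explicitly; that should only be done for trusted input.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class InMemoryResolver(ContentHandler, EntityResolver):
    """Hypothetical resolver that serves external entities from a dict."""

    def __init__(self, files):
        super().__init__()
        self.files = files   # maps system ID -> bytes content
        self.resolved = []   # (publicId, systemId) pairs seen

    def resolveEntity(self, publicId, systemId):
        self.resolved.append((publicId, systemId))
        # Must return a value: a system ID string or an InputSource.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.files[systemId]))
        return source


def parse_external_entity():
    # 'num' is an external entity declared in the internal DTD subset.
    doc = (
        b'<!DOCTYPE test [ <!ENTITY num SYSTEM "bar.txt"> ]>'
        b"<test>Entity: &num;</test>"
    )
    parser = xml.sax.make_parser()
    # Assumption: the input is trusted, so external entities may be enabled.
    parser.setFeature(feature_external_ges, True)
    handler = InMemoryResolver({"bar.txt": b"FOO"})
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.parse(io.BytesIO(doc))
    return handler


if __name__ == "__main__":
    print(parse_external_entity().resolved)
```

Returning an InputSource with a byte stream keeps the parser from opening a file or URL for the system ID itself.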
