What is the most memory-efficient way to emit XML from a JAXP SAX ContentHandler?

Posted on 2024-08-16 17:49:46

I have a situation similar to an earlier question about emitting XML. I am analyzing data in a SAX ContentHandler while serializing it to a stream. I suspect that the solution in the linked question, though it is exactly what I am looking for in terms of the API, is not memory-efficient, since it involves an identity transform with the XSLT processor. I want the memory consumption of the program to be bounded, rather than growing with the input size.

How can I easily forward the parameters of my ContentHandler methods to a serializer without doing acrobatics to adapt e.g. StAX to SAX, or worse yet, copying the SAX event contents to the output stream myself?

Edit: here's a minimal example of what I am after. thingIWant should just write to the OutputStream given to it. Like I said, the earlier question has a TransformerHandler that gives me the right API, but it uses the XSLT processor instead of just a simple serialization.

public class MyHandler implements ContentHandler {

    ContentHandler thingIWant;

    MyHandler(OutputStream outputStream) {
        thingIWant = setup(outputStream);
    }

    public void startDocument() throws SAXException {
        // parsing logic
        thingIWant.startDocument();
    }

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // parsing logic
        thingIWant.startElement(uri, localName, qName, atts);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // parsing logic
        thingIWant.characters(ch, start, length);
    }

    // etc...
}

Comments (3)

独留℉清风醉 2024-08-23 17:49:46

I recently had a similar problem. Here is the class I wrote to get you thingIWant:

import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;

public class XMLSerializer implements ContentHandler {
    static final private TransformerFactory tf = TransformerFactory.newInstance();
    private ContentHandler ch;

    public XMLSerializer(OutputStream os) throws SAXException {
        try {
            final Transformer t = tf.newTransformer();

            t.transform(new SAXSource(                
                new XMLReader() {     
                    public ContentHandler getContentHandler() { return ch; }
                    public DTDHandler getDTDHandler() { return null; }      
                    public EntityResolver getEntityResolver() { return null; }
                    public ErrorHandler getErrorHandler() { return null; }    
                    public boolean getFeature(String name) { return false; }
                    public Object getProperty(String name) { return null; } 
                    public void parse(InputSource input) { }               
                    public void parse(String systemId) { }  
                    public void setContentHandler(ContentHandler handler) { ch = handler; }                
                    public void setDTDHandler(DTDHandler handler) { }
                    public void setEntityResolver(EntityResolver resolver) { }
                    public void setErrorHandler(ErrorHandler handler) { }
                    public void setFeature(String name, boolean value) { }
                    public void setProperty(String name, Object value) { }
                }, new InputSource()),                                    
                new StreamResult(os));
        }
        catch (TransformerException e) {
            throw new SAXException(e);  
        }

        if (ch == null)
            throw new SAXException("Transformer didn't set ContentHandler");
    }

    public void setDocumentLocator(Locator locator) {
        ch.setDocumentLocator(locator);
    }

    public void startDocument() throws SAXException {
        ch.startDocument();
    }

    public void endDocument() throws SAXException {
        ch.endDocument();
    }

    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        ch.startPrefixMapping(prefix, uri);
    }

    public void endPrefixMapping(String prefix) throws SAXException {
        ch.endPrefixMapping(prefix);
    }

    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException {
        ch.startElement(uri, localName, qName, atts);
    }

    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        ch.endElement(uri, localName, qName);
    }

    public void characters(char[] ch, int start, int length)
        throws SAXException {
        this.ch.characters(ch, start, length);
    }

    public void ignorableWhitespace(char[] ch, int start, int length)
        throws SAXException {
        this.ch.ignorableWhitespace(ch, start, length);
    }

    public void processingInstruction(String target, String data)
        throws SAXException {
        ch.processingInstruction(target, data);
    }

    public void skippedEntity(String name) throws SAXException {
        ch.skippedEntity(name);
    }
}

Basically, it hands the Transformer a dummy XMLReader whose parse() does nothing, and grabs a reference to the Transformer's internal ContentHandler when the Transformer calls setContentHandler() on that reader. After that, the class acts as a proxy to the snagged ContentHandler.

Not very clean, but it works.

迷爱 2024-08-23 17:49:46

First: don't worry about the identity transform; it does not build an in-memory representation of the data.
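For reference, here is a minimal sketch of what the identity-transform route could look like as the `setup(outputStream)` method from the question, using only JDK classes. The class name `IdentityHandlerSketch` and the `demo()` helper are made up for illustration. The sketch assumes the default JDK `TransformerFactory`, which reports SAX support; other implementations may behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class IdentityHandlerSketch {

    // Hypothetical stand-in for the question's setup(outputStream).
    public static ContentHandler setup(OutputStream out)
            throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        // The cast below is only safe if the factory reports SAX support.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new TransformerConfigurationException("factory does not support SAX");
        }
        // With no stylesheet argument this is an identity transform.
        TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
        // The result must be set before the first SAX event is delivered.
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Pushes a tiny document through the handler by hand and returns the serialized text.
    public static String demo() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ContentHandler h = setup(out);
        char[] text = "a < b".toCharArray();
        h.startDocument();
        h.startElement("", "root", "root", new AttributesImpl());
        h.characters(text, 0, text.length);
        h.endElement("", "root", "root");
        h.endDocument();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Each event is handed to the serializer as it arrives, so nothing in this sketch holds on to the document. Whether a particular transformer implementation buffers internally is something to confirm with a profiler on a large input.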

To implement your "tee" functionality, you have to create a content handler that listens to the stream of events produced by the parser, and passes them on to the handler provided for you by the transformer. Unfortunately, this is not as easy as it sounds: the parser wants to send events to a DefaultHandler, while the transformer wants to read events from an XMLReader. The former is an abstract class, the latter is an interface. The JDK also provides the class XMLFilterImpl, which implements all of the interfaces of DefaultHandler, but does not extend from it ... that's what you get for incorporating two different projects as your "reference implementations."

So, you need to write a bridge class between the two:

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 *  Uses a decorator ContentHandler to insert a "tee" into a SAX parse/serialize
 *  stream.
 */
public class SaxTeeExample
{
    public static void main(String[] argv)
    throws Exception
    {
        StringReader src = new StringReader("<root><child>text</child></root>");
        StringWriter dst = new StringWriter();

        Transformer xform = TransformerFactory.newInstance().newTransformer();
        XMLReader reader = new MyReader(SAXParserFactory.newInstance().newSAXParser());
        xform.transform(new SAXSource(reader, new InputSource(src)),
                        new StreamResult(dst));

        System.out.println(dst.toString());
    }


    private static class MyReader
    extends XMLFilterImpl
    {
        private SAXParser _parser;

        public MyReader(SAXParser parser)
        {
            _parser = parser;
        }

        @Override
        public void parse(InputSource input) 
        throws SAXException, IOException
        {
            _parser.parse(input, new XMLFilterBridge(this));
        }

        // this is an example of a "tee" function
        @Override
        public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
        {
            System.out.println("startElement: " + name);
            super.startElement(uri, localName, name, atts);
        }
    }


    private static class XMLFilterBridge
    extends DefaultHandler
    {
        private XMLFilterImpl _filter;

        public XMLFilterBridge(XMLFilterImpl myFilter)
        {
            _filter = myFilter;
        }

        @Override
        public void characters(char[] ch, int start, int length)
        throws SAXException
        {
            _filter.characters(ch, start, length);
        }

        // override all other methods of DefaultHandler
        // ...
    }
}

The main method sets up the transformer. The interesting part is that the SAXSource is constructed around MyReader. When the transformer is ready for events, it will call the parse() method of that object, passing it the specified InputSource.

The next part is not obvious: XMLFilterImpl follows the Decorator pattern. The transformer will call various setter methods on this object before starting the transform, passing its own handlers. Any methods that I don't override (e.g., startDocument()) will simply call the delegate. As an example override, I'm doing "analysis" (just a println) in startElement(). You'll probably override other ContentHandler methods.

And finally, XMLFilterBridge is the bridge between DefaultHandler and XMLReader; it's also a decorator, and every method simply calls the delegate. I show one override, but you'll have to do them all.
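As a possible shortcut (a different technique from the bridge above, not a completion of it), XMLFilterImpl can also be given a parent XMLReader, and SAXParser.getXMLReader() exposes one. The filter then receives the parser's events directly and no DefaultHandler bridge is needed. A minimal sketch, where the names `FilterTeeSketch`, `CountingFilter`, and `tee` are made up for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterTeeSketch {

    // Hypothetical analysis step: count elements, then pass each event downstream.
    static class CountingFilter extends XMLFilterImpl {
        int elementCount = 0;

        CountingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            elementCount++;
            super.startElement(uri, localName, qName, atts);
        }
    }

    // Parses xml, serializes it unchanged to dst, and returns the number of elements seen.
    public static int tee(String xml, Writer dst) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // The filter sits between the real parser and the transformer's handler.
        CountingFilter filter = new CountingFilter(parser);

        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(dst));
        return filter.elementCount;
    }

    public static void main(String[] args) throws Exception {
        StringWriter dst = new StringWriter();
        int count = tee("<root><child>text</child></root>", dst);
        System.out.println(dst);
        System.out.println("elements: " + count);
    }
}
```

XMLFilterImpl.parse() installs the filter as the parent's handlers before delegating, which is why overriding parse() is unnecessary here.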

惜醉颜 2024-08-23 17:49:46

Edit: Includes default JDK version

The most efficient would be an XMLWriter which implements ContentHandler. In a nutshell, you are reading from and writing to IO buffers. There is an XMLWriter in DOM4J, which is used below. You can either subclass XMLWriter or use an XMLFilter to do the analysis. I am using XMLFilterImpl in this example. Note that XMLFilterImpl is also a ContentHandler. Here is the complete code.

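import org.dom4j.io.XMLWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;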

public class XMLPipeline {

    public static void main(String[] args) throws Exception {
        String inputFile = "build.xml";
        PrintStream outputStream = System.out;
        new XMLPipeline().pipe(inputFile, outputStream);
    }

    //dom4j
    public void pipe(String inputFile, OutputStream outputStream) throws
            SAXException, ParserConfigurationException, IOException {
        XMLWriter xwriter = new XMLWriter(outputStream);
        XMLReader xreader = XMLReaderFactory.createXMLReader();
        XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
        analyzer.setContentHandler(xwriter);
        analyzer.parse(inputFile);

        //do what you want with analyzer
        System.err.println(analyzer.elementCount);
    }


    //default JDK
    public void pipeTrax(String inputFile, OutputStream outputStream) throws
            SAXException, ParserConfigurationException,
            IOException, TransformerException {
        StreamResult xwriter = new StreamResult(outputStream);
        XMLReader xreader = XMLReaderFactory.createXMLReader();
        XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
        TransformerFactory stf = SAXTransformerFactory.newInstance();
        SAXSource ss = new SAXSource(analyzer, new InputSource(inputFile));
        stf.newTransformer().transform(ss, xwriter);
        System.out.println(analyzer.elementCount);
    }

    //This method simply reads from a file, runs it through SAX parser and dumps it
    //to dom4j writer
    public void dom4jNoop(String inputFile, OutputStream outputStream) throws
            IOException, SAXException {
        XMLWriter xwriter = new XMLWriter(outputStream);
        XMLReader xreader = XMLReaderFactory.createXMLReader();
        xreader.setContentHandler(xwriter);
        xreader.parse(inputFile);
    }

    //Simplest way to read a file and write it back to an output stream
    public void traxNoop(String inputFile, OutputStream outputStream)
            throws TransformerException {
        TransformerFactory stf = SAXTransformerFactory.newInstance();
        stf.newTransformer().transform(new StreamSource(inputFile),
                new StreamResult(outputStream));
    }

    //this analyzer counts the number of elements in sax stream
    public static class XMLAnalyzer extends XMLFilterImpl {
        int elementCount = 0;

        public XMLAnalyzer(XMLReader xmlReader) {
            super(xmlReader);
        }

        @Override
        public void startElement(String uri, String localName, String qName, 
          Attributes atts) throws SAXException {
            super.startElement(uri, localName, qName, atts);
            elementCount++;
        }
    }
}
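To make the "writer that implements ContentHandler" idea concrete without the dom4j dependency, here is a deliberately tiny, hand-rolled sketch. `TinyXmlWriter` and `roundTrip` are made-up names, and this is not how dom4j's XMLWriter is implemented. It assumes a non-namespace-aware parser (so qName is always populated) and ignores the XML declaration, comments, processing instructions, and CDATA sections.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: writes each SAX event straight to a Writer, keeping no document state.
public class TinyXmlWriter extends DefaultHandler {
    private final Writer out;

    public TinyXmlWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            tag.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i), true)).append('"');
        }
        write(tag.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        write("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length), false));
    }

    // Wraps IOException, since ContentHandler methods may only throw SAXException.
    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Escapes markup characters; double quotes are escaped only inside attribute values.
    private static String escape(String s, boolean inAttribute) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<') sb.append("&lt;");
            else if (c == '>') sb.append("&gt;");
            else if (c == '&') sb.append("&amp;");
            else if (c == '"' && inAttribute) sb.append("&quot;");
            else sb.append(c);
        }
        return sb.toString();
    }

    // Parses xml with the default (non-namespace-aware) JAXP parser and re-serializes it.
    public static String roundTrip(String xml) throws Exception {
        StringWriter dst = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TinyXmlWriter(dst));
        return dst.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<a x=\"1 &amp; 2\"><b>t &lt; u</b></a>"));
    }
}
```

Memory use here is bounded by the size of a single event, which is the property the question asks for. A real serializer has many more cases to handle, so for production use a maintained one (dom4j's XMLWriter or the JDK's identity transformer) is the safer choice.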