What is the most memory-efficient way to emit XML from a JAXP SAX ContentHandler?
I have a situation similar to an earlier question about emitting XML. I am analyzing data in a SAX ContentHandler while serializing it to a stream. I am suspicious that the solution in the linked question -- though it is exactly what I am looking for in terms of the API -- is not memory-efficient, since it involves an identity transform with the XSLT processor. I want the memory consumption of the program to be bounded, rather than it growing with the input size.
How can I easily forward the arguments of my ContentHandler methods to a serializer without doing acrobatics to adapt e.g. StAX to SAX, or worse yet, copying the SAX event contents to the output stream by hand?
Edit: here's a minimal example of what I am after. thingIWant should just write to the OutputStream given to it. Like I said, the earlier question has a TransformerHandler that gives me the right API, but it uses the XSLT processor instead of just a simple serialization.
public class MyHandler implements ContentHandler {
    ContentHandler thingIWant;

    MyHandler(OutputStream outputStream) {
        thingIWant = setup(outputStream);
    }

    public void startDocument() throws SAXException {
        // parsing logic
        thingIWant.startDocument();
    }

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // parsing logic
        thingIWant.startElement(uri, localName, qName, atts);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // parsing logic
        thingIWant.characters(ch, start, length);
    }

    // etc...
}
Comments (3)
I recently had a similar problem. Here is the class I wrote to get you thingIWant:
Basically, it intercepts the Transformer's call to parse(), and grabs a reference to its internal ContentHandler. After that, the class acts as a proxy to the snagged ContentHandler.
Not very clean, but it works.
First: don't worry about the identity transform; it does not build an in-memory representation of the data.
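As a rough illustration of that point, the setup(outputStream) call from the question could be sketched with the JDK's TransformerHandler, which accepts SAX events and writes them to the stream as they arrive. The class name SerializerSetup and the demo() helper are hypothetical, and the sketch assumes the default TransformerFactory supports SAX (it checks SAXTransformerFactory.FEATURE before casting).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

final class SerializerSetup {
    /** Hypothetical stand-in for the question's setup(outputStream). */
    static ContentHandler setup(OutputStream out) throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }
        // No stylesheet is supplied, so the handler performs an identity transform.
        TransformerHandler handler = ((SAXTransformerFactory) factory).newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    /** Feeds a few hand-made SAX events through the handler and returns what was written. */
    static String demo() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ContentHandler handler = setup(out);
        char[] text = "hi".toCharArray();
        handler.startDocument();
        handler.startElement("", "a", "a", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "a", "a");
        handler.endDocument();
        return out.toString("UTF-8");
    }
}
```

The returned handler can be assigned directly to thingIWant in MyHandler; each forwarded event is serialized when it is received rather than collected first.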
To implement your "tee" functionality, you have to create a content handler that listens to the stream of events produced by the parser and passes them on to the handler provided for you by the transformer. Unfortunately, this is not as easy as it sounds: the parser wants to send events to a DefaultHandler, while the transformer wants to read events from an XMLReader. The former is a class, the latter is an interface. The JDK also provides the class XMLFilterImpl, which implements all of the interfaces of DefaultHandler but does not extend it ... that's what you get for incorporating two different projects as your "reference implementations."
So, you need to write a bridge class between the two:
The main method sets up the transformer. The interesting part is that the SAXSource is constructed around MyReader. When the transformer is ready for events, it will call the parse() method of that object, passing it the specified InputSource.
The next part is not obvious: XMLFilterImpl follows the Decorator pattern. The transformer will call various setter methods on this object before starting the transform, passing its own handlers. Any methods that I don't override (e.g., startDocument()) will simply call the delegate. As an example override, I'm doing "analysis" (just a println) in startElement(). You'll probably override other ContentHandler methods.
And finally, XMLFilterBridge is the bridge between DefaultHandler and XMLReader; it's also a decorator, and every method simply calls the delegate. I show one override, but you'll have to do them all.
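The decorator arrangement described above can be sketched as follows. This is a simplified variant, not the MyReader and XMLFilterBridge classes the answer refers to: it assumes the parser is obtained as an XMLReader, so the filter can be handed to SAXSource directly and no bridge is needed. CountingFilter and TeeDemo are hypothetical names, and counting elements stands in for the "analysis".

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical analysis filter: counts elements, then forwards every event unchanged. */
final class CountingFilter extends XMLFilterImpl {
    int elementCount = 0;

    CountingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        elementCount++;                                   // the "analysis"
        super.startElement(uri, localName, qName, atts);  // delegate to the transformer's handler
    }
}

final class TeeDemo {
    /** Copies XML from in to out through the filter and returns the element count. */
    static int copy(InputStream in, OutputStream out) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();
        CountingFilter filter = new CountingFilter(parser);

        // Identity transform: the transformer installs its own handler on the
        // filter and then calls filter.parse(), which drives the real parser.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
        return filter.elementCount;
    }
}
```

Methods that CountingFilter does not override, such as startDocument() and characters(), fall through to XMLFilterImpl, which forwards them to whatever handler the transformer registered.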
Edit: Includes default JDK version
The most efficient would be an XMLWriter which implements ContentHandler. In a nutshell, you are reading from and writing to IO buffers. There is an XMLWriter in DOM4J, which is used below. You can either subclass XMLWriter or use an XMLFilter to do the analysis. I am using XMLFilter in this example. Note that XMLFilterImpl, the standard XMLFilter implementation, is also a ContentHandler. Here is the complete code.
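The answer's own listing did not survive on this page, so what follows is only a minimal JDK-only sketch of the same push pipeline, not the original code. It assumes a TransformerHandler as the serializing ContentHandler in place of DOM4J's XMLWriter, and uses an anonymous XMLFilterImpl subclass for the analysis step. PushPipeline is a hypothetical name, and counting text characters is a placeholder for real analysis.

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class PushPipeline {
    /** Parses in, analyses the events, writes them to out; returns the text length seen. */
    static int run(InputStream in, OutputStream out) throws Exception {
        // Serializing end of the pipeline (assumed stand-in for DOM4J's XMLWriter).
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.setResult(new StreamResult(out));

        // Analysis step: used purely as a ContentHandler that forwards to the serializer.
        final int[] textLength = {0};
        XMLFilterImpl analysis = new XMLFilterImpl() {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                textLength[0] += length;              // placeholder analysis
                super.characters(ch, start, length);  // forward to the serializer
            }
        };
        analysis.setContentHandler(serializer);

        // The parser pushes events into the analysis handler as it reads the input.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();
        parser.setContentHandler(analysis);
        parser.parse(new InputSource(in));
        return textLength[0];
    }
}
```

With DOM4J on the classpath, the serializer could presumably be swapped for its XMLWriter without touching the analysis handler, since both ends only see the ContentHandler interface.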