Validation when marshalling an XML document using JAXB and Stax

Posted on 2024-08-25 11:04:38

I have created an XML schema (foo.xsd) and used xjc to create my binding classes for JAXB. Let's say the root element is collection and I am writing N document objects, which are complex types.

Because I plan to write out large XML files, I am using Stax to write out the collection root element, and JAXB to marshal document subtrees using Marshaller.marshal(JAXBElement, XMLEventWriter). This is the approach recommended by JAXB's unofficial user's guide.
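For readers unfamiliar with this pattern, a minimal sketch of the StAX side might look like the following. It uses XMLStreamWriter rather than the XMLEventWriter mentioned above, and it hand-writes each document element as a stand-in for the JAXB call so that it runs on a plain JDK. The class name StaxRootSketch and the element text are assumptions made for illustration only.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxRootSketch {

    /** Writes a collection root with n document children, flushing after each child. */
    static String write(int n) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("collection");
        for (int i = 0; i < n; i++) {
            // Stand-in for the JAXB step. In the setup described above this is where
            // marshaller.marshal(jaxbElement, writer) would be called, with
            // Marshaller.JAXB_FRAGMENT set to Boolean.TRUE on the marshaller.
            writer.writeStartElement("document");
            writer.writeCharacters("doc " + i);
            writer.writeEndElement();
            // Flush per child so output does not pile up in the writer's buffer.
            writer.flush();
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(3));
    }
}
```

In a real program the StringWriter would be replaced by a file or socket stream, so that only one document object is held in memory at a time.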

My question is, how can I validate the XML while it's being marshalled? If I bind a schema to the JAXB marshaller (using Marshaller.setSchema()), I get validation errors because I am only marshalling a subtree (it complains that it does not see the collection root element). I suppose what I really want to do is bind a schema to the Stax XMLEventWriter or something like that.
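One JDK-only way to approximate "binding a schema to the writer" is javax.xml.validation.ValidatorHandler, a SAX ContentHandler that validates events as they arrive. The sketch below is not the asker's code: it feeds hand-written SAX events for the root and its children, and the inline schema is a hypothetical stand-in for foo.xsd. JAXB's Marshaller also has a marshal(Object, ContentHandler) overload, so the hand-written child events could in principle be replaced by fragment marshalling, but that combination is an assumption and is not exercised here.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class IncrementalValidationSketch {

    // Hypothetical stand-in for foo.xsd: a collection of string-valued document elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='collection'><xs:complexType><xs:sequence>"
      + "<xs:element name='document' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Streams count children named childName through the validator; returns how many were sent. */
    static int validate(String childName, int count) throws SAXException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        // No ErrorHandler is set, so validation errors surface as SAXException.
        ValidatorHandler handler = schema.newValidatorHandler();
        AttributesImpl noAttrs = new AttributesImpl();

        handler.startDocument();
        handler.startElement("", "collection", "collection", noAttrs);
        int sent = 0;
        for (int i = 0; i < count; i++) {
            // Each child is checked against the schema as its events arrive.
            handler.startElement("", childName, childName, noAttrs);
            char[] text = ("doc " + i).toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", childName, childName);
            sent++;
        }
        handler.endElement("", "collection", "collection");
        handler.endDocument();
        return sent;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid children sent: " + validate("document", 3));
        try {
            validate("bogus", 1);
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

ValidatorHandler only validates. To also produce output, a downstream serializer can be attached with handler.setContentHandler(...), which receives the events after they pass validation.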

Any comments on this overall approach would be helpful. Basically I want to be able to use JAXB to marshal and unmarshal large XML documents without running out of memory, so if there's a better way to do this let me know.

内心旳酸楚 2024-09-01 11:04:38

Some Stax implementations seem to be able to validate output. See the following answer to a similar question:

Using Stax2 with Woodstox

后知后觉 2024-09-01 11:04:38

You can make your root collection lazy and instantiate items only when the Marshaller calls Iterator.next(). A single call to marshal() then produces one large, validated XML document. You won't run out of memory, because beans that have already been serialized become eligible for garbage collection.

It is also fine to return null as a collection element when an item needs to be skipped conditionally; no NullPointerException is thrown.

The XML schema validator itself seems to consume little memory, even on very large documents.

See JAXB's ArrayElementProperty.serializeListBody()

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.namespace.QName;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "TestHuge")
public class TestHuge {

    static final boolean MISPLACE_HEADER = true;

    private static final int LIST_SIZE = 20000;

    static final String HEADER = "Header";

    static final String DATA = "Data";

    @XmlElement(name = HEADER)
    String header;

    @XmlElement(name = DATA)
    List<String> data;

    @XmlAnyElement
    List<Object> content;

    public static void main(final String[] args) throws Exception {

        final JAXBContext jaxbContext = JAXBContext.newInstance(TestHuge.class);

        final Schema schema = genSchema(jaxbContext);

        final Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setSchema(schema);

        final TestHuge instance = new TestHuge();

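        // Lazy list: each element is built only when the marshaller asks for it,
        // so the LIST_SIZE large strings never have to exist at the same time.
        instance.content = new AbstractList<Object>() {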

            @Override
            public Object get(final int index) {
                return instance.createChild(index);
            }

            @Override
            public int size() {
                return LIST_SIZE;
            }
        };

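        // With MISPLACE_HEADER = true this throws MarshalException:
        // "Invalid content was found starting with element 'Header'".
        // The Writer below discards all output; only validation matters here.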
        marshaller.marshal(instance, new Writer() {

            @Override
            public void write(final char[] cbuf, final int off, final int len) throws IOException {}

            @Override
            public void write(final int c) throws IOException {}

            @Override
            public void flush() throws IOException {}

            @Override
            public void close() throws IOException {}
        });

    }

    private JAXBElement<String> createChild(final int index) {
        if (index % 1000 == 0) {
            System.out.println("serialized so far: " + index);
        }
        final String tag = index == getHeaderIndex(content) ? HEADER : DATA;

        final String bigStr = new String(new char[1000000]);
        return new JAXBElement<String>(new QName(tag), String.class, bigStr);
    }

    private static int getHeaderIndex(final List<?> list) {
        return MISPLACE_HEADER ? list.size() - 1 : 0;
    }

    private static Schema genSchema(final JAXBContext jc) throws Exception {
        final List<StringWriter> outs = new ArrayList<>();
        jc.generateSchema(new SchemaOutputResolver() {

            @Override
            public Result createOutput(final String namespaceUri, final String suggestedFileName)
                    throws IOException {
                final StringWriter out = new StringWriter();
                outs.add(out);
                final StreamResult streamResult = new StreamResult(out);
                streamResult.setSystemId("");
                return streamResult;
            }
        });
        final StreamSource[] sources = new StreamSource[outs.size()];
        for (int i = 0; i < outs.size(); i++) {
            final StringWriter out = outs.get(i);
            sources[i] = new StreamSource(new StringReader(out.toString()));
        }
        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = sf.newSchema(sources);
        return schema;
    }
}
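
The lazy-collection idea in this answer can be isolated from JAXB in a few lines. The sketch below is an illustration under assumptions, not part of the answer's code: LazyList, its created counter, and the consume() loop (a stand-in for the marshaller walking the list and skipping nulls) are hypothetical names introduced here.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.IntFunction;

class LazyListSketch {

    /** Read-only list that builds each element on demand and keeps no reference to it. */
    static final class LazyList<T> extends AbstractList<T> {
        private final int size;
        private final IntFunction<T> factory;
        int created; // number of factory calls so far, for demonstration only

        LazyList(int size, IntFunction<T> factory) {
            this.size = size;
            this.factory = factory;
        }

        @Override
        public T get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            created++;
            return factory.apply(index);
        }

        @Override
        public int size() {
            return size;
        }
    }

    /** Stand-in for the marshaller: walks the list once and skips null elements. */
    static int consume(List<String> items) {
        int emitted = 0;
        for (String item : items) {
            if (item != null) {
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Odd indexes return null to model items that are skipped conditionally.
        LazyList<String> list = new LazyList<>(5, i -> i % 2 == 0 ? "item " + i : null);
        System.out.println("created before iteration: " + list.created);
        System.out.println("emitted: " + consume(list));
        System.out.println("created after iteration: " + list.created);
    }
}
```

Because AbstractList's default iterator calls get() once per position, nothing is built until iteration starts, and each element becomes unreachable as soon as the consumer moves on.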