Ant XSLT task over a fileset runs out of memory / does not release memory

Posted 2024-11-03 08:55:08

I have a big (1.9 GB) XML file which has data I want to insert into a MySQL database every month. I have made an Ant script for this.

The Ant XSLT task can't handle a single file this big, so I have a task that uses xml_split (from xml-twig-tools) to split the 1.9 GB XML file into smaller XML files of roughly 4 MB each.

This all goes well.

I use the following Ant target to run the XSLT task over all these XML files:

    <target name="xsltransform" depends="split" description="Transform XML to SQL...">
            <xslt basedir="${import.dir}/" 
                  destdir="${import.dir}/sql/"
                  style="${xsl.filename}" force="true">
                    <mapper type="glob" from="*.xml" to="*.sql" />
                    <factory name="net.sf.saxon.TransformerFactoryImpl"/>
            </xslt>
    </target>

The problem is that as soon as it starts on the first XML file, I see the 'RES' memory in Linux top grow with every subsequent XML file. Since it is processing multiple (unrelated) XML files, I would expect it to free memory between the transformations of the individual files. It doesn't: after two hundred 4 MB XML files, Java throws an out-of-memory error:

    BUILD FAILED
    /var/lib/hudson/jobs/EPDB_Rebuild_Monthly/workspace/trunk/buildfiles/buildMonthly.xml:67: java.lang.OutOfMemoryError: Java heap space
        at net.sf.saxon.tinytree.TinyTree.ensureNodeCapacity(Unknown Source)
        at net.sf.saxon.tinytree.TinyTree.addNode(Unknown Source)
        at net.sf.saxon.tinytree.TinyBuilder.startElement(Unknown Source)
        at net.sf.saxon.event.Stripper.startElement(Unknown Source)
        at net.sf.saxon.event.ReceivingContentHandler.startElement(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at net.sf.saxon.event.Sender.sendSAXSource(Unknown Source)
        at net.sf.saxon.event.Sender.send(Unknown Source)
        at net.sf.saxon.event.Sender.send(Unknown Source)
        at net.sf.saxon.Controller.transform(Unknown Source)
        at org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:194)
        at org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:812)
        at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:408)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1329)

Is there something I can do to prevent the XSLT task eating up all my memory?
Or should I reconsider my approach?

Comments (1)

杀手六號 2024-11-10 08:55:08

We can all agree that it should be letting go of the memory, but since it doesn't, you can try breaking up the xslt task into separate calls, e.g., using Ant Contrib's for task:

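<for param="file">
    <!-- Assumes the split files sit directly in import.dir; keeps the generated sql/ output out of the loop -->
    <fileset dir="${import.dir}" includes="*.xml"/>
    <sequential>
        <!-- Derive the output name; the property name embeds the file path, so it is unique per iteration -->
        <basename property="@{file}.base" file="@{file}" suffix=".xml"/>
        <!-- The in attribute is meant to be paired with out rather than with destdir and a mapper -->
        <xslt in="@{file}"
              out="${import.dir}/sql/${@{file}.base}.sql"
              style="${xsl.filename}" force="true">
                <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
    </sequential>
</for>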
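
Note that the for task is not part of core Ant, so the build file has to load Ant-Contrib before a loop like the one above will run. A minimal sketch, assuming the jar location is held in a hypothetical property named antcontrib.jar (as far as I know, for is registered by the antlib.xml resource rather than by the older antcontrib.properties one):

```xml
<!-- antcontrib.jar is a hypothetical property; point it at your own ant-contrib jar -->
<taskdef resource="net/sf/antcontrib/antlib.xml">
    <classpath>
        <pathelement location="${antcontrib.jar}"/>
    </classpath>
</taskdef>
```

Put this at the top level of the build file (or in any target that runs first) so the task is defined before the transform target executes.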

If that doesn't do the trick, then since you are using Saxon, you can call Saxon's Java classes directly in a forked JVM, e.g.:

<java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
                <arg value="-s:${import.dir}" />
                <arg value="-xsl:${xsl.filename}" />
                <arg value="-o:${import.dir}/sql" />
</java>
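
A forked JVM does not inherit much from the Ant process, so it usually needs to be told where Saxon lives and how much heap it may use. Here is a hedged variant of the call above; saxon.jar is a hypothetical property and 512m is an arbitrary value, so adjust both to your setup:

```xml
<!-- maxmemory only takes effect because fork="true"; 512m is an assumed value -->
<java classname="net.sf.saxon.Transform" failonerror="true" fork="true" maxmemory="512m">
    <!-- saxon.jar is a hypothetical property pointing at the Saxon jar -->
    <classpath>
        <pathelement location="${saxon.jar}"/>
    </classpath>
    <arg value="-s:${import.dir}" />
    <arg value="-xsl:${xsl.filename}" />
    <arg value="-o:${import.dir}/sql" />
</java>
```

Capping the heap explicitly also makes an out-of-memory failure show up at a predictable point instead of depending on the JVM defaults of the build machine.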

Or you can combine both:

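<for param="file">
    <!-- Assumes the split files sit directly in import.dir; keeps the generated sql/ output out of the loop -->
    <fileset dir="${import.dir}" includes="*.xml"/>
    <sequential>
        <!-- Strip the directory and the .xml suffix to build the output file name -->
        <basename property="@{file}.base" file="@{file}" suffix=".xml"/>
        <!-- One forked JVM per input file, so its heap is returned to the OS when it exits -->
        <java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
                <arg value="-s:@{file}" />
                <arg value="-xsl:${xsl.filename}" />
                <arg value="-o:${import.dir}/sql/${@{file}.base}.sql" />
        </java>
    </sequential>
</for>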

And for bonus points, you could try to speed things up a bit by running the iterations in parallel:

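<!-- for is switched to parallel mode via attributes; its body is still a sequential element -->
<!-- threadCount caps the number of concurrent forked JVMs; 4 is an arbitrary choice -->
<for param="file" parallel="true" threadCount="4">
    <fileset dir="${import.dir}" includes="*.xml"/>
    <sequential>
        <!-- Strip the directory and the .xml suffix to build the output file name -->
        <basename property="@{file}.base" file="@{file}" suffix=".xml"/>
        <java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
                <arg value="-s:@{file}" />
                <arg value="-xsl:${xsl.filename}" />
                <arg value="-o:${import.dir}/sql/${@{file}.base}.sql" />
        </java>
    </sequential>
</for>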