Spring Batch StAX XML reading job isn't ending when out of input

Posted 2024-12-10 06:43:46

I'm using Spring Batch to set up a job that will process a potentially very large XML file. I think I've set it up appropriately, but at runtime I'm finding that the job runs, processes its input, and then just hangs in an executing state (I can confirm by viewing the JobExecution's status in the JobRepository).

I've read through the Batch documentation several times but I don't see any obvious "make the job stop when out of input" configuration that I'm missing.

Here's the relevant portion of my application context:

<batch:job id="processPartnerUploads" restartable="true">
    <batch:step id="processStuffHoldings">
        <batch:tasklet>
            <batch:chunk reader="stuffReader" writer="stuffWriter" commit-interval="1"/>
        </batch:tasklet>        
    </batch:step>
</batch:job>

<bean id="stuffReader" class="org.springframework.batch.item.xml.StaxEventItemReader">
  <property name="fragmentRootElementName" value="stuff" />
  <property name="resource" value="file:///path/to/file.xml" />
  <property name="unmarshaller" ref="stuffUnmarshaller" />
</bean>

<bean id="stuffUnmarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="contextPath" value="com.company.project.xmlcontext"/>
</bean>

<bean id="stuffWriter" class="com.company.project.batch.StuffWriter" />

In case it matters, the "StuffWriter" is just a class that logs the items that would be written.

Please let me know if I've missed some important nuance involved with Batch and/or Stax.

Comments (1)

も星光 2024-12-17 06:43:46

I've resolved this problem for myself, though I'm surprised by what I had to do. Debugging through StaxEventItemReader, I noticed that the loop in the moveCursorToNextFragment() method would never terminate once the end of my document was reached. Here's the relevant code:

while (true) {
    while (reader.peek() != null && !reader.peek().isStartElement()) {
        reader.nextEvent();
    }
    if (reader.peek() == null) {
        return false;
    }
    QName startElementName = ((StartElement) reader.peek()).getName();
    if (startElementName.getLocalPart().equals(fragmentRootElementName)) {
        if (fragmentRootElementNameSpace == null
                || startElementName.getNamespaceURI().equals(fragmentRootElementNameSpace)) {
            return true;
        }
    }
    reader.nextEvent();
}

reader.peek() was never returning null. It seemed to me like this code should be checking to see if the XMLEvent encountered during peek() is at the end of the document, but this wasn't so simple due to the StaxEventItemReader's reliance on a DefaultFragmentEventReader wrapping the standard XMLEventReader.

What I wound up doing was rolling my own ItemReader based on StaxEventItemReader but without using a FragmentEventReader at all, and then adjusting the inner loop code to read like so:

        if (reader.peek().getEventType() == XMLStreamConstants.END_DOCUMENT) {
            return false;
        }
        reader.nextEvent();

That works perfectly and allows my batch job to go to COMPLETED at the end of input.
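
For illustration, here is a minimal standalone sketch of the same idea using only the JDK's javax.xml.stream API. It is not Spring Batch's code and not the exact reader I ended up with; the names FragmentScan, moveToNextFragment and countFragments are made up for this example. The scan stops as soon as the next event is END_DOCUMENT, and it also treats a null peek() as end of input, so it does not depend on which of the two the underlying reader reports:

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
final class FragmentScan {

    // Advances the reader until the next event is a start element whose
    // local name equals fragmentName. Returns false once the document ends,
    // signalled either by END_DOCUMENT or by peek() returning null.
    static boolean moveToNextFragment(XMLEventReader reader, String fragmentName)
            throws XMLStreamException {
        while (true) {
            XMLEvent next = reader.peek();
            if (next == null || next.isEndDocument()) {
                return false;
            }
            if (next.isStartElement()
                    && next.asStartElement().getName().getLocalPart().equals(fragmentName)) {
                return true;
            }
            reader.nextEvent(); // skip anything that is not a fragment start
        }
    }

    // Counts the fragment start elements in an XML string.
    static int countFragments(String xml, String fragmentName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int count = 0;
        try {
            while (moveToNextFragment(reader, fragmentName)) {
                reader.nextEvent(); // consume the start element so the scan moves on
                count++;
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

A real ItemReader would unmarshal the fragment at the point where this sketch only increments a counter; the part that matters here is that the loop has an exit condition that fires at the end of the document.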

I'm really surprised that I had to do this, though. I wondered if the underlying implementation of the streaming XML libraries I was using was at fault, but I'm using stax2-api-3.0.1.jar as referenced in the Spring Batch dependency list.

I also found that I'm not alone.
