Huge file conversion from EDI to XML

Posted 2025-01-22 04:19:48, 7037 characters, 2 views, 0 comments

I am converting an EDI file to XML. However, my input file, which is in BIF format, is approximately 100 MB, and processing it fails with a Java out-of-memory error.

I consulted the Smooks documentation on huge file conversion, but it only covers conversion from XML to EDI.

Below is the output I get when running my main class:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
        at java.lang.StringBuffer.append(StringBuffer.java:367)
        at java.io.StringWriter.write(StringWriter.java:94)
        at java.io.Writer.write(Writer.java:127)
        at freemarker.core.TextBlock.accept(TextBlock.java:56)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visitByHiddingParent(Environment.java:278)
        at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:157)
        at freemarker.core.Environment.visitIteratorBlock(Environment.java:501)
        at freemarker.core.IteratorBlock.accept(IteratorBlock.java:67)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Macro$Context.runMacro(Macro.java:173)
        at freemarker.core.Environment.visit(Environment.java:686)
        at freemarker.core.UnifiedCall.accept(UnifiedCall.java:80)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Environment.process(Environment.java:235)
        at freemarker.template.Template.process(Template.java:262)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:92)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:86)
        at org.milyn.event.report.HtmlReportGenerator.applyTemplate(HtmlReportGenerator.java:76)
        at org.milyn.event.report.AbstractReportGenerator.processFinishEvent(AbstractReportGenerator.java:197)
        at org.milyn.event.report.AbstractReportGenerator.processLifecycleEvent(AbstractReportGenerator.java:157)
        at org.milyn.event.report.AbstractReportGenerator.onEvent(AbstractReportGenerator.java:92)
        at org.milyn.Smooks._filter(Smooks.java:558)
        at org.milyn.Smooks.filterSource(Smooks.java:482)
        at com.***.xfunctional.EdiToXml.runSmooksTransform(EdiToXml.java:40)
        at com.***.xfunctional.EdiToXml.main(EdiToXml.java:57)

import java.io.*;
import java.util.Arrays;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.container.ExecutionContext;
import org.milyn.event.report.HtmlReportGenerator;
import org.milyn.io.StreamUtils;
import org.milyn.payload.StringResult;
import org.milyn.payload.SystemOutResult;
import org.xml.sax.SAXException;

public class EdiToXml {

  private static byte[] messageIn = readInputMessage();

  protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {

    Locale defaultLocale = Locale.getDefault();
    Locale.setDefault(new Locale("en", "EN"));

    // Instantiate Smooks with the config...
    Smooks smooks = new Smooks("smooks-config.xml");
    try {
      // Create an exec context - no profiles....
      ExecutionContext executionContext = smooks.createExecutionContext();

      StringResult result = new StringResult();

      // Configure the execution context to generate a report...
      executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));

      // Filter the input message to the outputWriter, using the execution context...
      smooks.filterSource(executionContext, new StreamSource(new ByteArrayInputStream(messageIn)),result);

      Locale.setDefault(defaultLocale);

      return result.getResult();
    } finally {
      smooks.close();
    }
  }

  public static void main(String[] args) throws IOException, SAXException, SmooksException {
    System.out.println("\n\n==============Message In==============");
    System.out.println("======================================\n");

    pause(
        "The EDI input stream can be seen above.  Press 'enter' to see this stream transformed into XML...");

    String messageOut = EdiToXml.runSmooksTransform();

    System.out.println("==============Message Out=============");
    System.out.println(messageOut);
    System.out.println("======================================\n\n");

    pause("And that's it!  Press 'enter' to finish...");
  }

  private static byte[] readInputMessage() {
    try {
      InputStream input = new BufferedInputStream(new FileInputStream("/home/****/Downloads/BifInputFile.DATA"));
      return StreamUtils.readStream(input);
    } catch (IOException e) {
      e.printStackTrace();
      return "<no-message/>".getBytes();
    }
  }

  private static void pause(String message) {
    try {
      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      System.out.print("> " + message);
      in.readLine();
    } catch (IOException e) {
    }
    System.out.println("\n");
  }

}

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.4.xsd">
  <!--
     Configure the EDI Reader to parse the message stream into a stream of SAX events.
     -->
  <edi:reader mappingModel="edi-to-xml-bif-mapping.xml" validate="false"/>
</smooks-resource-list>

I edited this line in the code so that it reads from a stream:

smooks.filterSource(executionContext, new StreamSource(new FileInputStream("/home/***/Downloads/sample-text-file.txt")), result);

However, I now get the error below. Does anyone know what the best approach is?

Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
    at org.milyn.Smooks._filter(Smooks.java:526)
    at org.milyn.Smooks.filterSource(Smooks.java:482)
    at ****.EdiToXml.runSmooksTransform(EdiToXml.java:41)
    at com.***.***.EdiToXml.main(EdiToXml.java:58)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:504)
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:453)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:428)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:386)
    at org.milyn.smooks.edi.EDIReader.parse(EDIReader.java:111)
    at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
    ... 5 more


Comments (2)

凝望流年 2025-01-29 04:19:48

The message was valid and the XML mapping was good. I was just not using the best method for reading and writing the message.

I came to realize that the filterSource method of Smooks can be fed an InputStream and an OutputStream directly (wrapped in a StreamSource and a StreamResult). Below is the code that let the program run efficiently without hitting the Java memory error.

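import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Instantiate a FileInputStream
FileInputStream inputStream = new FileInputStream(inputFileName);

// Instantiate a FileOutputStream
FileOutputStream outputStream = new FileOutputStream(outputFileName);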


try {    

  // Filter the input message to the outputWriter...
  smooks.filterSource(new StreamSource(inputStream), new StreamResult(outputStream));

  Locale.setDefault(defaultLocale);

} finally {
  smooks.close();
  inputStream.close();
  outputStream.close();
}
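
For context, the first stack trace in the question shows the heap running out inside a StringWriter while HtmlReportGenerator renders its report, and the original code also returned the whole result as a String. The version above avoids both. One possible refinement is to open the streams with try-with-resources so they are closed even if the transform throws. The sketch below uses only JDK classes so it can run on its own; `StreamingRunSketch` and `StreamTransform` are hypothetical names standing in for the Smooks call, not part of Smooks.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingRunSketch {

    /** Hypothetical stand-in for the transform step; this is not a Smooks type. */
    interface StreamTransform {
        void apply(InputStream in, OutputStream out) throws IOException;
    }

    /** Opens both files as streams and closes them even if the transform fails. */
    static void run(Path input, Path output, StreamTransform transform) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(input));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(output))) {
            // With Smooks this is where the call from the snippet above would go:
            // smooks.filterSource(new StreamSource(in), new StreamResult(out));
            transform.apply(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("edi-in", ".data");
        Path output = Files.createTempFile("xml-out", ".xml");
        Files.write(input, "UNH+1+ORDERS'".getBytes(StandardCharsets.UTF_8));

        // Placeholder transform: copies bytes through a fixed-size buffer,
        // so memory use does not depend on the size of the input file.
        run(input, output, (in, out) -> {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        });
        System.out.println("Wrote " + Files.size(output) + " bytes to " + output);
    }
}
```

The point of the design is that nothing in `run` accumulates the message: input and output are both streams, and only the fixed buffer lives on the heap.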

Thanks to the community.

Regards.

塔塔猫 2025-01-29 04:19:48

I'm the original author of Smooks and of the EDIFACT parsing code. Jason emailed me asking for advice on this, but I haven't been involved in the project for a number of years now, so I'm not sure how helpful I can be.

Smooks doesn't read the full message into memory. It streams it through a parser that converts it to a stream of SAX events, making it "look like" XML to anything downstream of it. If those events are then used to build a big Java object model in memory, that might result in OOM errors etc.
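
To illustrate that streaming model with plain JDK classes (this is not Smooks code, and `SaxStreamingSketch` and `CountingHandler` are names made up for the example), here is a minimal sketch of a SAX handler that reacts to each event without retaining it, so memory use does not grow with the size of the input:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingSketch {

    /** Counts start-element events and keeps nothing else from the stream. */
    static final class CountingHandler extends DefaultHandler {
        long elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elements++; // only a counter is updated; the event itself is discarded
        }
    }

    /** Parses the stream event by event and returns the number of elements seen. */
    static long countElements(InputStream in) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<order><item/><item/></order>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml))); // prints 3
    }
}
```

A handler that instead appended every event to a list or a string would hold the whole document on the heap, which is the kind of downstream accumulation that can exhaust memory on a 100 MB input.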

Looking at the Exception message, it simply looks like the EDIFACT input doesn’t match the definition file being used.

Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.

Those EDIFACT definition files were originally generated directly from the definitions published by the EDIFACT group, but I do remember that many people “tweak” the message formats, which seems like what might be happening here (and hence the above error). One solution to that would be to tweak the pre-generated definitions to match.

I know that a lot of changes have been made in Smooks in this area in the last year or two (using Apache Daffodil for the definitions) but I wouldn’t be the best person to talk about that. You can try the Smooks mailing list for help on that.
