Decoding large Base64 from XML in Java: OutOfMemory

Posted 2024-11-04 06:04:30

I need to write a Base64-encoded element of an XML file into a separate file. Problem: the file can easily reach 100 MB. Every solution I tried ended with "java.lang.OutOfMemoryError: Java heap space". The problem is not reading the XML in general or the decoding process, but the size of the Base64 block.

I used JDOM, dom4j and XMLStreamReader to access the XML file. However, as soon as I try to access the Base64 content of the element in question, I get the error above. I also tried an XSLT using Saxon's base64Binary-to-octets function, but of course with the same result.

Is there a way to stream this base64 encoded part into a file without getting the whole chunk in one single piece?

Thanks for your hints,

Andreas

Comments (5)

深居我梦 2024-11-11 06:04:30

Apache Commons Codec has a Base64OutputStream, which should allow you to feed the XML data in a scalable way, by chaining the Base64OutputStream with a FileOutputStream.

You'll need a representation of the XML as a String, so you may not even have to read it into a DOM structure at all.

Something like:

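// Requires Apache Commons Codec (org.apache.commons.codec.binary.Base64OutputStream).
// The second constructor argument (doEncode = false) switches the stream from
// its default encoding mode to decoding, so decoded bytes end up in the file.
PrintWriter printWriter = new PrintWriter(
   new Base64OutputStream(
      new BufferedOutputStream(
         new FileOutputStream("/path/to/my/file")
      ),
      false
   )
);
printWriter.write(myXml);
printWriter.close();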

If the input XML file is too big, then you should read chunks of it into a buffer in a loop, writing the buffer contents to the output (i.e. a standard reader-to-writer copy).
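
The chunked copy described above could look roughly like the following. This is only a sketch under assumptions: it uses the JDK's own java.util.Base64 (Java 8+) in place of Commons Codec so that it runs without extra dependencies, and the class and method names (Base64StreamCopy, decodeTo) are made up for illustration. It assumes the Base64 text is already available as an InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any library.
class Base64StreamCopy {

    // Decodes Base64 text read from 'encoded' and writes the raw bytes to 'out'.
    // Only one 8 KB buffer of decoded data is held by this method at a time.
    static long decodeTo(InputStream encoded, OutputStream out) throws IOException {
        // The MIME decoder skips line breaks and other characters outside the
        // Base64 alphabet, which suits Base64 that was line-wrapped inside XML.
        InputStream decoded = Base64.getMimeDecoder().wrap(encoded);
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = decoded.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, streaming world".getBytes(StandardCharsets.UTF_8);
        String base64 = Base64.getMimeEncoder().encodeToString(original);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = decodeTo(
                new ByteArrayInputStream(base64.getBytes(StandardCharsets.US_ASCII)), sink);
        System.out.println(written + " bytes decoded");
    }
}
```

In a real program the ByteArrayOutputStream would be replaced by a BufferedOutputStream around a FileOutputStream, so the decoded data never accumulates in memory.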

无悔心 2024-11-11 06:04:30

I don't think any XML API would let you access an element's text as a stream rather than as a String. If the String is 100 MB, then your only option is probably to increase the JVM's heap size until the OutOfMemoryError goes away:

java -Xmx256m your.class.Name
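
To get a feel for how much heap the load-everything approach needs, a rough back-of-the-envelope calculation may help. The sketch below is an estimate under assumptions, not a measurement: it assumes the Base64 text is held once in memory (2 bytes per character in a char[] or a pre-Java-9 String, 1 byte per character with compact strings) alongside the decoded byte array, and it ignores any additional copies a parser may make. The class and method names are hypothetical.

```java
// Hypothetical helper for estimating heap needs; all figures are rough lower bounds.
class HeapEstimate {

    // Base64 encodes every 3 raw bytes as 4 characters, so the decoded
    // data is about three quarters of the encoded length.
    static long decodedBytes(long base64Chars) {
        return base64Chars / 4 * 3;
    }

    // Heap needed to hold the full Base64 text plus the full decoded array.
    static long naiveHeapBytes(long base64Chars, int bytesPerChar) {
        return base64Chars * bytesPerChar + decodedBytes(base64Chars);
    }

    public static void main(String[] args) {
        long chars = 100L * 1024 * 1024; // a 100 MB Base64 block
        long mb = 1024 * 1024;
        System.out.println("Max heap of this JVM (MB): " + Runtime.getRuntime().maxMemory() / mb);
        System.out.println("Decoded size (MB): " + decodedBytes(chars) / mb);
        System.out.println("Naive need, 2 bytes/char (MB): " + naiveHeapBytes(chars, 2) / mb);
        System.out.println("Naive need, 1 byte/char (MB): " + naiveHeapBytes(chars, 1) / mb);
    }
}
```

Under these assumptions the 2-bytes-per-character case already exceeds 256 MB, so the -Xmx value may need to be considerably higher than the example above suggests.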

祁梦 2024-11-11 06:04:30

Try the StAX API (tutorial). For large text elements, you should get several text events which you need to push into a streaming Base64 implementation (like the one skaffman mentioned).
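
A minimal sketch of this idea follows, with several assumptions: it uses java.util.Base64 from the JDK (Java 8+) rather than Commons Codec, InputStream.transferTo (Java 9+), and it assumes the target element contains only text. Whether a large text node really arrives as several events depends on the StAX implementation. The names StaxBase64Extractor, ElementTextStream and extract are invented for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class StaxBase64Extractor {

    // Exposes the text events of the element the reader is positioned on
    // as a stream of ASCII bytes, pulling one text event at a time.
    static final class ElementTextStream extends InputStream {
        private final XMLStreamReader xml;
        private char[] chunk = new char[0];
        private int pos = 0;
        private int end = 0;
        private boolean finished = false;

        ElementTextStream(XMLStreamReader xml) {
            this.xml = xml;
        }

        @Override
        public int read() throws IOException {
            while (pos == end) {
                if (!nextChunk()) {
                    return -1;
                }
            }
            char c = chunk[pos];
            if (c > 0x7F) {
                throw new IOException("Non-ASCII character in Base64 text");
            }
            pos++;
            return c;
        }

        // Advances to the next text event; returns false once the end tag is reached.
        private boolean nextChunk() throws IOException {
            if (finished) {
                return false;
            }
            try {
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA
                            || event == XMLStreamConstants.SPACE) {
                        chunk = xml.getTextCharacters();
                        pos = xml.getTextStart();
                        end = pos + xml.getTextLength();
                        return true;
                    }
                    if (event == XMLStreamConstants.END_ELEMENT) {
                        break;
                    }
                }
            } catch (XMLStreamException e) {
                throw new IOException(e);
            }
            finished = true;
            return false;
        }
    }

    // Decodes the text of the first element named 'elementName' into 'out'.
    // Returns the number of decoded bytes, or -1 if the element was not found.
    static long extract(InputStream xmlIn, String elementName, OutputStream out)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser not to merge adjacent text into one large event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(xmlIn);
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(xml.getLocalName())) {
                    InputStream decoded =
                            Base64.getMimeDecoder().wrap(new ElementTextStream(xml));
                    return decoded.transferTo(out);
                }
            }
            return -1;
        } finally {
            xml.close();
        }
    }
}
```

The important detail is that the text is read with getTextCharacters() per event and never with getElementText(), which would assemble the whole element content into one String.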

懵少女 2024-11-11 06:04:30

If your file can get that big, never use a DOM parser. Use a simple SAX approach to access the data elements, and stream the base64 data into Base64OutputStream as mentioned above.
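
As a rough illustration of the SAX approach, the sketch below decodes each characters() callback as it arrives. It is written under assumptions: it uses java.util.Base64 from the JDK instead of Base64OutputStream, it matches the element by qName (the default SAXParserFactory is not namespace-aware), and the class name SaxBase64Handler and its extract method are hypothetical. Because Base64 decodes in groups of four characters, up to three characters are carried over between callbacks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of any library.
class SaxBase64Handler extends DefaultHandler {
    private final String elementName;
    private final OutputStream out;
    // Holds the current callback's characters plus at most 3 carried-over ones.
    private final StringBuilder pending = new StringBuilder();
    private boolean inside = false;

    SaxBase64Handler(String elementName, OutputStream out) {
        this.elementName = elementName;
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (elementName.equals(qName)) {
            inside = true;
            pending.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!inside) {
            return;
        }
        // Drop line breaks and indentation, keep everything else.
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                pending.append(ch[i]);
            }
        }
        // Decode only complete 4-character groups; the rest waits for more input.
        flush(pending.length() - pending.length() % 4);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inside && elementName.equals(qName)) {
            flush(pending.length()); // final, possibly unpadded, group
            inside = false;
        }
    }

    private void flush(int count) throws SAXException {
        if (count == 0) {
            return;
        }
        try {
            out.write(Base64.getDecoder().decode(pending.substring(0, count)));
        } catch (IOException | IllegalArgumentException e) {
            throw new SAXException(e);
        }
        pending.delete(0, count);
    }

    // Parses 'xml' and writes the decoded content of 'elementName' to 'out'.
    static void extract(InputStream xml, String elementName, OutputStream out)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(xml, new SaxBase64Handler(elementName, out));
    }
}
```

Memory use stays proportional to the size of a single characters() chunk rather than to the size of the whole element; the String created per chunk is the extra allocation the simpler code pays for.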

苏别ゝ 2024-11-11 06:04:30

As lbruder said, use a SAX parser to read the document in a streaming fashion. If you use Base64OutputStream, you have to set the flag that makes it DECODE instead of the default ENCODE. You also have to convert the char array from the characters callback to a byte array before passing it to the output stream, which requires additional memory allocations and copies.

I wrote an alternative Base64 decoder for exactly this use case; it is available on GitHub. Here is an example of how to use it:

Base64StreamDecoder decoder = new Base64StreamDecoder();
OutputStream out;

...

public void startElement(String uri, String localName, String qName, Attributes atts) {
    decoder.reset();
    out = new BufferedOutputStream(new FileOutputStream(...));
}

public void endElement(String uri, String localName, String qName) {
    decoder.checkComplete();
    out.close();
}

public void characters(char[] ch, int start, int length) {
    decoder.decode(ch, start, length, out);
}