Compressing a large file to ZIP with Java
I need to compress one large file (~450 MB) using the Java class ZipOutputStream. The file's size causes an OutOfMemoryError in my JVM heap space. I believe this happens because the zos.write(...) method stores all of the file content to compress in an internal byte array before compressing it.
    origin = new BufferedInputStream(fi, BUFFER);
    ZipEntry entry = new ZipEntry(filePath);
    zos.putNextEntry(entry);
    int count;
    while ((count = origin.read(data, 0, BUFFER)) != -1) {
        zos.write(data, 0, count);
    }
    origin.close();
The natural solution would be to enlarge the JVM heap, but I would like to know whether there is a way to write this data in a streaming manner. I do not need a high compression ratio, so I could change the algorithm too.
Does anyone have an idea about this?
Comments (4)
According to your comment on Sam's response, you have evidently created a ZipOutputStream that wraps a ByteArrayOutputStream. The ByteArrayOutputStream, of course, holds the entire compressed result in memory. If you want it written to disk, the ZipOutputStream has to wrap a FileOutputStream instead.
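As a rough illustration of that suggestion, here is a minimal sketch in which the ZipOutputStream wraps a FileOutputStream, so compressed bytes go to disk as they are produced. The class name ZipToFile, the zip helper, and the 8 KB buffer size are assumptions made for this example; they are not from the original question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipToFile {
    // Assumed buffer size for this sketch; the question does not state its BUFFER value.
    private static final int BUFFER = 8192;

    /** Streams source into a single-entry ZIP archive written to target. */
    static void zip(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(
                     new FileInputStream(source.toFile()), BUFFER);
             ZipOutputStream zos = new ZipOutputStream(
                     new BufferedOutputStream(new FileOutputStream(target.toFile())))) {
            zos.putNextEntry(new ZipEntry(source.getFileName().toString()));
            byte[] data = new byte[BUFFER];
            int count;
            // Only one buffer's worth of input is held in memory at a time.
            while ((count = in.read(data, 0, BUFFER)) != -1) {
                zos.write(data, 0, count);
            }
            zos.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ZipToFile <source> <target.zip>");
            return;
        }
        zip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The try-with-resources block closes both streams even if the copy fails, which also ensures the ZIP central directory is written when the copy succeeds.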
There's a library called TrueZip that I've used with good success in the past to do this kind of thing.
I cannot guarantee it does better on the buffering front. I do know that it does a lot of stuff with its own coding rather than depending on the JDK's Zip API.
So it's worth a try, in my opinion.
ZipOutputStream is stream-based; it doesn't hold onto memory. Your BUFFER may be too large.
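One way to check the streaming claim is to push a large amount of data through a ZipOutputStream whose underlying stream discards everything it receives. The sketch below does that with a small fixed chunk; the names StreamingZipDemo, CountingSink, and compressedSize are hypothetical, and the 8 KB chunk size is an assumption. If the process completes under a small heap (for example when run with -Xmx32m), the ZipOutputStream itself is not accumulating the content.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamingZipDemo {
    /** Sink that discards data and only counts how many bytes it received. */
    static final class CountingSink extends OutputStream {
        long bytes;

        @Override
        public void write(int b) {
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytes += len;
        }
    }

    /** Pushes totalBytes of repetitive data through a ZipOutputStream in 8 KB chunks. */
    static long compressedSize(long totalBytes) throws IOException {
        CountingSink sink = new CountingSink();
        byte[] chunk = new byte[8192];
        Arrays.fill(chunk, (byte) 'a');
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            zos.putNextEntry(new ZipEntry("data.bin"));
            long remaining = totalBytes;
            while (remaining > 0) {
                int n = (int) Math.min(chunk.length, remaining);
                zos.write(chunk, 0, n);
                remaining -= n;
            }
            zos.closeEntry();
        }
        // Number of compressed bytes that reached the underlying stream.
        return sink.bytes;
    }

    public static void main(String[] args) throws IOException {
        long input = 450L * 1024 * 1024; // roughly the file size from the question
        System.out.println("compressed bytes: " + compressedSize(input));
    }
}
```

Replacing CountingSink with a ByteArrayOutputStream in this sketch would make the compressed output accumulate on the heap instead of being discarded.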
I wonder if it's because you are storing the content in a ZipEntry; perhaps it loads all of its content before the ZipEntry is written out. Do you have to use ZIP? If you only need to compress a single data stream, you might look into GZIPOutputStream instead. I believe it would not have the same problem.
Hope this helps.
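For the single-stream case mentioned above, a minimal GZIPOutputStream sketch might look like the following. The class name GzipFile, the gzip helper, and the 8 KB buffer are assumptions for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

class GzipFile {
    /** Streams source through GZIP compression into target. */
    static void gzip(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(source.toFile()));
             OutputStream out = new GZIPOutputStream(
                     new BufferedOutputStream(new FileOutputStream(target.toFile())))) {
            byte[] buf = new byte[8192]; // assumed chunk size
            int n;
            // Copy chunk by chunk; the whole file is never held in memory.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: GzipFile <source> <target.gz>");
            return;
        }
        gzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Note that GZIP wraps exactly one data stream and stores no archive directory, so this only fits when a single file needs compressing.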