Decompressing Gzip archives in Java
I'm trying to decompress about 8000 files in gzip format in Java. My first try was to use GZIPInputStream but the performance was awful.
Does anyone know of an alternative way to decompress gzip archives? I tried ZipInputStream, but it does not recognize the gzip format.
Thank you in advance.
Comments (3)
You need to use buffering. Reading and writing small pieces of data is going to be inefficient. The compression implementation is in native code in the Sun JDK. Even if it weren't, the buffered performance should usually exceed reasonable file or network I/O.
As native code is used to implement the decompression/compression algorithm, be very careful to close the stream itself (and not just the underlying stream) after use. I've found that having loads of unclosed `Deflater`s (or `Inflater`s, for decompression) hanging around is very bad for performance.
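The buffering and stream-closing advice above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the class name `BufferedGunzip`, the method `gunzip`, and the 64 KiB buffer size are all assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the name and buffer size are illustrative only.
final class BufferedGunzip {
    // Assumed buffer size; tune it for your own workload.
    private static final int BUFFER_SIZE = 64 * 1024;

    static void gunzip(Path source, Path target) throws IOException {
        // try-with-resources closes the GZIPInputStream itself, which releases
        // its native Inflater, as well as the underlying file streams.
        try (InputStream raw = Files.newInputStream(source);
             InputStream in = new GZIPInputStream(raw, BUFFER_SIZE);
             OutputStream out = Files.newOutputStream(target)) {
            // Move data in large chunks instead of byte by byte.
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

A call such as `BufferedGunzip.gunzip(Paths.get("data.gz"), Paths.get("data"))` would decompress one file; the file names here are placeholders.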
`ZipInputStream` deals with archives of files, which is a completely different thing from compressing a stream.
When you say that `GZIPInputStream`'s performance was awful, could you be more specific? Did you find out whether it was a CPU bottleneck or an I/O bottleneck? Were you using buffering on both input and output? If you could post the code you were using, that would be very helpful.
If you're on a multi-core machine, you could try still using `GZIPInputStream` but using multiple threads, one per core, with a shared queue of files still to process. (Any one file would only be processed by a single thread.) That might make things worse if you're I/O bound, but it may be worth a try.
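One possible shape for the multi-threaded approach is sketched below, assuming a fixed thread pool sized to the number of available cores, whose internal work queue plays the role of the shared file queue. The class `ParallelGunzip`, its methods, and the rule of stripping a `.gz` suffix to name the output are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; names and naming rules are illustrative only.
final class ParallelGunzip {
    static void gunzipAll(List<Path> sources, Path targetDir)
            throws IOException, InterruptedException {
        // One worker per core; the pool's queue holds the files still to process.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (Path source : sources) {
                // Each file is handled by exactly one task, hence one thread.
                results.add(pool.submit(() -> {
                    gunzip(source, targetFor(source, targetDir));
                    return null;
                }));
            }
            for (Future<?> result : results) {
                try {
                    result.get(); // wait for completion and surface failures
                } catch (ExecutionException e) {
                    throw new IOException("Decompression failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    // Assumed naming rule: drop a trailing ".gz", otherwise append ".out".
    static Path targetFor(Path source, Path targetDir) {
        String name = source.getFileName().toString();
        name = name.endsWith(".gz") ? name.substring(0, name.length() - 3) : name + ".out";
        return targetDir.resolve(name);
    }

    static void gunzip(Path source, Path target) throws IOException {
        try (InputStream raw = Files.newInputStream(source);
             InputStream in = new GZIPInputStream(raw, 64 * 1024)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Whether this helps depends on where the bottleneck is: with a CPU-bound workload the extra threads may pay off, while with a single slow disk they could add contention instead.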
For that kind of scale, you might want to go native, assuming your platform requirements are limited. You can use JNI to call a library or invoke a native command using `ProcessBuilder`.
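As a rough sketch of the `ProcessBuilder` route, the example below shells out to the `gzip` command-line tool with `-dc` (decompress to standard output) and redirects that output to a file. It assumes `gzip` is installed and on the `PATH`, which the original answer does not state; the class `NativeGunzip` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; assumes a `gzip` executable is available on the PATH.
final class NativeGunzip {
    // Builds the command line: gzip -dc <source> writes decompressed data to stdout.
    static List<String> command(Path source) {
        return Arrays.asList("gzip", "-dc", source.toString());
    }

    static void gunzip(Path source, Path target) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command(source));
        // Send the tool's stdout straight to the target file.
        builder.redirectOutput(target.toFile());
        // Let any error messages from gzip show up on this process's stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("gzip exited with code " + exitCode + " for " + source);
        }
    }
}
```

Starting one process per file has its own overhead, so for thousands of small files it may be worth measuring this against the in-process approach before committing to it.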