What is the fastest way to extract a single file from a zip file that contains a lot of files?
I tried the java.util.zip package, but it is too slow.
Then I found the LZMA SDK and 7z jbinding, but both are lacking something.
The LZMA SDK provides no documentation or how-to tutorial, which is very frustrating. There is no Javadoc.
7z jbinding does not provide a simple way to extract only one file; it only provides a way to extract the entire contents of the zip file. Moreover, it does not provide a way to specify where to place the extracted file.
Any ideas?
Comments (4)
What does your code with java.util.zip look like, and how big a zip file are you dealing with? I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this:
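The snippet this answer refers to is not shown here. As a rough sketch only, and not necessarily the code the answer used, one plain java.util.zip way to pull out a single entry is to stream through the archive with ZipInputStream until the entry name matches. The class name StreamingExtract and its extract method are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, named here for illustration only.
final class StreamingExtract {

    /**
     * Scans the archive from the start and copies the first entry whose name
     * equals entryName to target. Returns false if no such entry is found.
     */
    static boolean extract(Path zip, String entryName, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(zip));
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // The stream is positioned at this entry's data and
                    // reports end-of-stream at the end of the entry.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                    return true;
                }
            }
            return false;
        }
    }
}
```

Because the stream is read sequentially, every entry that precedes the wanted one has to be skipped over first, so the cost depends on where the entry sits in the archive.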
I have not benchmarked the speed, but with Java 7 or later I extract a file as follows.
I would imagine that it's faster than the ZipFile API:
A short example extracting META-INF/MANIFEST.MF from the zip file test.zip:
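The answer's own snippet is not shown here. One Java 7+ mechanism that fits the description is the zip file system provider in java.nio.file; that this is what the answer used is an assumption. A minimal sketch, with the class name ZipFsExtract invented for illustration:

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, named here for illustration only.
final class ZipFsExtract {

    /** Opens the zip as a file system and copies one entry to target. */
    static void extract(Path zip, String entryName, Path target) throws IOException {
        // The cast selects the (Path, ClassLoader) overload, available since Java 7.
        try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            Path entry = zipFs.getPath(entryName);
            // Throws NoSuchFileException if the entry does not exist.
            Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With this sketch, a call such as ZipFsExtract.extract(Paths.get("test.zip"), "META-INF/MANIFEST.MF", Paths.get("MANIFEST.MF")) would copy the manifest to the chosen location.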
Use a ZipFile rather than a ZipInputStream.
Although the documentation does not indicate this (it's in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIP file contains a central directory at a known location (the end of the archive), this means a lot less IO has to happen to find a particular file.
Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM, which may be a problem for a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or to memory-map just the directory; I haven't tried.
Also, if you're using multiple files, be sure to use a try/finally to ensure that the file is closed after use.
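A minimal sketch of this advice, assuming the goal is to copy one named entry to a location of your choosing. ZipFile.getEntry and ZipFile.getInputStream are standard java.util.zip methods; the class name ZipFileExtract and the method signature are invented for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named here for illustration only.
final class ZipFileExtract {

    /** Looks up one entry by name and copies it to target. */
    static void extract(File zip, String entryName, Path target) throws IOException {
        ZipFile zipFile = new ZipFile(zip);
        try {
            // getEntry returns null when the archive has no entry with that name.
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zip);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // Closing the ZipFile releases the underlying file handle.
            zipFile.close();
        }
    }
}
```

On Java 7 or later, a try-with-resources statement on the ZipFile would have the same effect as the explicit try/finally.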
The code snippet below assumes you know both the path of the target zip file and the path of the target entry inside it.
There is no need to iterate through the entries, as ZipFile provides a method getEntry to retrieve an entry directly, and a method getInputStream to get an InputStream over its contents, from which a byte[] can be read.
In this example it reads a protobuf binary file of about 340KB from a zip file in ~11ms. One may use a similar approach to read any other file type.
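The answer's snippet is not shown here. As a hedged sketch of the approach it describes, the following reads one entry into a byte[] using ZipFile.getEntry and ZipFile.getInputStream. The class name ZipEntryBytes and the parameter names are invented for illustration, and no timing claims are made for it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named here for illustration only.
final class ZipEntryBytes {

    /** Returns the uncompressed contents of a single entry. */
    static byte[] read(String zipPath, String entryPath) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath)) {
            // Direct lookup by name; no iteration over the other entries.
            ZipEntry entry = zipFile.getEntry(entryPath);
            if (entry == null) {
                throw new IOException("No entry " + entryPath + " in " + zipPath);
            }
            try (InputStream in = zipFile.getInputStream(entry);
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

The returned array can then be handed to whatever parser suits the file type.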