What is the fastest way to extract 1 file from a zip file that contains a lot of files?

Posted on 2024-10-27 22:04:35

I tried the java.util.zip package, but it is too slow.

Then I found the LZMA SDK and 7z jbinding, but they are also lacking.

The LZMA SDK does not provide any documentation or tutorial on how to use it, which is very frustrating. There is no javadoc.

The 7z jbinding, on the other hand, does not provide a simple way to extract only 1 file; it only provides a way to extract the entire content of the zip file. Moreover, it does not provide a way to specify where to place the extracted file.

Any ideas, please?

Comments (4)

成熟的代价 2024-11-03 22:04:35

What does your java.util.zip code look like, and how big is the zip file you are dealing with?

I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this:

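// Requires: import java.io.*; import java.util.zip.*;
// try-with-resources closes the streams even if an exception is thrown
try (ZipInputStream zin = new ZipInputStream(
        new BufferedInputStream(new FileInputStream("your.zip")))) {
    ZipEntry ze;
    // entries are read sequentially until the wanted one is found
    while ((ze = zin.getNextEntry()) != null) {
        if (ze.getName().equals("your.file")) {
            // open the output only once the entry is found,
            // so no empty file is left behind when it is missing
            try (OutputStream out = new FileOutputStream("your.file")) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
            break;
        }
    }
}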
以往的大感动 2024-11-03 22:04:35

I have not benchmarked the speed, but with Java 7 or greater I extract a file as follows.
I would imagine that it is faster than the ZipFile API:

A short example extracting META-INF/MANIFEST.MF from a zip file test.zip:

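// Requires: import java.io.File; import java.nio.file.*;
// file to extract from the zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");

// load the zip file as a file system; the (ClassLoader) cast selects the
// Java 7-compatible overload (the one-argument form only exists since Java 13)
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
    // copy the file from the zip file to the output location
    // (Files.copy throws FileAlreadyExistsException if the target exists)
    Path source = fileSystem.getPath("META-INF/" + file);
    Files.copy(source, outputLocation.toPath());
}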
新雨望断虹 2024-11-03 22:04:35

Use a ZipFile rather than a ZipInputStream.

Although the ZipFile documentation does not say so (it's in the docs for JarFile), ZipFile should use random-access file operations to read the file. Since a ZIP file contains a directory at a known location (the central directory at the end of the archive), a LOT less IO has to happen to find a particular file.

Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM, which may be a problem for a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or to memory-map just the directory; I haven't tried.

Also, if you're working with multiple files, be sure to use try/finally (or try-with-resources on Java 7+) to ensure that each file is closed after use.
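
As a rough illustration of this advice, here is a minimal sketch that looks up one entry with ZipFile and copies it to a location of your choice. The class name SingleEntryExtractor and the method extract are hypothetical names chosen for this example, not part of any library; only the java.util.zip and java.nio.file calls are standard.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class; the name is an assumption made for this sketch.
public final class SingleEntryExtractor {

    private SingleEntryExtractor() {}

    /**
     * Copies a single entry from the archive to the given target path.
     * Returns false if the archive has no such file entry.
     */
    public static boolean extract(Path zipPath, String entryName, Path target)
            throws IOException {
        // try-with-resources closes the archive even if the copy fails
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            // direct lookup by name; no iteration over the other entries
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.isDirectory()) {
                return false;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // overwrite the target if it already exists
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        }
    }
}
```

Entry names inside a zip always use forward slashes, so a call would look like SingleEntryExtractor.extract(Paths.get("your.zip"), "dir/your.file", Paths.get("out/your.file")). The target path also answers the original question about choosing where the extracted file goes.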

Bonjour°[大白 2024-11-03 22:04:35

The code snippet below assumes you know both the path of the target zip file and the path of the target entry inside it.

There is no need to iterate through the files, as ZipFile provides the getEntry method to retrieve an entry directly and the getInputStream method to obtain an InputStream over its contents.

In this example it reads a protobuf binary file of about 340 KB from a zip file in ~11 ms. One may use a similar approach to read any other file type.


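    /* Relevant imports */
    import com.google.protobuf.Message;
    import com.google.protobuf.Parser;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.NoSuchFileException;
    import java.nio.file.Path;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    
    public final class ZipFileUtils {

        ...

        public static <T extends Message> T readMessageFromZip(
                                                final Path zipPath, 
                                                final Path entryPath, 
                                                final Parser<T> messageParser        
                                             ) throws IOException {
            // closing the ZipFile also closes the entry stream opened from it
            try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
                // zip entry names always use '/', even on Windows
                String entryName = entryPath.toString().replace(File.separatorChar, '/');
                ZipEntry zipEntry = zipFile.getEntry(entryName);
                if (zipEntry == null) {
                    throw new NoSuchFileException(entryName);
                }
                return messageParser.parseFrom(zipFile.getInputStream(zipEntry));
            }
        }
    }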
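
For file types other than protobuf, the same lookup can return the raw bytes instead. The following is only a sketch under that assumption: ZipEntryReader and readEntry are hypothetical names introduced here, and the manual copy loop is used so that it also runs on Java 7 and 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class; the name is an assumption made for this sketch.
public final class ZipEntryReader {

    private ZipEntryReader() {}

    /** Reads the full contents of one zip entry into memory. */
    public static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            // direct lookup by entry name (forward slashes as separators)
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new NoSuchFileException(entryName);
            }
            try (InputStream in = zipFile.getInputStream(entry);
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buffer = new byte[8192];
                int len;
                // copy the decompressed bytes until the end of the entry
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
                return out.toByteArray();
            }
        }
    }
}
```

Holding the whole entry in memory is reasonable for entries of a few megabytes; for larger ones, streaming to a file is likely the safer choice.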
