What is the fastest way to extract 1 file from a zip file that contains a lot of files?

Posted 2024-10-27 22:04:35

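I tried the java.util.zip package, and it is too slow.

Then I found the LZMA SDK and 7z jbinding, but each of them is lacking something too.

The LZMA SDK does not provide any documentation or tutorial on how to use it, which is very frustrating. There is no javadoc.

7z jbinding does not provide a simple way to extract only 1 file; it only provides a way to extract the entire content of the zip file. Moreover, it does not provide a way to specify where to place the unzipped file.

Any ideas?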

成熟的代价 2024-11-03 22:04:35

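What does your code with java.util.zip look like, and how big is the zip file you are dealing with?

I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this: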

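import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Scan the archive sequentially and copy the first entry named "your.file".
try (ZipInputStream zin = new ZipInputStream(
        new BufferedInputStream(new FileInputStream("your.zip")))) {
    ZipEntry ze;
    while ((ze = zin.getNextEntry()) != null) {
        if (ze.getName().equals("your.file")) {
            // Open the output only once the entry is found, so a missing
            // entry does not leave an empty file behind.
            try (OutputStream out = new FileOutputStream("your.file")) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
            break;
        }
    }
}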

以往的大感动 2024-11-03 22:04:35

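I have not benchmarked the speed, but with Java 7 or greater I extract a file as follows.
I would imagine that it's faster than the ZipFile API:

A short example extracting META-INF/MANIFEST.MF from a zip file test.zip: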

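import java.io.File;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// file to extract from zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");

// load zip file as filesystem; the (ClassLoader) cast selects the
// Java 7 overload and stays unambiguous on Java 13+
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
    // copy file from zip file to output location
    // (throws FileAlreadyExistsException if the target already exists)
    Path source = fileSystem.getPath("META-INF/" + file);
    Files.copy(source, outputLocation.toPath());
}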

新雨望断虹 2024-11-03 22:04:35

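Use ZipFile rather than ZipInputStream.

Although the documentation does not say so (it is in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIP file contains a directory at a known location, this means far less IO is needed to find a particular file.

Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM, which may be a problem on a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or to memory-map just the directory; I haven't tried.

Also, if you're working with multiple files, be sure to use try/finally to ensure that each file is closed after use.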
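
For what it's worth, here is a minimal sketch of that approach, assuming Java 7 or later. The class name SingleEntryExtractor and the method extract are made up for illustration and are not from any library; only ZipFile.getEntry, ZipFile.getInputStream and Files.copy are real JDK calls. It looks up one entry by name and copies it to a location of your choosing, with try-with-resources standing in for the try/finally mentioned above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named for illustration only.
public final class SingleEntryExtractor {

    private SingleEntryExtractor() {}

    /**
     * Copies one entry of a zip archive to target.
     * Returns false if the archive has no file entry with that name.
     */
    public static boolean extract(Path zipPath, String entryName, Path target)
            throws IOException {
        // getEntry looks the name up directly instead of iterating
        // over every entry in the archive.
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.isDirectory()) {
                return false;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // REPLACE_EXISTING overwrites the result of an earlier run.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        }
    }
}
```

A call would look like SingleEntryExtractor.extract(Paths.get("your.zip"), "your.file", Paths.get("out/your.file")). The parent directory of the target has to exist already, and entry names inside a zip use forward slashes on every platform.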

Bonjour°[大白 2024-11-03 22:04:35

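The code snippet below assumes you know both the path of the target zip file and the path of the target entry inside it.

There is no need to iterate through the entries: ZipFile provides getEntry to retrieve an entry directly and getInputStream to read its contents as an InputStream.

In this example it reads a protobuf binary file of about 340KB from a zip file in ~11ms. A similar approach works for any other file type.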


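    /* Relevant imports */
    import com.google.protobuf.Message;
    import com.google.protobuf.Parser;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Path;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public final class ZipFileUtils {

        ...

        public static <T extends Message> T readMessageFromZip(
                                                final Path zipPath,
                                                final Path entryPath,
                                                final Parser<T> messageParser
                                             ) throws IOException {
            try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
                // Zip entry names always use '/', whatever the platform separator is.
                String entryName = entryPath.toString()
                        .replace(entryPath.getFileSystem().getSeparator(), "/");
                ZipEntry zipEntry = zipFile.getEntry(entryName);
                if (zipEntry == null) {
                    throw new IOException("No entry " + entryName + " in " + zipPath);
                }
                // Parse while the ZipFile is still open, then close the stream.
                try (InputStream in = zipFile.getInputStream(zipEntry)) {
                    return messageParser.parseFrom(in);
                }
            }
        }
    }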
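
If you are not dealing with protobuf, the same lookup can hand back the raw bytes instead. This is only a sketch under a few assumptions: ZipEntryBytes and read are hypothetical names chosen for illustration, the entry is small enough to hold in memory, and the entry name is passed as a String with forward slashes rather than as a Path:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named for illustration only.
public final class ZipEntryBytes {

    private ZipEntryBytes() {}

    /** Reads one whole entry into memory; intended for small entries. */
    public static byte[] read(Path zipPath, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zipPath);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int len;
                // Copy the decompressed bytes chunk by chunk.
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
                return out.toByteArray();
            }
        }
    }
}
```

On Java 9 or later the copy loop could be replaced by in.readAllBytes(); the manual loop is there so the sketch also compiles on Java 7 and 8.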
