java.util.zip - ZipInputStream vs. ZipFile

Posted on 2024-10-11 15:09:32

I have some general questions regarding the java.util.zip library.
What we basically do is an import and an export of many small components. Previously these components were imported and exported using a single big file, e.g.:

<component-type-a id="1"/>
<component-type-a id="2"/>
<component-type-a id="N"/>

<component-type-b id="1"/>
<component-type-b id="2"/>
<component-type-b id="N"/>

Please note that the order of the components during import is relevant.

Now every component should occupy its own file, which should be externally versioned, QA-ed, and so on. We decided that the output of our export should be a zip file (containing all these files) and the input of our import should be a similar zip file. We do not want to explode (unpack) the zip in our system. We do not want to open a separate stream for each of the small files. My current questions:

Q1. Does ZipInputStream guarantee that the zip entries (the little files) will be read in the same order in which they were inserted by our export, which uses ZipOutputStream? I assume reading is something like:


// fis is an already opened InputStream on the zip file
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null)
{
    // read from zis until read() returns -1 (end of the current entry)
}

I know that the central directory is placed at the end of the zip file, but the file entries themselves are nevertheless stored sequentially. I also know that relying on the order is an ugly idea, but I just want to have all the facts in mind.

Q2. If I use ZipFile (which I prefer), what is the performance impact of calling getInputStream() hundreds of times? Will it be much slower than the ZipInputStream solution? The zip is opened only once and ZipFile is backed by RandomAccessFile - is this correct?
I assume reading is something like:


ZipFile zipfile = new ZipFile(argv[0]);
Enumeration<? extends ZipEntry> e = zipfile.entries(); // TODO: ensure the order of the entries
while (e.hasMoreElements()) {
    ZipEntry entry = e.nextElement();
    InputStream is = zipfile.getInputStream(entry);
}

Q3. Are the input streams retrieved from the same ZipFile thread-safe (e.g. can I read different entries in different threads simultaneously)? Are there any performance penalties?

Thanks for your answers!

感情废物 2024-10-18 15:09:32

Q1: Yes, the entries will be read in the same order in which they were added.
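To check the ordering claim on a small case, here is a minimal sketch that writes a few entries with ZipOutputStream and reads them back with ZipInputStream, which walks the entries in the order they physically appear in the stream. The class name `ZipOrderDemo`, the helpers `writeZip` and `readNames`, and the file names are made up for illustration; the archive is kept in memory only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipOrderDemo {

    /** Writes one small entry per name, in list order, and returns the archive bytes. */
    static byte[] writeZip(List<String> names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                // Placeholder content; a real export would write the component here.
                zos.write(("<component file=\"" + name + "\"/>").getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the archive sequentially and returns the entry names in the order encountered. */
    static List<String> readNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Deliberately not alphabetical, so sorting would be visible in the output.
        List<String> written = Arrays.asList("b-2.xml", "a-1.xml", "c-3.xml");
        System.out.println(readNames(writeZip(written)));
    }
}
```

This only demonstrates the round trip through ZipOutputStream and ZipInputStream; an archive that was rewritten by another tool in between may have its entries in a different physical order.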

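Q2: Note that, because of the structure of zip archives and the compression involved, neither solution is truly streaming; both do some amount of buffering. If you look at the JDK sources, the implementations share most of their code. There is no real random access within the content, although the index does allow locating the chunks that correspond to entries. So I don't think there should be a meaningful performance difference, especially since the OS caches disk blocks anyway. You may simply want to verify this by measuring performance with a simple test case.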
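A simple test case of that kind could look like the sketch below. It reads every entry completely, once through ZipFile.getInputStream() and once through ZipInputStream, and reports the elapsed time for whichever variant is selected. The class name `ZipReadTiming`, its method names, and the command-line arguments are assumptions made for this example; a single run is only a rough indication, since JIT warm-up and the OS disk cache both affect the numbers.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

class ZipReadTiming {

    /** Reads all entries via ZipFile.getInputStream() and returns the total uncompressed bytes. */
    static long readWithZipFile(File file) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }

    /** Reads all entries via a single ZipInputStream and returns the total uncompressed bytes. */
    static long readWithZipInputStream(File file) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipInputStream zis =
                 new ZipInputStream(new BufferedInputStream(new FileInputStream(file)))) {
            while (zis.getNextEntry() != null) {
                int n;
                while ((n = zis.read(buf)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ZipReadTiming <zipfile|stream> <path-to-zip>");
            return;
        }
        File zip = new File(args[1]);
        long start = System.nanoTime();
        long bytes = args[0].equals("zipfile") ? readWithZipFile(zip) : readWithZipInputStream(zip);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(bytes + " bytes in " + seconds + " s");
    }
}
```

Selecting one variant per invocation means each measurement runs in its own JVM, so one variant does not benefit from the warm-up of the other.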

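Q3: I would not count on it; most likely they are not. If you really think concurrent access would help (it might, mainly because decompression is CPU-bound), I would try reading the whole file into memory, exposing it via ByteArrayInputStream, and constructing multiple independent readers.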
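One way to read that suggestion is sketched below: the archive is loaded into a byte array once, and each worker thread opens its own ZipInputStream over a private ByteArrayInputStream, handling only every n-th entry. The class `ParallelZipReaders`, the helpers `readShare` and `readAll`, and the round-robin split are assumptions for illustration, not something the JDK prescribes. Note that a worker still has to read through the entries it skips, so any gain comes mainly from the per-entry processing (here just counting bytes) rather than from the decompression itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ParallelZipReaders {

    /** Processes the entries at positions i with i % workers == id, using a private stream. */
    static Map<String, Integer> readShare(byte[] zip, int id, int workers) throws IOException {
        Map<String, Integer> sizes = new HashMap<>();
        byte[] buf = new byte[8192];
        // Each worker gets its own stream; only the immutable byte array is shared.
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            for (int i = 0; (entry = zis.getNextEntry()) != null; i++) {
                if (i % workers != id) {
                    continue; // another worker handles this entry
                }
                int total = 0;
                int n;
                while ((n = zis.read(buf)) != -1) {
                    total += n; // stand-in for real per-entry processing
                }
                sizes.put(entry.getName(), total);
            }
        }
        return sizes;
    }

    /** Runs one reader per worker and merges the per-entry sizes, sorted by entry name. */
    static Map<String, Integer> readAll(byte[] zip, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> parts = new ArrayList<>();
            for (int id = 0; id < workers; id++) {
                final int workerId = id;
                parts.add(pool.submit(() -> readShare(zip, workerId, workers)));
            }
            Map<String, Integer> all = new TreeMap<>();
            for (Future<Map<String, Integer>> part : parts) {
                all.putAll(part.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] zip = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(readAll(zip, 4));
    }
}
```

Because the results are merged into a map keyed by entry name, this sketch does not preserve import order; if order matters, the merged results would need to be re-sequenced by entry position.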


抽个烟儿 2024-10-18 15:09:32

I measured that just listing the files with ZipInputStream is 8 times slower than with ZipFile.

    long t = System.nanoTime();
    ZipFile zip = new ZipFile(jarFile);
    Enumeration<? extends ZipEntry> entries = zip.entries();
    while (entries.hasMoreElements())
    {
        ZipEntry entry = entries.nextElement();

        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;

        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9);

and

    long t = System.nanoTime();
    ZipInputStream zip = new ZipInputStream(new FileInputStream(jarFile));
    ZipEntry entry;
    while ((entry = zip.getNextEntry()) != null)
    {
        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;

        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9);

(Don't run them in the same class. Make two different classes and run them separately)
