Memory-mapped zip file in Java
Here is the problem I'm trying to solve:
I have about 100 binary files (in total 158KB and they are roughly the same size +/- 50% of each other). I need to selectively parse only a few of these files (in the worst case maybe 50, in other cases as little as 1 to 5). This is on an Android device, by the way.
What is the fastest way to do this in Java?
One way could be to combine everything into one file and then use file seek to get to each individual file. That way the file would only need to be opened once, and opening a file is usually the slow part. However, in order to know where each file is, there would need to be some sort of table at the beginning of the file -- which could be generated with a script -- and the files would also need to be indexed in the table in the order they were concatenated, so that seeking wouldn't have to do much work (correct me if I'm wrong).
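To make that concrete, here is a minimal sketch of the seek-based idea. The header layout (a count followed by offset/length pairs) and the method names are hypothetical, just to illustrate the approach; nothing here is an existing format.

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: read one sub-file out of a combined file whose hypothetical header is
// [int count][long offset, int length] repeated count times.
public class CombinedFileReader {
    public static byte[] readEntry(String combinedPath, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(combinedPath, "r")) {
            int count = raf.readInt();               // number of sub-files listed in the table
            if (index < 0 || index >= count) {
                throw new IllegalArgumentException("no such entry: " + index);
            }
            raf.seek(4 + index * 12L);               // 4-byte count, then 12 bytes per record
            long offset = raf.readLong();            // where this sub-file's bytes start
            int length = raf.readInt();              // how many bytes it occupies
            byte[] data = new byte[length];
            raf.seek(offset);
            raf.readFully(data);                     // read the whole sub-file in one go
            return data;
        }
    }
}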
A better way would be to memory-map the combined file; then the table wouldn't have to be sorted in concatenation order, because a memory-mapped file allows random access (again, correct me if I'm wrong).
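A rough sketch of that variant, assuming the same hypothetical index already gave me an entry's offset and length. The int cast on the offset is only safe because the whole combined file is small (well under 2 GB here):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedCombinedFile {
    public static byte[] readEntry(String combinedPath, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(combinedPath, "r");
             FileChannel channel = raf.getChannel()) {
            // Map the whole file read-only; reads then go through the page cache
            // rather than explicit read() system calls.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[length];
            map.position((int) offset);   // random access: jump straight to the entry
            map.get(data);
            return data;
        }
    }
}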
Creating that table would be unnecessary if zip compression were used, because a zip archive already contains a table (the central directory). In addition, the files wouldn't have to be concatenated by hand: I could zip the directory and then access each individual file through its entry in the zip file. Problem solved.
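That would look roughly like this, using the standard java.util.zip.ZipFile API (the archive path and entry name are placeholders):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipEntryReader {
    public static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);      // looked up via the zip's own table
            if (entry == null) {
                throw new IOException("entry not found: " + entryName);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            try (InputStream in = zip.getInputStream(entry)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);                  // stream-decompress the entry
                }
            }
            return out.toByteArray();
        }
    }
}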
Except that if the zip file isn't memory-mapped, it will be slower to read, since system calls are slower than direct memory access (correct me if I'm wrong). So I came to the conclusion that the best solution would be a memory-mapped zip archive.
However, ZipFile entries return an InputStream for reading the contents of an entry, while a MappedByteBuffer needs a RandomAccessFile, which takes a filename as input, not an InputStream.
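For illustration, one workaround I have been considering is to sidestep ZipFile entirely: map the whole archive myself and walk the local file headers. This is a rough, untested sketch; it assumes the entries are stored uncompressed (e.g. created with zip -0) and that the archive was written without data descriptors, so the sizes are present in each local header.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Sketch: map a whole zip archive and slice one STORED entry's bytes out of it.
public class MappedZipSketch {
    private static final int LOCAL_HEADER_SIG = 0x04034b50;   // "PK\3\4"

    public static byte[] readStoredEntry(String zipPath, String entryName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipPath, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);                // zip headers are little-endian

            while (buf.remaining() >= 30 && buf.getInt(buf.position()) == LOCAL_HEADER_SIG) {
                int pos = buf.position();
                int method = buf.getShort(pos + 8) & 0xffff;   // 0 = STORED, 8 = DEFLATE
                int compressedSize = buf.getInt(pos + 18);     // equals the real size for STORED
                int nameLen = buf.getShort(pos + 26) & 0xffff;
                int extraLen = buf.getShort(pos + 28) & 0xffff;

                byte[] nameBytes = new byte[nameLen];
                buf.position(pos + 30);
                buf.get(nameBytes);
                String name = new String(nameBytes, StandardCharsets.UTF_8);

                int dataStart = pos + 30 + nameLen + extraLen;
                if (name.equals(entryName) && method == 0) {
                    byte[] data = new byte[compressedSize];
                    buf.position(dataStart);
                    buf.get(data);                             // slice the entry straight from the mapping
                    return data;
                }
                buf.position(dataStart + compressedSize);      // skip ahead to the next local header
            }
            throw new IOException("stored entry not found: " + entryName);
        }
    }
}

But this gives up compression and re-implements part of the zip format, which is why I'm asking whether there is a proper way to do it.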
Is there any way to memory-map a zip file for fast reads? Or is there a different solution to this problem of reading a selection of files?
Thanks
EDIT: I tested the speed of opening, closing, and parsing the files; here are the statistics I found:
Number of Files: 25 (24 for parse because garbage collection interrupted timing)
Total Open Time: 72ms
Total Close Time: 1ms
Total Parse Time: 515ms
(this is skewed in Parse's favor because Parse is missing a file)
% of total time Open takes: 12%
% of total time Close takes: 0.17%
% of total time Parse takes: 88%
Avg time Open takes per file: 2.88ms
Avg time Close takes per file: 0.04ms
Avg time Parse takes per file: 21.46ms
I would use a simple API like RandomAccessFile for now and revisit the issue if you really need to.
Edit - I didn't know about MappedByteBuffer. That seems like the way to go. Why not do this with separate files first and then think about combining them later?
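A minimal sketch of that suggestion, reading each of the separate files on demand (the file passed in is whichever of the ~100 files needs parsing):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the simple approach: open each needed file on demand and read it whole.
public class SimpleFileReader {
    public static byte[] readWholeFile(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] data = new byte[(int) raf.length()];   // the files here are only a few KB each
            raf.readFully(data);
            return data;
        }
    }
}

Given the measurements above (opening is ~12% of the total and parsing ~88%), keeping the files separate and only optimizing later if opening actually dominates seems reasonable.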