Memory-mapped zip file in Java
Here is the problem I'm trying to solve:
I have about 100 binary files (in total 158KB and they are roughly the same size +/- 50% of each other). I need to selectively parse only a few of these files (in the worst case maybe 50, in other cases as little as 1 to 5). This is on an Android device, by the way.
What is the fastest way to do this in Java?
One way could be to combine everything into one file and then use file seek to get to each individual file. That way the file would only need to be opened once, and opening a file is usually the slow part. However, in order to know where each file is, there would need to be some sort of table at the beginning of the file -- which could be generated with a script -- and the files would also need to be indexed in the table in the order they were concatenated, so that seeking wouldn't have to do much work (correct me if I'm wrong).
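To make that concrete, here is a minimal sketch of the seek-based idea. The header layout (a count followed by offset/length pairs) and the method names are hypothetical, just to illustrate the approach; nothing here is an existing format.

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: read one sub-file out of a combined file whose hypothetical header is
// [int count][long offset, int length] repeated count times.
public class CombinedFileReader {
    public static byte[] readEntry(String combinedPath, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(combinedPath, "r")) {
            int count = raf.readInt();               // number of sub-files listed in the table
            if (index < 0 || index >= count) {
                throw new IllegalArgumentException("no such entry: " + index);
            }
            raf.seek(4 + index * 12L);               // 4-byte count, then 12 bytes per record
            long offset = raf.readLong();            // where this sub-file's bytes start
            int length = raf.readInt();              // how many bytes it occupies
            byte[] data = new byte[length];
            raf.seek(offset);
            raf.readFully(data);                     // read the whole sub-file in one go
            return data;
        }
    }
}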
A better way would be to memory-map the combined file; then the table wouldn't have to be sorted in concatenation order, because a memory-mapped file allows random access (again, correct me if I'm wrong).
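A rough sketch of that variant, assuming the same hypothetical index already gave me an entry's offset and length. The int cast on the offset is only safe because the whole combined file is small (well under 2 GB here):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedCombinedFile {
    public static byte[] readEntry(String combinedPath, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(combinedPath, "r");
             FileChannel channel = raf.getChannel()) {
            // Map the whole file read-only; reads then go through the page cache
            // rather than explicit read() system calls.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[length];
            map.position((int) offset);   // random access: jump straight to the entry
            map.get(data);
            return data;
        }
    }
}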
Creating that table would be unnecessary if zip compression were used, because a zip archive already contains a table (the central directory). In addition, the files wouldn't have to be concatenated by hand: I could zip the directory and then access each individual file through its entry in the zip file. Problem solved.
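That would look roughly like this, using the standard java.util.zip.ZipFile API (the archive path and entry name are placeholders):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipEntryReader {
    public static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);      // looked up via the zip's own table
            if (entry == null) {
                throw new IOException("entry not found: " + entryName);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            try (InputStream in = zip.getInputStream(entry)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);                  // stream-decompress the entry
                }
            }
            return out.toByteArray();
        }
    }
}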
Except that if the zip file isn't memory-mapped, it will be slower to read, since system calls are slower than direct memory access (correct me if I'm wrong). So I came to the conclusion that the best solution would be a memory-mapped zip archive.
However, ZipFile entries return an InputStream for reading the contents of an entry, while a MappedByteBuffer needs a RandomAccessFile, which takes a filename as input, not an InputStream.
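For illustration, one workaround I have been considering is to sidestep ZipFile entirely: map the whole archive myself and walk the local file headers. This is a rough, untested sketch; it assumes the entries are stored uncompressed (e.g. created with zip -0) and that the archive was written without data descriptors, so the sizes are present in each local header.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Sketch: map a whole zip archive and slice one STORED entry's bytes out of it.
public class MappedZipSketch {
    private static final int LOCAL_HEADER_SIG = 0x04034b50;   // "PK\3\4"

    public static byte[] readStoredEntry(String zipPath, String entryName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipPath, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);                // zip headers are little-endian

            while (buf.remaining() >= 30 && buf.getInt(buf.position()) == LOCAL_HEADER_SIG) {
                int pos = buf.position();
                int method = buf.getShort(pos + 8) & 0xffff;   // 0 = STORED, 8 = DEFLATE
                int compressedSize = buf.getInt(pos + 18);     // equals the real size for STORED
                int nameLen = buf.getShort(pos + 26) & 0xffff;
                int extraLen = buf.getShort(pos + 28) & 0xffff;

                byte[] nameBytes = new byte[nameLen];
                buf.position(pos + 30);
                buf.get(nameBytes);
                String name = new String(nameBytes, StandardCharsets.UTF_8);

                int dataStart = pos + 30 + nameLen + extraLen;
                if (name.equals(entryName) && method == 0) {
                    byte[] data = new byte[compressedSize];
                    buf.position(dataStart);
                    buf.get(data);                             // slice the entry straight from the mapping
                    return data;
                }
                buf.position(dataStart + compressedSize);      // skip ahead to the next local header
            }
            throw new IOException("stored entry not found: " + entryName);
        }
    }
}

But this gives up compression and re-implements part of the zip format, which is why I'm asking whether there is a proper way to do it.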
Is there any way to memory-map a zip file for fast reads? Or is there a different solution to this problem of reading a selection of files?
Thanks
EDIT: I tested the speed of opening, closing, and parsing the files; here are the statistics I found:
Number of Files: 25 (24 for parse because garbage collection interrupted timing)
Total Open Time: 72ms
Total Close Time: 1ms
Total Parse Time: 515ms
(this is skewed in Parse's favor because Parse is missing a file)
% of total time Open takes: 12%
% of total time Close takes: 0.17%
% of total time Parse takes: 88%
Avg time Open takes per file: 2.88ms
Avg time Close takes per file: 0.04ms
Avg time Parse takes per file: 21.46ms
I would use a simple API like RandomAccessFile for now and revisit the issue if you really need to.
Edit - I didn't know about MappedByteBuffer. That seems like the way to go. Why not do this with separate files first and then think about combining them later?
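A minimal sketch of that suggestion, reading each of the separate files on demand (the file passed in is whichever of the ~100 files needs parsing):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the simple approach: open each needed file on demand and read it whole.
public class SimpleFileReader {
    public static byte[] readWholeFile(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] data = new byte[(int) raf.length()];   // the files here are only a few KB each
            raf.readFully(data);
            return data;
        }
    }
}

Given the measurements above (opening is ~12% of the total and parsing ~88%), keeping the files separate and only optimizing later if opening actually dominates seems reasonable.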