Is it possible to memory map a compressed file?

Posted on 2024-12-05 22:06:40


We have large files with zlib-compressed binary data that we would like to memory map.

Is it even possible to memory map such a compressed binary file and access those bytes in an effective manner?

Are we better off just decompressing the data, memory mapping it, and then compressing it again once we're done with our operations?

EDIT

I think I should probably mention that these files can be appended to at regular intervals.

Currently, this data on disk gets loaded via NSMutableData and decompressed. We then have some arbitrary read/write operations on this data. Finally, at some point we compress and write the data back to disk.
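For concreteness, here is a rough sketch of that round trip in plain C with zlib; the file name is made up, plain heap buffers stand in for NSMutableData, and error handling is abbreviated:

```c
/* Sketch of the "decompress everything, operate, recompress everything"
 * round trip with zlib. The path is hypothetical and error handling is
 * abbreviated; the zlib calls are the same whether the bytes live in an
 * NSMutableData or a plain malloc'd buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* Inflate a whole zlib-compressed buffer into a growable heap buffer. */
static unsigned char *inflate_all(const unsigned char *src, size_t src_len,
                                  size_t *out_len) {
    size_t cap = src_len * 4 + 1024, len = 0;
    unsigned char *dst = malloc(cap);
    z_stream zs = {0};
    inflateInit(&zs);
    zs.next_in  = (unsigned char *)src;
    zs.avail_in = (uInt)src_len;
    int ret;
    do {
        if (len == cap) { cap *= 2; dst = realloc(dst, cap); }
        zs.next_out  = dst + len;
        zs.avail_out = (uInt)(cap - len);
        ret = inflate(&zs, Z_NO_FLUSH);
        len = cap - zs.avail_out;
    } while (ret == Z_OK);
    inflateEnd(&zs);
    *out_len = len;
    return dst;                              /* caller frees */
}

int main(void) {
    /* 1. Load the compressed file from disk. */
    FILE *f = fopen("data.bin.z", "rb");     /* hypothetical path */
    fseek(f, 0, SEEK_END); long clen = ftell(f); fseek(f, 0, SEEK_SET);
    unsigned char *comp = malloc((size_t)clen);
    fread(comp, 1, (size_t)clen, f);
    fclose(f);

    /* 2. Decompress it all up front. */
    size_t ulen;
    unsigned char *plain = inflate_all(comp, (size_t)clen, &ulen);
    free(comp);

    /* 3. Arbitrary in-memory read/write operations happen here. */
    if (ulen > 0) plain[0] ^= 0xFF;

    /* 4. Recompress and write the whole thing back. */
    uLongf new_clen = compressBound(ulen);
    unsigned char *recomp = malloc(new_clen);
    compress2(recomp, &new_clen, plain, ulen, Z_DEFAULT_COMPRESSION);

    f = fopen("data.bin.z", "wb");
    fwrite(recomp, 1, new_clen, f);
    fclose(f);

    free(plain);
    free(recomp);
    return 0;
}
```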


Comments (1)

羁客 2024-12-12 22:06:40


Memory mapping is all about the 1:1 mapping of memory to disk. That's not compatible with automatic decompression, since it breaks the 1:1 mapping.
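For what it's worth, a minimal C sketch of what the mapping actually gives you (the file name is made up and error handling is omitted):

```c
/* mmap() hands you the file's bytes exactly as they sit on disk, so for a
 * zlib-compressed file the mapped pages contain compressed bytes, not the
 * payload. Nothing decompresses them for you. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin.z", O_RDONLY);   /* hypothetical path */
    struct stat st;
    fstat(fd, &st);

    const unsigned char *p =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* p[0], p[1] are the zlib header (typically 0x78 ...), not the first
     * bytes of your uncompressed data. */
    printf("first mapped bytes: %02x %02x\n", p[0], p[1]);

    munmap((void *)p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```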

I assume these files are read-only, since random-access writing to a compressed file is generally impractical. I would therefore assume that the files are somewhat static.

I believe this is a solvable problem, but it's not trivial, and you will need to understand the compression format. I don't know of any easily reusable software to solve it (though I'm sure many people have solved something like it in the past).

You could memory map the file and then provide a front-end adapter interface to fetch bytes at a given offset and length. You would scan the file once, decompressing as you went, and create a "table of contents" file that mapped periodic nominal offsets to real offsets (this is just an optimization; you could "discover" this table of contents as you fetched data). Then the algorithm would look something like this (a sketch of the indexing pass follows the list):

  • Given a nominal offset n, look up the greatest real offset m that maps to a nominal offset less than n.
  • Read from m - 32k into a buffer (32k is the largest back-reference distance allowed in DEFLATE).
  • Begin the DEFLATE algorithm at m and count decompressed bytes until you reach n.
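
Here is a rough C/zlib sketch of the indexing pass referred to above, assuming the compressed file has already been memory mapped; the type and function names are made up for illustration:

```c
/*
 * One-time "table of contents" scan: inflate the whole (memory-mapped)
 * compressed file once, and every SPAN bytes of uncompressed output record
 * where the compressed stream stood at that point. Names are illustrative
 * and error handling is omitted.
 */
#include <stdlib.h>
#include <zlib.h>

#define SPAN (1024UL * 1024UL)   /* one checkpoint per 1 MiB of uncompressed data */

typedef struct {
    unsigned long nominal;       /* offset into the uncompressed ("nominal") stream */
    unsigned long real;          /* offset into the compressed file on disk */
} TocEntry;

typedef struct {
    TocEntry *entries;
    size_t    count;
} Toc;

/* Build the index from a memory-mapped compressed file. */
static Toc build_toc(const unsigned char *mapped, size_t mapped_len) {
    Toc toc = { malloc(sizeof(TocEntry)), 1 };
    toc.entries[0] = (TocEntry){ 0, 0 };        /* the stream starts at 0/0 */

    unsigned char scratch[32768];               /* throwaway output buffer */
    z_stream zs = {0};
    inflateInit(&zs);
    zs.next_in  = (unsigned char *)mapped;
    zs.avail_in = (uInt)mapped_len;   /* for very large files, feed the input in chunks */

    unsigned long next_checkpoint = SPAN;
    int ret;
    do {
        zs.next_out  = scratch;
        zs.avail_out = sizeof(scratch);
        ret = inflate(&zs, Z_NO_FLUSH);
        if (zs.total_out >= next_checkpoint) {
            toc.entries = realloc(toc.entries, (toc.count + 1) * sizeof(TocEntry));
            toc.entries[toc.count++] = (TocEntry){ zs.total_out, zs.total_in };
            next_checkpoint += SPAN;
        }
    } while (ret == Z_OK);
    inflateEnd(&zs);
    return toc;
}

/* First lookup step: greatest checkpoint whose nominal offset is below n. */
static TocEntry toc_lookup(const Toc *toc, unsigned long n) {
    size_t i = 0;
    while (i + 1 < toc->count && toc->entries[i + 1].nominal < n)
        i++;
    return toc->entries[i];
}
```

Note that to actually resume decompression at a checkpoint m you would also need the preceding 32 KiB of uncompressed output (the DEFLATE window) and the bit offset within the compressed byte at that point; zlib's examples/zran.c shows the complete technique using inflatePrime() and inflateSetDictionary().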

Obviously you'd want to cache your solutions. NSCache and NSPurgeableData are ideal for this. Doing this really well and maintaining good performance would be challenging, but if it's a key part of your application it could be very valuable.
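
As a sketch of the caching idea in the same C register (NSCache holding NSPurgeableData values is the natural tool on Apple platforms, as noted above; everything below is illustrative only):

```c
/*
 * Tiny stand-in for the caching idea: keep a handful of recently decompressed
 * SPAN-sized chunks keyed by their checkpoint index, so repeated reads near
 * the same nominal offset don't re-run DEFLATE. The names and the
 * direct-mapped eviction policy are illustrative only.
 */
#include <stdlib.h>

#define CACHE_SLOTS 16

typedef struct {
    long           chunk;            /* checkpoint index this slot holds */
    unsigned char *bytes;            /* SPAN decompressed bytes, NULL if empty */
} CacheSlot;

static CacheSlot cache[CACHE_SLOTS];

/* Return the cached chunk, or NULL if the caller has to inflate it. */
static unsigned char *cache_get(long chunk) {
    CacheSlot *slot = &cache[chunk % CACHE_SLOTS];
    return (slot->bytes != NULL && slot->chunk == chunk) ? slot->bytes : NULL;
}

/* Take ownership of a freshly inflated chunk, evicting whatever was there. */
static void cache_put(long chunk, unsigned char *bytes) {
    CacheSlot *slot = &cache[chunk % CACHE_SLOTS];
    free(slot->bytes);               /* free(NULL) is a no-op */
    slot->chunk = chunk;
    slot->bytes = bytes;
}
```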
