Memory-mapped database
I have 8 terabytes of data made up of ~5000 arrays of small elements (under 100 bytes per element). I need to load sections of these arrays (a few dozen megabytes at a time) into memory as quickly as possible for use in an algorithm. Are memory-mapped files right for this, and if not, what else should I use?
Given your requirements, I would definitely go with memory-mapped files; this is almost exactly what they were designed for. Mapping a file only reserves virtual address space: physical pages are loaded on demand when you touch them and are backed by the file itself, so the OS can evict them under memory pressure without using the page file. That means even your extremely large files have little impact on the rest of the system compared with other approaches, especially since you can map small views (e.g., just the section of an array you are about to process) into the address space right before you need them. The other big benefit is that memory-mapped files give you the simplest working environment possible: you can (mostly) treat your data as one large address space and let Windows take care of the I/O. Obviously, you'll need to build in locking mechanisms to handle multiple threads, but I'm sure you know that.
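The advice to map small views just before use can be sketched with Python's standard `mmap` module, which on Windows is built on `CreateFileMapping`/`MapViewOfFile` and on POSIX systems on `mmap(2)`. This is a minimal sketch under assumptions that are not in the question: each array is stored as fixed-size records laid out back to back in its own file, and `mapped_section` is a hypothetical helper name. The one real constraint it illustrates is that a view's offset must be a multiple of `mmap.ALLOCATIONGRANULARITY` (typically 64 KiB on Windows), so the view starts slightly before the requested records and the extra leading bytes are sliced off.

```python
import mmap
from contextlib import contextmanager


@contextmanager
def mapped_section(path, elem_size, first, count):
    """Map only elements [first, first + count) of a file of fixed-size records.

    Hypothetical layout (assumption): records of elem_size bytes stored back
    to back. Yields a read-only memoryview over exactly the requested bytes.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    start = first * elem_size
    end = start + count * elem_size
    # View offsets must be multiples of the allocation granularity
    # (typically 64 KiB on Windows, the page size on most POSIX systems),
    # so round the start down and skip the extra bytes afterwards.
    aligned = start - (start % mmap.ALLOCATIONGRANULARITY)
    with open(path, "rb") as f:
        # The mmap object keeps its own duplicate handle, so the file
        # object can be closed as soon as the mapping exists.
        mm = mmap.mmap(f.fileno(), end - aligned,
                       access=mmap.ACCESS_READ, offset=aligned)
    try:
        whole = memoryview(mm)
        view = whole[start - aligned:end - aligned]
        try:
            yield view
        finally:
            # Release buffer exports first: mm.close() raises BufferError
            # while any memoryview into the mapping is still alive.
            view.release()
            whole.release()
    finally:
        mm.close()
```

Only the pages you actually touch are read from disk. If the records need parsing, `struct.iter_unpack` or `memoryview.cast` can work directly on the yielded view without copying it first.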