Random-access container that does not fit into memory?

Posted 2024-08-19 03:37:00

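I have an array of objects (say, images) that is too large to fit into memory (e.g. 40 GB), but my code needs to be able to access these objects randomly at runtime.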

What is the best way to do this?

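From my code's point of view, of course, it shouldn't matter whether some of the data is on disk or temporarily held in memory; access should be transparent: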

container.getObject(1242)->process();
container.getObject(479431)->process();
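If the objects can be laid out in a single file, one way to get this transparent access without writing any caching code is to memory-map the file and let the operating system page data in and out. The sketch below assumes a 64-bit POSIX system and, unlike the resizable objects described in the update, fixed-size records; the class name `MappedRecords` is hypothetical. Touching a record may trigger a page fault that reads from disk, but the calling code only ever sees a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only view of a file of fixed-size records, mapped into
// the address space. The kernel loads pages on first access and drops cold
// pages under memory pressure, so callers never issue explicit disk reads.
class MappedRecords {
public:
    MappedRecords(const std::string& path, std::size_t recordSize)
        : recordSize_(recordSize) {
        assert(recordSize_ > 0);
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st {};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedRecords() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }

    MappedRecords(const MappedRecords&) = delete;
    MappedRecords& operator=(const MappedRecords&) = delete;

    std::size_t count() const { return size_ / recordSize_; }

    // Returns a pointer to record i; reading through it may page data in from disk.
    const unsigned char* get(std::size_t i) const {
        if (i >= count()) throw std::out_of_range("record index");
        return data_ + i * recordSize_;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t recordSize_;
};
```

A 40 GB file fits easily in a 64-bit virtual address space, and the kernel's page cache then acts as the "keep frequently used data in memory" policy, at the cost of having no direct control over what gets evicted.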

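But how should I implement this container? Should it just send the requests to a database? If so, which one would be the best option? (If a database, it should be free and not involve too much administration hassle; maybe Berkeley DB or SQLite?)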

Should I implement it myself, memoizing objects after access and purging memory when it is full? Or are there good C++ libraries for this?

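The requirements for the container are that it minimize disk access (some elements may be accessed more frequently by my code, so they should be kept in memory) and allow fast access.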
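Memoizing objects after access and purging when memory is full is essentially a least-recently-used (LRU) cache: a hash map for lookup plus a list ordered by recency, so frequently used objects stay resident. A minimal sketch follows, assuming a caller-supplied loader that reads object `i` from wherever it is stored (a file or a database); `LruCache` and the loader are hypothetical names, and the capacity counts objects rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache: keeps at most `capacity` objects in memory and
// evicts the least recently used one when a new object is loaded.
template <typename Object>
class LruCache {
public:
    // Caller-supplied function that loads object `index` from slow storage.
    using Loader = std::function<std::shared_ptr<Object>(std::size_t)>;

    LruCache(std::size_t capacity, Loader load)
        : capacity_(capacity), load_(std::move(load)) {
        assert(capacity_ > 0);
    }

    std::shared_ptr<Object> getObject(std::size_t index) {
        auto it = map_.find(index);
        if (it != map_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            ++hits_;
            return it->second->second;
        }
        ++misses_;
        auto obj = load_(index);  // the only place slow storage is touched
        order_.emplace_front(index, obj);
        map_[index] = order_.begin();
        if (map_.size() > capacity_) {
            // Evict the least recently used entry (back of the list).
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        return obj;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    using Entry = std::pair<std::size_t, std::shared_ptr<Object>>;
    std::size_t capacity_;
    Loader load_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::size_t, typename std::list<Entry>::iterator> map_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

Returning `std::shared_ptr` means an object evicted while the caller still holds it stays alive until the caller releases it. Because images differ in size, a real implementation would probably track the total number of resident bytes and evict until it is under a memory budget, rather than counting objects.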

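UPDATE: It turns out that STXXL does not work for my problem, because the objects I store in the container have dynamic size, i.e. my code may update them at runtime (increasing or decreasing the size of some objects). STXXL cannot handle that: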

"STXXL containers assume that the data types they store are plain old data types (POD)." http://algo2.iti.kit.edu/dementiev/stxxl/report/node8.html
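Objects that grow and shrink can be handled by keeping an in-memory index from object ID to (offset, size) within one data file and appending every new version to the end of the file, so a resized object never has to fit into its old slot. This is a sketch of that log-structured idea under simplifying assumptions, not a drop-in replacement for STXXL: `AppendOnlyStore` is a hypothetical name, objects are treated as opaque byte strings, and the file is truncated when opened.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical store for variable-size blobs in a single file. put() appends
// the new bytes at the end and repoints the index, so an object can grow or
// shrink without moving its neighbours. Superseded versions become dead space.
class AppendOnlyStore {
public:
    explicit AppendOnlyStore(const std::string& path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc) {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    void put(std::uint64_t id, const std::string& bytes) {
        file_.seekp(0, std::ios::end);
        const std::streamoff offset = file_.tellp();
        if (offset < 0) throw std::runtime_error("tellp failed");
        file_.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        if (!file_) throw std::runtime_error("write failed");
        index_[id] = Location{offset, bytes.size()};  // old version is now dead
    }

    std::string get(std::uint64_t id) {
        auto it = index_.find(id);
        if (it == index_.end()) throw std::out_of_range("unknown id");
        std::string bytes(it->second.size, '\0');
        file_.seekg(it->second.offset);  // seeking also flushes pending writes
        file_.read(&bytes[0], static_cast<std::streamsize>(bytes.size()));
        if (!file_) throw std::runtime_error("read failed");
        return bytes;
    }

private:
    struct Location {
        std::streamoff offset;
        std::size_t size;
    };
    std::fstream file_;
    std::unordered_map<std::uint64_t, Location> index_;
};
```

Dead space from superseded versions means a real store would need periodic compaction (copying the live objects into a new file) and a persisted index; both are omitted here. Putting an in-memory cache in front of `get()` would give the transparent, disk-friendly access the question asks for.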

Could you comment on other solutions? What about using a database? Which one?
