Overloading operator new to store objects in an mmap'd file

Posted 2024-10-11 14:07:42

I have a Linux C++ program with fairly large memory requirements. Most of the memory is consumed by just a few classes, and it is accessed fairly infrequently. I want to move these classes from main memory to disk-based storage while changing as little existing code as possible.

The idea was to override operator new for these classes and have their objects allocated in an mmap()'d memory region. This way my code modifications stay very limited, the rest of the program can happily access these objects without knowing that anything changed, and the kernel will make sure the objects I need are in memory while the others are on disk. I know this is very similar to how swap works, but the swap partition is usually too small for what my program needs.

Some questions I have:

  • Is this a very bad idea? Do you know something better to achieve the same?
  • Would I need to allocate the maximum file size beforehand, and will I require all of this space to be allocated on disk? If so, would mapping to a sparse file help?
  • I don't want to write my own heap allocator. Can I use an existing one?
  • When my program finishes, the mmap'd file will be deleted. This means I don't want any pages to be written to disk unless the kernel will actually remove them from memory. Is there something like a lazy flag to mmap to achieve this, or is this automatic?
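
To make the idea concrete, here is a minimal sketch of a class-specific operator new that places objects in a file-backed mapping. Everything named here (`FileArena`, `big_arena`, `BigRecord`, the `/tmp` path, the 64 MiB capacity) is a hypothetical choice for illustration, and the allocator is a plain bump pointer that never reuses memory, so it stands in for a real heap allocator rather than replacing one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bump-pointer arena backed by an unlinked temporary file.
class FileArena {
public:
    explicit FileArena(std::size_t capacity) : capacity_(capacity) {
        char path[] = "/tmp/arena-XXXXXX";   // assumed location for the backing file
        int fd = mkstemp(path);
        if (fd < 0) throw std::bad_alloc();
        unlink(path);                        // the file vanishes once the mapping is gone
        if (ftruncate(fd, static_cast<off_t>(capacity)) != 0) {
            close(fd);
            throw std::bad_alloc();
        }
        void* p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                           // the mapping keeps the file alive
        if (p == MAP_FAILED) throw std::bad_alloc();
        base_ = static_cast<char*>(p);
    }
    ~FileArena() { munmap(base_, capacity_); }
    FileArena(const FileArena&) = delete;
    FileArena& operator=(const FileArena&) = delete;

    // Hands out the next aligned chunk; memory is never returned to the arena.
    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) throw std::bad_alloc();
        used_ = start + size;
        return base_ + start;
    }

    bool contains(const void* p) const {
        const char* c = static_cast<const char*>(p);
        return c >= base_ && c < base_ + capacity_;
    }

private:
    char* base_ = nullptr;
    std::size_t capacity_;
    std::size_t used_ = 0;
};

// One process-wide arena; 64 MiB is an arbitrary size for the sketch.
inline FileArena& big_arena() {
    static FileArena arena(std::size_t{64} << 20);
    return arena;
}

// Hypothetical large, rarely accessed class.
struct BigRecord {
    double samples[1024];

    static void* operator new(std::size_t size) {
        return big_arena().allocate(size, alignof(BigRecord));
    }
    static void operator delete(void*) noexcept {}  // bump allocator: nothing to free
};
```

With this in place, `new BigRecord` and `delete` compile unchanged in the rest of the program; only objects created through `new` land in the mapping, while stack and member instances are unaffected.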

Comments (3)

小霸王臭丫头 2024-10-18 14:07:42

Looking at each question in turn:

  • Is this a very bad idea? Do you know something better to achieve the same?

It's not really clear what you hope to achieve by this. Linux already backs memory with swap space (so if your data exceeds physical memory, some of it will be swapped to disk). Are you having problems with running out of address space, or with running slowly due to excessive paging? Using an mmap-backed store won't really affect either.

  • Would I need to allocate the maximum file size beforehand, and will I require all of this space to be allocated on disk? If so, would mapping to a sparse file help?

Yes, you need the file to be as big as the space you are mmapping. You can, however, start with a small file/mmap and grow the file (and mmap additional pages) later as needed. You can also use a sparse file, so that disk space isn't used until the pages are written to.
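
As a rough sketch of both points, the helpers below create a sparse file with `ftruncate`, report how much disk it actually occupies, and grow an existing mapping. The function names and the `/tmp` path are hypothetical, and `mremap` is Linux-specific; with `MREMAP_MAYMOVE` the mapping may move, which would invalidate any raw pointers into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates an unlinked temporary file of the given logical size.
// ftruncate only sets the length; no data blocks are written here.
inline int open_sparse_file(std::size_t size) {
    char path[] = "/tmp/sparse-XXXXXX";   // assumed location
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Bytes of disk actually allocated (st_blocks counts 512-byte units),
// or -1 if fstat fails.
inline long long allocated_bytes(int fd) {
    struct stat st {};
    if (fstat(fd, &st) != 0) return -1;
    return static_cast<long long>(st.st_blocks) * 512;
}

// Extends the file, then the MAP_SHARED mapping on top of it.
// Returns MAP_FAILED on error; the returned address may differ from old_addr.
inline void* grow_mapping(int fd, void* old_addr,
                          std::size_t old_size, std::size_t new_size) {
    assert(new_size >= old_size);
    if (ftruncate(fd, static_cast<off_t>(new_size)) != 0) return MAP_FAILED;
    return mremap(old_addr, old_size, new_size, MREMAP_MAYMOVE);
}
```

If stable addresses matter (they do when objects hold pointers to each other), an alternative is to map the maximum size up front over a sparse file, so the mapping never has to move.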

  • I don't want to write my own heap allocator. Can I use an existing one?

There are heap managers that use mmap-backed storage. I've seen versions of the Doug Lea malloc, and various other bibop allocators that do so.

  • When my program finishes, the mmap'd file will be deleted. This means I don't want any pages to be written to disk unless the kernel will actually remove them from memory. Is there something like a lazy flag to mmap to achieve this, or is this automatic?

In this case, you could just use MAP_ANON and not have a file at all. However, this gets back to the first question, as this is essentially duplicating what the system malloc (and new) does. In fact on some OSes (Solaris?) that's exactly what the system malloc does. The main reason I've seen custom mmap-based mallocs in the past is for persistent storage (so the file would remain after the process exits and would be remapped on restart).
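
For reference, an anonymous mapping needs no file at all. The sketch below wraps it in two hypothetical helpers (`anon_alloc`, `anon_free`); `MAP_ANONYMOUS` is the Linux spelling of `MAP_ANON`, and pages of such a mapping are backed by swap rather than by a file.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Returns zero-filled, page-aligned memory, or nullptr on failure.
inline void* anon_alloc(std::size_t bytes) {
    assert(bytes > 0);
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Releases memory obtained from anon_alloc; bytes must match the request.
inline void anon_free(void* p, std::size_t bytes) {
    if (p) munmap(p, bytes);
}
```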

独享拥抱 2024-10-18 14:07:42

I can think of a few problems with the approach you would like to take, so this isn't an answer yet.

  1. When you "swap" something out (the problem being that keeping the objects around consumes too much memory), when do you remove them, i.e. effectively unmap them? You would be making the same decision that your OS's memory manager already makes.
  2. Though you may be able to store the binary representation of the class in an mmap'd block, if the class is not a POD then "swapping" will not do what you expect. For example, if there are members that are heap-allocated, what happens to them?
  3. mmap'd memory will still count against your process, so your problems will not go away...
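
Point 2 can be shown with a small sketch. `Record` is a hypothetical non-POD class constructed with placement new inside a mapped region (an anonymous one here, to keep the example short); its `std::vector` member still obtains element storage from the ordinary heap, so the bulk of the data never reaches the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <sys/mman.h>

// Hypothetical non-POD class: the vector owns a separate heap buffer.
struct Record {
    int id = 0;
    std::vector<int> values;
};

// True if p lies within [base, base + len).
inline bool in_range(const void* p, const void* base, std::size_t len) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto b = reinterpret_cast<std::uintptr_t>(base);
    return a >= b && a - b < len;
}

// Builds a Record inside a mapped region and reports whether the vector's
// element storage also ended up inside that region.
inline bool payload_inside_mapping() {
    const std::size_t len = 1 << 16;
    void* region = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(region != MAP_FAILED);

    Record* r = new (region) Record;   // the object itself lives in the mapping
    r->values.assign(1000, 42);        // the elements come from std::allocator

    bool inside = in_range(r, region, len) &&
                  in_range(r->values.data(), region, len);
    r->~Record();
    munmap(region, len);
    return inside;
}
```

Here `payload_inside_mapping()` returns false: only the vector's small header sits in the mapping. Getting the elements there as well would require giving every such member a custom allocator, which is no longer a small code change.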

I think your best bet here is to look at your design and consider when these classes are needed and for how long, then construct, use, and discard them when they are no longer needed. Are they expensive to construct? Maybe it would be cheaper to serialize them into some local file and reconstruct them later (when I say serialize, I don't mean a simple memcpy!).
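
A minimal sketch of what such serialization could look like, assuming a hypothetical `Report` class and a simple length-prefixed binary layout (not portable across machines with different endianness or `double` formats). Each member is written explicitly rather than copying the object's bytes, so heap-owned data is included.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical class with heap-owning members.
struct Report {
    std::string name;
    std::vector<double> values;
};

// Writes each member as a 64-bit length followed by its raw contents.
inline void save(std::ostream& out, const Report& r) {
    std::uint64_t n = r.name.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(r.name.data(), static_cast<std::streamsize>(n));
    n = r.values.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(r.values.data()),
              static_cast<std::streamsize>(n * sizeof(double)));
}

// Rebuilds a Report from the layout produced by save (no error handling).
inline Report load(std::istream& in) {
    Report r;
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    r.name.resize(n);
    in.read(&r.name[0], static_cast<std::streamsize>(n));
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    r.values.resize(n);
    in.read(reinterpret_cast<char*>(r.values.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    return r;
}

// Convenience for trying it out in memory; a real program would use
// std::ofstream / std::ifstream opened with std::ios::binary.
inline Report round_trip(const Report& r) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    save(buf, r);
    return load(buf);
}
```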

初与友歌 2024-10-18 14:07:42

The best option is likely to be to specify that your program requires a minimum amount of swap to be configured, rather than trying to simulate more swap using mmap(). In particular, your last point can't really be overcome - dirty pages in file-backed mappings are generally preferentially written out.
