Suggestions for improving an allocator algorithm implementation
I have a Visual Studio 2008 C++ application where I'm using a custom allocator for standard containers such that their memory comes from a Memory Mapped File rather than the heap. This allocator is used for 4 different use cases:
- 104-byte fixed size structure
std::vector< SomeType, MyAllocator< SomeType > > foo;
- 200-byte fixed size structure
- 304-byte fixed size structure
- n-byte strings
std::basic_string< char, std::char_traits< char >, MyAllocator< char > > strn;
I need to be able to allocate roughly 32MB total for each of these.
The allocator tracks memory usage using a std::map of pointers to allocation sizes: typedef std::map< void*, size_t > SuperBlock;
Each SuperBlock represents 4MB of memory.
There is a std::vector< SuperBlock > of these in case one SuperBlock isn't enough space.
The algorithm used for the allocator goes like this:
- For each SuperBlock: Is there space at the end of the SuperBlock? put the allocation there. (fast)
- If not, search within each SuperBlock for an empty space of sufficient size and put the allocation there. (slow)
- Still nothing? allocate another SuperBlock and put the allocation at the start of the new SuperBlock.
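The steps above can be sketched as follows. This is a hypothetical reconstruction: the flat-offset SuperBlock model and the function names are mine, not the actual code, but the search structure matches the description, and it shows why step 2 degrades — every gap between live allocations must be walked.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// One SuperBlock: a sorted map from allocation start offset to size.
// Offsets stand in for pointers into the memory-mapped file.
struct SuperBlock {
    std::size_t base;
    std::size_t capacity;
    std::map<std::size_t, std::size_t> used; // offset -> size, sorted by offset
    SuperBlock() : base(0), capacity(4 * 1024 * 1024) {}
};

// Step 1: is there room after the last allocation? (fast)
// Step 2: scan every gap between allocations. (slow: O(live allocations))
bool allocate_in_block(SuperBlock& sb, std::size_t n, std::size_t& out) {
    std::size_t end = sb.base;
    if (!sb.used.empty()) {
        std::map<std::size_t, std::size_t>::reverse_iterator last = sb.used.rbegin();
        end = last->first + last->second;
    }
    if (end + n <= sb.base + sb.capacity) {   // space at the end
        out = end;
        sb.used[out] = n;
        return true;
    }
    std::size_t prev_end = sb.base;           // the slow path: walk each gap
    for (std::map<std::size_t, std::size_t>::iterator it = sb.used.begin();
         it != sb.used.end(); ++it) {
        if (it->first - prev_end >= n) {
            out = prev_end;
            sb.used[out] = n;
            return true;
        }
        prev_end = it->first + it->second;
    }
    return false;                             // caller tries the next SuperBlock
}
```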
Unfortunately, step 2 can become VERY slow after a while. As copies of objects are made and temporary variables destroyed, I get a lot of fragmentation. This causes a lot of deep searching within the memory structure. Fragmentation is an issue as I have a limited amount of memory to work with (see note below).
Can anybody suggest improvements to this algorithm that would speed up the process? Do I need two separate algorithms (1 for the fixed-size allocations and one for the string allocator)?
Note: For those that need a reason: I'm using this algorithm in Windows Mobile where there's a 32MB process slot limit to the Heap. So, the usual std::allocator
won't cut it. I need to put the allocations in the 1GB Large Memory Area to have enough space and that's what this does.
Comments (6)
Can you have a separate memory allocation pool for each different fixed-size type you are allocating? That way there won't be any fragmentation, because the allocated objects will always align on n-byte boundaries. That doesn't help for the variable-length strings, of course.
There's an example of small-object allocation in Alexandrescu's Modern C++ Design that illustrates this principle and may give you some ideas.
For the fixed sized objects, you can create a fixed sized allocator. Basically you allocate blocks, partition into subblocks of the appropriate size and create a linked list with the result. Allocating from such a block is O(1) if there is memory available (just remove the first element from the list and return a pointer to it) as is deallocation (add the block to the free list). During allocation, if the list is empty, grab a new superblock, partition and add all blocks into the list.
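A minimal sketch of such a fixed-size pool, assuming a single pre-allocated block. The class name and the std::vector backing store are illustrative; a real version would carve its cells out of the memory-mapped superblock. Each free cell stores the index of the next free cell, so the free list costs no extra memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t cell_size, std::size_t cell_count)
        : cell_(cell_size < sizeof(std::size_t) ? sizeof(std::size_t) : cell_size),
          storage_(cell_ * cell_count),
          head_(0) {
        // Thread every cell onto the free list: cell i points to cell i+1.
        for (std::size_t i = 0; i < cell_count; ++i)
            next_of(i) = (i + 1 < cell_count) ? i + 1 : npos;
    }
    void* allocate() {                 // O(1): pop the head of the free list
        if (head_ == npos) return 0;   // exhausted; caller grabs a new superblock
        std::size_t i = head_;
        head_ = next_of(i);
        return &storage_[i * cell_];
    }
    void deallocate(void* p) {         // O(1): push the cell back on the list
        std::size_t i = (static_cast<char*>(p) - &storage_[0]) / cell_;
        next_of(i) = head_;
        head_ = i;
    }
private:
    static const std::size_t npos = static_cast<std::size_t>(-1);
    std::size_t& next_of(std::size_t i) {
        // A free cell's first bytes hold the index of the next free cell.
        return *reinterpret_cast<std::size_t*>(&storage_[i * cell_]);
    }
    std::size_t cell_;
    std::vector<char> storage_;
    std::size_t head_;
};
```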
For the variable sized list, you can simplify it to the fixed size block by allocating only blocks of known sizes: 32 bytes, 64 bytes, 128 bytes, 512 bytes. You will have to analyze the memory usage to come up with the different buckets so that you don't waste too much memory. For large objects, you can go back to a dynamic size allocation pattern, that will be slow, but hopefully the amount of large objects is limited.
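The bucketing step for strings might look like this. The bucket sizes are just the ones suggested above; real values should come from profiling the application's string lengths.

```cpp
#include <cassert>
#include <cstddef>

// Round a request up to the nearest size class so each class can be served
// by a fixed-size pool. Returns 0 for requests too large for any bucket,
// signalling the caller to fall back to the slow variable-size path.
inline std::size_t bucket_for(std::size_t n) {
    static const std::size_t buckets[] = { 32, 64, 128, 512 };
    for (std::size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); ++i)
        if (n <= buckets[i]) return buckets[i];
    return 0;
}
```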
Building on Tim's answer, I would personally use something akin to BiBOP.
The basic idea is simple: use fixed-size pools.
There are some refinements to that.
First, the size of the pools is generally fixed. It depends on your allocation routine: typically, if you know the OS you're working on maps at least 4KB at once when you use malloc, then you use that value. For a memory-mapped file, you might be able to increase this.
The advantage of fixed-size pools is that they nicely fight off fragmentation. All pages being the same size, you can easily recycle an empty 256-byte page into a 128-byte page.
There is still some fragmentation for large objects, which are typically allocated outside of this system. But it's low, especially if you fit large objects into a multiple of the page size, this way the memory will be easy to recycle.
Second, how do you handle the pools? Using linked lists.
The pages are typically untyped (by themselves), so you have a free-list of pages in which to prepare new pages and put "recycled" pages.
For each size category you then have a list of "occupied" pages, in which memory has been allocated. Within each page, each free cell is itself a pointer (or index, depending on the size you have) to the next free cell.
The list of "occupied" pages of a given size is simple to manage: pages move between it and the free list as they fill up and empty.
This scheme is really performant memory-wise, with only a single page reserved for indexing.
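A sketch of one such untyped page, assuming the in-page free list of indices described above. The names and layout are illustrative, not from any particular implementation; the point is that an empty page carries no type and can be re-threaded for any cell size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Page {
    std::vector<char> bytes;
    std::size_t cell;       // current cell size for this page
    std::size_t free_head;  // index of first free cell, npos when full
    std::size_t used;       // live allocations; 0 means the page is recyclable
    static const std::size_t npos = static_cast<std::size_t>(-1);

    explicit Page(std::size_t size)
        : bytes(size), cell(0), free_head(npos), used(0) {}

    // Recycle the (empty) page for a new size class: re-thread the free list.
    void rethread(std::size_t cell_size) {
        cell = cell_size;
        used = 0;
        std::size_t count = bytes.size() / cell;
        for (std::size_t i = 0; i + 1 < count; ++i)
            next(i) = i + 1;
        next(count - 1) = npos;
        free_head = 0;
    }
    void* allocate() {                    // O(1) pop from the in-page free list
        if (free_head == npos) return 0;
        std::size_t i = free_head;
        free_head = next(i);
        ++used;
        return &bytes[i * cell];
    }
    void deallocate(void* p) {            // O(1) push back onto the free list
        std::size_t i = (static_cast<char*>(p) - &bytes[0]) / cell;
        next(i) = free_head;
        free_head = i;
        --used;
    }
private:
    std::size_t& next(std::size_t i) {    // a free cell stores the next index
        return *reinterpret_cast<std::size_t*>(&bytes[i * cell]);
    }
};
```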
For multi-threaded / multi-process applications, you'll need to add synchronization (typically a mutex per page); you could take inspiration from Google's tcmalloc here (try to find another page instead of blocking, and use a thread-local cache to remember which page you last used).
Having said that, have you tried Boost.Interprocess ? It provides allocators.
For the fixed sizes you can easily use a small memory allocator type of allocator where you allocate a large block that's split into fixed-size chunks. You then create a vector of pointers to available chunks and pop/push as you allocate/free. This is very fast.
For variable-length items, it's harder: you either have to deal with searching for available contiguous space or use some other approach. You might consider maintaining another map of all the free nodes, ordered by block size, so you can lower_bound into the map and, if the next available node is, say, only 5% too big, return it instead of trying to find usable available space of the exact size.
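A sketch of that idea, with hypothetical names: a std::multimap is used since several free blocks can share a size, and the 5% tolerance is the figure suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Free blocks keyed by size. lower_bound finds the smallest block that fits;
// if it is within ~5% of the request, take it as-is rather than searching on.
typedef std::multimap<std::size_t, void*> FreeBySize;

void* take_best_fit(FreeBySize& free_blocks, std::size_t n) {
    FreeBySize::iterator it = free_blocks.lower_bound(n);
    if (it == free_blocks.end()) return 0;  // nothing big enough
    if (it->first <= n + n / 20) {          // within ~5%: good enough
        void* p = it->second;
        free_blocks.erase(it);
        return p;
    }
    return 0; // best candidate wastes too much; caller splits it or falls back
}
```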
My inclination for variable-sized items would be to, if practical, avoid holding direct pointers to data and instead keep handles. Each handle would be the index of a superblock plus an index to an item within the superblock. Each superblock would have an item-list allocated top-down and items allocated bottom-up. Each item's allocation would be preceded by its length and by the index of the item it represents; use one bit of the index to indicate whether an item is 'pinned'.
If an item fits after the last allocated item, simply allocate it. If it would hit a pinned item, move the next-allocation mark past the pinned item, find the next higher pinned item, and try the allocation again. If the item would collide with the item-list but there's enough free space somewhere, compactify the block's contents (if one or more items are pinned, it may be better to use another superblock if one is available). Depending upon usage patterns, it may be desirable to start by only compactifying the stuff that was added since the last collection; if that doesn't provide enough space, then compactify everything.
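The pinned-bit packing mentioned above could be as simple as this (the helper names are hypothetical):

```cpp
#include <cassert>

// Pack an item index and a "pinned" flag into one word: the low bit marks
// items that may not be moved during compaction.
inline unsigned pack_index(unsigned item_index, bool pinned) {
    return (item_index << 1) | (pinned ? 1u : 0u);
}
inline unsigned index_of(unsigned packed) { return packed >> 1; }
inline bool is_pinned(unsigned packed)    { return (packed & 1u) != 0; }
```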
Of course, if you only have a few discrete sizes of items, you can use simple fixed-sized-chunk allocators.
I agree with Tim - use memory pools to avoid fragmentation.
However, you may be able to avoid some of the churn by storing pointers rather than objects in your vectors, perhaps a ptr_vector?