C++ custom allocator utilizing an underlying memory pool
I'm using a memory pool class which reuses allocated memory addresses and a custom allocator which wraps
that class. The following code snippet gives you a basic idea of the interface.
template<class alloc>
class memory_pool
    : boost::noncopyable,
      public allocator_traits<void>
{
public:
    memory_pool(typename alloc::size_type alloc_size);
    memory_pool(typename alloc::size_type alloc_size, alloc const&);
    template<typename U>
    memory_pool(typename alloc::size_type alloc_size,
                typename alloc::rebind<U>::other const&);
    virtual ~memory_pool();

    pointer allocate();                  /* throw(std::bad_alloc) */
    void    collect();
    void    deallocate(pointer) throw(); /* noexcept */
};
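The custom allocator that wraps the pool isn't shown in the question. Purely as an illustration of the idea, here is a minimal sketch of a pool plus a forwarding C++11 allocator; `simple_pool` and `pool_allocator` are hypothetical stand-ins, not the author's classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal pool: hands out fixed-size chunks, keeps returned
// chunks in a free list for reuse, and only really frees them in collect().
class simple_pool {
public:
    explicit simple_pool(std::size_t chunk_size) : size_(chunk_size) {}
    ~simple_pool() { collect(); }

    void* allocate() {
        if (!free_.empty()) {             // reuse a previously returned chunk
            void* p = free_.back();
            free_.pop_back();
            return p;
        }
        return ::operator new(size_);     // otherwise fall back to the heap
    }
    void deallocate(void* p) { free_.push_back(p); }  // cache for reuse
    void collect() {                      // really release the cached chunks
        for (void* p : free_) ::operator delete(p);
        free_.clear();
    }
private:
    std::size_t size_;
    std::vector<void*> free_;
};

// A minimal C++11 allocator forwarding single-object requests to the pool.
template<typename T>
struct pool_allocator {
    typedef T value_type;
    explicit pool_allocator(simple_pool& p) : pool(&p) {}
    template<typename U>
    pool_allocator(const pool_allocator<U>& o) : pool(o.pool) {}

    T* allocate(std::size_t n) {
        if (n != 1) return static_cast<T*>(::operator new(n * sizeof(T)));
        return static_cast<T*>(pool->allocate());
    }
    void deallocate(T* p, std::size_t n) {
        if (n != 1) { ::operator delete(p); return; }
        pool->deallocate(p);
    }
    simple_pool* pool;
};
template<typename T, typename U>
bool operator==(const pool_allocator<T>& a, const pool_allocator<U>& b)
{ return a.pool == b.pool; }
template<typename T, typename U>
bool operator!=(const pool_allocator<T>& a, const pool_allocator<U>& b)
{ return !(a == b); }
```

Note that the pool's chunk size must cover the rebound node type of the container that ends up using the allocator.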
pointer allocate()
{/*
    Checks whether a suitable chunk of memory is available in an internal linked list.
    If so, that chunk is returned and the next chunk moves up.
    Otherwise, new memory is allocated by the underlying allocator.
*/}

void deallocate(pointer)
{/*
    Interprets the passed pointer as a chunk of memory and stores it in a linked list.
    Note that the memory isn't actually deallocated here.
*/}

void collect()
{/*
    Effectively deallocates the chunks held in the linked list.
    This is called at least once, during destruction.
*/}
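The three stubs above describe a classic intrusive free list: the first bytes of each cached chunk are reinterpreted as a "next" pointer, so the list costs no extra storage. A minimal self-contained sketch (my naming, not the author's; it assumes chunks of at least `sizeof(void*)`):

```cpp
#include <cstddef>
#include <new>

class chunk_list {
public:
    explicit chunk_list(std::size_t chunk_size)
        : size_(chunk_size < sizeof(node) ? sizeof(node) : chunk_size),
          head_(0) {}
    ~chunk_list() { collect(); }

    void* allocate() {
        if (head_) {                    // a suitable chunk is available:
            node* n = head_;            // return it, next chunk moves up
            head_ = head_->next;
            return n;
        }
        return ::operator new(size_);   // otherwise ask the underlying heap
    }

    void deallocate(void* p) {
        // Interpret the chunk as a list node and push it on the free list;
        // nothing is returned to the system here.
        node* n = static_cast<node*>(p);
        n->next = head_;
        head_ = n;
    }

    void collect() {
        // Effectively deallocate the chunks held in the list.
        while (head_) {
            node* n = head_;
            head_ = head_->next;
            ::operator delete(n);
        }
    }
private:
    struct node { node* next; };
    std::size_t size_;
    node* head_;
};
```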
Sure, the need for something like this is limited. However, it's very useful in situations where you need to:
- Specify an allocator type for a class which uses that allocator in a very naive way (e.g. one that avoids allocating larger pieces even where that would be advisable).
- Allocate and deallocate the same size of memory repeatedly.
- Use the allocator for a type of very small size (e.g. built-in types like char, short, int etc.).
Theoretically, an implementation could take advantage of a memory_pool which allocates a multiple of the actual allocation size each time it needs more memory (from the underlying memory manager). Objects that are close together are more suitable for any cache and/or prefetching algorithm.
I've implemented such a memory pool, with some overhead to handle the correct allocation, splitting and deallocation (we cannot deallocate every address the user passes to deallocate; we may only deallocate those addresses that mark the beginning of a memory block we previously allocated).
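The block-wise scheme described above could look roughly like the following sketch (illustrative names, not the actual implementation): the pool requests room for several chunks at once, threads them all onto the free list, and records only the block heads, since only those addresses may legally be handed back to the underlying allocator.

```cpp
#include <cstddef>
#include <new>
#include <vector>

class block_pool {
public:
    block_pool(std::size_t chunk_size, std::size_t group)
        : size_(chunk_size < sizeof(node) ? sizeof(node) : chunk_size),
          group_(group), head_(0) {}
    ~block_pool() {
        // Only block heads are ever freed, never individual chunk addresses.
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            ::operator delete(blocks_[i]);
    }

    void* allocate() {
        if (!head_) refill();           // out of chunks: grab a whole block
        node* n = head_;
        head_ = head_->next;
        return n;
    }
    void deallocate(void* p) {          // any chunk address may come back...
        node* n = static_cast<node*>(p);
        n->next = head_;
        head_ = n;
    }
private:
    struct node { node* next; };

    void refill() {
        // ...but we remember only the start of each block we allocated.
        char* block = static_cast<char*>(::operator new(size_ * group_));
        blocks_.push_back(block);
        for (std::size_t i = 0; i < group_; ++i)   // split into chunks
            deallocate(block + i * size_);
    }

    std::size_t size_, group_;
    node* head_;
    std::vector<char*> blocks_;
};
```

Since chunks within a block are contiguous, consecutive allocations tend to land close together, which is the cache/prefetch advantage the paragraph above aims for.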
I've tested both cases with the following really simple code:
std::list<int, allocator<int>> list;

std::clock_t t = std::clock();
for (int i = 0; i < 1 << 16; ++i)
{
    for (int j = 0; j < 1 << 16; ++j)
        list.push_back(j);
    list.unique();
    for (int j = 0; j < 1 << 16; ++j)
        list.pop_back();
}
std::cout << (std::clock() - t) / CLOCKS_PER_SEC << std::endl;
std::list calls allocator::allocate(1, 0) each time push_back is called. unique() makes sure that each element is touched and compared to the next element.
However, the results were disappointing: the minimal overhead needed to manage the block-wise allocating memory pool is greater than any possible advantage the system gains.
Can you think of a scenario where it would improve performance?
EDIT:
Of course, it's much faster than std::allocator.
2 Answers
C++0x has better support for scoped allocators such as memory pools.

Profile your code; it's very hard to predict what advantages this will confer unless your algorithm performs very regular allocation/deallocation patterns such as LIFO.

It's dead easy to write a very fast allocator when all the allocated objects are the same size. Once I wrote something along the lines of

Before you can design an allocator you need to be sure what will be allocated and how. The answers for operator new are "anything" and "anyhow", which is why it's sometimes sub-optimal. If you can't answer these questions properly, your allocator probably won't be a big improvement.
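The fast same-size allocator the first answer alludes to ("something along the lines of") was not preserved. Purely as an illustration of how small such a thing can be, here is a hypothetical sketch: a preallocated arena plus a stack of free slot indices, so both allocate and deallocate are a handful of instructions.

```cpp
#include <cstddef>

template<typename T, std::size_t N>
class fixed_arena {
public:
    fixed_arena() : top_(N) {
        for (std::size_t i = 0; i < N; ++i)
            free_[i] = i;                  // every slot starts out free
    }
    // O(1): pop a free slot index; returns 0 when the arena is exhausted.
    T* allocate() {
        if (top_ == 0) return 0;
        return reinterpret_cast<T*>(storage_.bytes + free_[--top_] * sizeof(T));
    }
    // O(1): push the slot index back for reuse.
    void deallocate(T* p) {
        std::size_t slot =
            (reinterpret_cast<char*>(p) - storage_.bytes) / sizeof(T);
        free_[top_++] = slot;
    }
private:
    union {                        // crude alignment trick for the raw storage
        char   bytes[N * sizeof(T)];
        double align_;
    } storage_;
    std::size_t free_[N];          // stack of free slot indices
    std::size_t top_;
};
```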
Something that does a lot (10k+ per sec) of allocations and deallocations would benefit from not having to run into complex bookkeeping on every alloc/free; however, that only pays off if the combined saving from batching the allocs/frees into groups is more than it costs to handle a group (basically, you need to amortize the group overhead with the per-unit saving).

The use of contiguous memory would help any node/pointer-based structure like a tree (but only to a point). However, the real-world benefits can be drastically different (or non-existent!) from what they were planned to be, which is why, when walking down the road of creating custom systems like this, you should profile your code and already have a picture in mind of how it will be used (i.e. there is no point in making a custom pool allocator for something that does so few allocations that the speed gain doesn't matter at all).

Something like this can be handy for debugging, however, as you have a nice interface for tagging memory to monitor leaks and overwrites/invalid writes, so even if it has the same performance as the standard system, you can gain in other ways.
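The debugging idea in the last paragraph can be sketched as a guard-byte wrapper (hypothetical names; a real pool would fold this into its own allocate/deallocate): pad each allocation with canary bytes, verify them on free, and count outstanding blocks to spot leaks.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical debug layer: canaries around each allocation catch
// overwrites/invalid writes; a live counter exposes leaks.
class guarded_heap {
    static const unsigned char kCanary = 0xAB;
    enum { kPad = 8 };                       // guard bytes on each side
public:
    guarded_heap() : live_(0) {}

    void* allocate(std::size_t n) {
        unsigned char* raw =
            static_cast<unsigned char*>(::operator new(n + 2 * kPad));
        std::memset(raw, kCanary, kPad);              // front guard
        std::memset(raw + kPad + n, kCanary, kPad);   // back guard
        ++live_;
        return raw + kPad;                            // user sees the middle
    }

    // Returns false if either guard was overwritten.
    bool deallocate(void* p, std::size_t n) {
        unsigned char* raw = static_cast<unsigned char*>(p) - kPad;
        bool ok = true;
        for (std::size_t i = 0; i < kPad; ++i)
            ok = ok && raw[i] == kCanary && raw[kPad + n + i] == kCanary;
        ::operator delete(raw);
        --live_;
        return ok;
    }

    std::size_t live() const { return live_; }        // outstanding blocks
private:
    std::size_t live_;
};
```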