How can I implement a variable-size cache object to reduce memory allocations in C++?
Before the performance people tear my head off: yes, I have done profiling before asking this :)
I'm once again looking at my one-of-a-type container, and while I have a working solution, its performance is poor because each type of item that's cached results in a separate allocation on the heap (which is, of course, expensive).
Based on static analysis of my program's input, I have figured out a way to know the total size required by all of the objects that might be put in the cache object that's getting passed around. Basically, I have a list of the objects that may be constructed in a given cache object, so I know in advance the size of what I might have to cache, but only at runtime, not at compile time.
Basically, what I'd like to do is what boost::make_shared does: obtain a single memory block, and construct the shared_ptr bookkeeping bits as well as the controlled object in that same block.
I don't have to worry about preserving copying behavior, as the cache object is noncopyable and is passed around by pointer by clients (it's usually stored in something like a ptr_vector or a std::auto_ptr).
I'm not familiar, however, with how exactly one would implement such a container, namely how to follow alignment restrictions and such.
In pseudocode, what I'd like to do:
//I know a lot of what's in here is not portable -- I need to run only on x86
//and x64 machines. Yes, this couple of classes looks hacky, but I'd rather
//have one hacky class than a whole programfull :)
class CacheRegistrar
{
//Blah blah
public:
//Figures out what objects will be in the cache, etc
const std::vector<std::size_t>& GetRequiredObjectSizes() const;
//Other stuff...
template <typename T>
void RegisterCacheObject();
template <typename T>
std::size_t GetObjectIndex() const;
// etc.
};
class CacheObject;
std::auto_ptr<CacheObject> CacheObjectFactory(const CacheRegistrar& registrar)
{
//Pretend this is in a CPP file and therefore CacheObject is defined...
const std::vector<std::size_t>& sizes(registrar.GetRequiredObjectSizes());
//std::accumulate needs an initial value; start from std::size_t(0) so the
//sum is computed in size_t rather than int.
std::size_t sumOfCache = std::accumulate(sizes.begin(), sizes.end(), std::size_t(0));
sumOfCache += sizeof(CacheObject);
boost::scoped_array<char> buffer(new char[sumOfCache]);
CacheObject *obj = new (buffer.get()) CacheObject;
buffer.release(); //PSEUDOCODE (boost::scoped_array has no release member);
return std::auto_ptr<CacheObject>(obj); //Nothrow
}
class CacheObject
{
CacheRegistrar *registrar; //Set by my constructor
public:
template<typename T>
T& Get()
{
char * startOfCache = reinterpret_cast<char *>(this) +
sizeof(CacheObject);
char * cacheItem = startOfCache + registrar->GetObjectIndex<T>();
return *reinterpret_cast<T*>(cacheItem);
}
};
Is my general concept sound here? Is there a better way of accomplishing this?
3 Answers
But first, read this article by Andrei Alexandrescu on what he thinks he should have written in that chapter -- a way to build heaps using Heap Layers (by yours truly). I used Heap Layers to build Hoard, DieHard, and DieHarder, as well as the custom allocators used in our OOPSLA 2002 paper, Reconsidering Custom Memory Allocation, which you should also read before embarking on creating a custom allocator.
Check out the Loki small objects allocator.
Quick googling didn't yield any direct human-oriented docs. There is Doxygen-generated documentation, but it's not especially grokkable. However, the design and implementation are documented in Andrei Alexandrescu's "Modern C++ Design".
If you just want efficient recycling for objects of a given class, then consider a simple free list, possibly a free list of raw storage chunks.
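To illustrate the free-list idea, here is a minimal sketch (my own illustration, not from Loki): freed fixed-size chunks are threaded into a singly linked list and handed back out on the next allocation, so steady-state churn never touches the heap.

```cpp
#include <cstddef>
#include <new>

// A minimal free list of raw storage chunks of a fixed size.
// Deallocated chunks are pushed onto a singly linked list and
// reused by the next Allocate() call instead of hitting the heap.
template <std::size_t ChunkSize>
class FreeList
{
    union Node { Node *next; char storage[ChunkSize]; };
    Node *head_;
public:
    FreeList() : head_(0) {}
    ~FreeList()
    {
        // Return every cached chunk to the global heap.
        while (head_) { Node *n = head_; head_ = head_->next; ::operator delete(n); }
    }
    void *Allocate()
    {
        if (head_) { Node *n = head_; head_ = head_->next; return n; }
        return ::operator new(sizeof(Node)); // cold path: real allocation
    }
    void Deallocate(void *p)
    {
        Node *n = static_cast<Node *>(p);
        n->next = head_; // push onto the list for reuse
        head_ = n;
    }
};
```

After a Deallocate, the very next Allocate returns the same chunk, which is where the savings come from.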
Cheers & hth.,
The key issue I see is returning an auto_ptr for memory allocated in a non-default fashion. You could solve this by defining a suitable overloaded delete, but it's better to define your own destroy function as part of the factory. If you do this, you also localise the memory management in the Cache class, allowing you more freedom to improve performance local to that class.
Of course, using a smart pointer to control memory management is a good idea; what you would need to do is define your own allocator and define a smart_ptr to use it.
For reference, another approach to managing custom allocation is to define a custom operator new, i.e. this sort of thing:
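The original code snippet appears to have been lost from this answer; a minimal sketch of the class-level operator new/delete pattern it refers to (the Widget class and the counter are my own illustration, with the pool reduced to bookkeeping) would be:

```cpp
#include <cstddef>
#include <new>

// Stand-in for pool bookkeeping, so the custom path is observable.
static int poolAllocs = 0;

class Widget
{
public:
    int value;
    Widget() : value(7) {}
    // Class-level operator new: every `new Widget` is routed through
    // here, where a real implementation would draw from a custom pool.
    static void *operator new(std::size_t size)
    {
        ++poolAllocs;                // pool bookkeeping stand-in
        return ::operator new(size); // fall back to the global heap
    }
    // Must be paired with the custom operator new above.
    static void operator delete(void *p)
    {
        ::operator delete(p);
    }
};
```

With this in place, ordinary `new Widget` / `delete w` expressions transparently use the custom allocation path.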