std::vector、std::map 和内存问题

发布于 2024-08-22 22:42:15 字数 926 浏览 3 评论 0原文

我正在编写将数据库中的行插入向量的代码。然后向量存储在 std::map 中。这种架构允许我根据映射键对数据集(向量)进行逻辑分区。

在我的代码中,我将从 std::map 检索数据集(即向量),向其中添加/删除行或对其执行一些其他逻辑,然后将向量粘回地图中(所有这些都在while() 循环)。

我有两个问题,两个问题都与向量中存储的(可能)大量元素有关。该向量可以容纳从几十到几万个元素的任何元素。我无法事先知道将从数据库中检索多少条记录。因此,为了有效地使用内存并避免不必要的(内存)碎片,std::vector 的内存管理方案(即 alloc/dealloc)变得非常重要:

我的问题是:

  1. 考虑到潜在的大量内存行可以存储的元素,理想情况下我想从内存池分配/释放。但我不知道如何将 std::vector 与内存池一起使用,而且我不知道这是否会变得不必要的复杂。如果这太过分了(或者太复杂),那么我的另一个想法是在首次创建向量时预先分配固定的内存块。但这也可能过于简单化,因为一个向量实例与另一个向量实例的元素数量可能相差很大,导致(内存)碎片等,更不用说内存使用效率低下了。

    这里推荐的方法是什么?

  2. 考虑到 std::map(所有 STL 容器 IIRC)存储值元素的副本,因此可以制作包含数十个向量的副本数千个元素,这是完全错误的。因此,我正在考虑围绕 stl::map 编写一个 SmartPointerMap 包装器,然后存储指向向量而不是实际向量的指针。

    我走在正确的道路上吗?如果否,什么是更好的解决方案?如果是,是否有一个我可以使用的 boost 库(而不是编写我的 SmartPointerMap 类模板)?

[编辑]

我正在使用 gcc 4.4.1 在 Linux (Ubuntu 9.10) 上构建

I am writing code that inserts rows from a database into a vector. The vectors are then stored in a std::map. This architecture allows me to logically partition the datasets (vectors), based on the map key.

In my code, I will be retrieving a dataset (i.e. vector) from the std::map, adding/removing rows to it or performing some other logic on it, and then sticking the vector back into the map (all this goes on in a while() loop).

I have two questions, both of which are related to the (potentially) large number of elements stored in a vector. The vector may hold anything from a few tens to several tens of thousands of elements. There is no way of me to know beforehand, how many records will be retrieved from the database. Therefore, the memory management scheme of std::vector (i.e. alloc/dealloc) becomes of great importance in order to use memory effeciently, and to avoid unecessary (memory) fragmentation:

My questions are:

  1. Given the potentially large number of elements that a row can store, ideally I would like to alloc/dealloc from a memory pool. But I dont know how to use std::vector with a memory pool, and I dont know if that is going to be unnecessarily complicated. If that is overkill (or too complicated), then my other idea is to preallocate a fixed slab of memory when the vector is first created. But this is also likely to be too simplistic, as the number of elemnts are likely to vary widely from one vector instance to another - leading to (memory) fragmentation etc, not to mention inefficient use of memory.

    What is the recommended approach here?

  2. Given the fact that std::map (all STL containers IIRC), stores a COPY of the value element, the prospect of making copies of vectors containing several tens of thousands of elements, is just plain wrong. I am thinking therefore, of writing a SmartPointerMap wrapper arround stl::map and then store pointers to the vectors instead of the actual vectors.

    Am I on the right track?. If No, what is a better solution.?. If yes, is there a boost library I can use (instead of writing my SmartPointerMap class template)?

[Edit]

I am building on Linux (Ubuntu 9.10) with gcc 4.4.1

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

瘫痪情歌 2024-08-29 22:42:16

我将建议一种更通用的方法来使用 C/C++ 中的数据库查询集。

执行数据库查询并将行放入结构数组中。如果没有足够的物理内存,请使用内存映射文件,该文件将返回一个指向结构数组基址的指针,处理器芯片上的 MMU 将在需要时将其驻留在内存中。

现在,您可以在一个键上对这个结构数组进行排序,这将为您提供 ISAM 访问权限,这在关系宗教中是异端邪说,但正是游标为您提供的。这是数据访问方法 #1

现在,如果您仍然需要以另一种方式查看该数据,只需从数组中创建 1 个或多个索引/映射,其中映射的键是适当的(列值/结构成员)结构体,映射的数据值是指向该结构体在内存映射文件中的位置的指针。这提供了与数据库索引完全相同的功能。通过任何给定行的 struct.member 值对映射进行下标将使您立即返回到该行。

您可以有多个映射,用于多个索引,指向数组中的相同结构/行,对不同索引的映射键使用不同的 struct.member 数据。

我忽略了您声明的添加新行/结构的要求,因为您似乎正在从数据库中读取行,并且实际上不需要添加新行/结构。

但是,如果我错了,并且您确实需要添加新的结构/行,那么不必分配一大堆结构,只需使用 malloc() 和 realloc() 一个适当大小的结构数组并将它们挂在指针上那是地图的“数据”。通过这种方式,您将失去真正的 ISAM 访问权限,因为这些结构在内存中物理上不是相邻存储的,但您获得了 realloc() 功能。这是一个权衡。 ISAM 访问可以通过迭代映射来模拟,但它会比从内存中顺序读取结构体数组慢。

I'm going to suggest a more general approach to working with database query sets in C/C++.

Do the db query and put the rows in an array of structures. If you don't have enough physical memory, use a memory mapped file, which will return a pointer to the base of an array of structures which the MMU onboard the processor chip will worry about making resident in memory when needed.

You can now sort() this array of structs on a key/s, which will give you ISAM access, heresy in Relational Religion, but exactly what cursors give you. This is data access method #1

Now, if you still need to look at that data in another way/s, just create 1 or more indices/maps where the map's keys are the appropriate (column value/struct member) from the array of structs, and the data value of the map is a pointer to that struct's location in the memory mapped file. This provides exactly the same functionality as a db index. Subscripting the map by the struct.member value for any given row will get you right back to that row.

You can have multiple maps, for multiple indices, pointing to the same structs/rows in the array, using different struct.member data for the map's keys for different indices.

I'm ignoring your stated requirement to add new rows/structs, as it seems you are doing a single read of rows from the db, and don't actually need to add new rows/structs.

If, however, I'm wrong, and you do need to add new structs/rows, then instead of allocating one big array of structs, just malloc() and realloc() an appropriate sized array of structs and hang them off the pointers that are the map's "data". You lose true ISAM access this way, because the structs are not stored physically adjacent in memory, but you gain realloc() capability. It's a trade-off. ISAM access can be simulated by iterating over the map, but it will be slower than reading an array of structs sequentially from memory.

君勿笑 2024-08-29 22:42:15

假设您使用 vector 作为地图的 data_type 而不是 key_type,您可以就地修改数据而无需复制数据。 std::map::operator[]() 返回对非常量 data_type 的引用,以及从 std 的非常量版本返回的迭代器::map::find() 允许您修改数据。

如果更改数据时需要更改密钥怎么办?您可以使用 std::swap() 将数据从一个向量移动到另一个向量,而无需复制。

不要忘记,当您删除元素时,vector 不会减少其capacity()。此外,vector 通常会分配比您需要的更多的 capacity(),因此 push_back() 需要摊销常量时间。对于非常大的向量,如果您不小心,这些行为可能会显着增加程序的内存使用量。

如果您使用 vector 作为地图的 key_type 并且地图具有非常大的键,那么指针或智能指针可能会有所帮助。但是,如果是这种情况,您必须确保不要修改映射值之一所指向的键的内容,因为 std::map 并非旨在处理这种情况。

至于自定义分配器的想法,首先让您的代码与标准分配器一起工作,然后看看它是否足够快。使用标准分配器可能没问题。如果您的代码使用标准分配器不够快,请分析以查看时间真正花在哪里(可能在其他地方,例如数据库代码)。如果您编写自定义分配器并且从未将其与标准分配器进行比较,您将不知道您的自定义分配器是否实际上是一种改进;它可能比标准分配器慢得多。

Assuming that you're using vector for the map's data_type and not the key_type, you could modify the data in place without copying it. std::map::operator[]() returns a reference to a non-const data_type, and the iterator returned from the non-const version of std::map::find() allows you to modify the data.

What if you need to change the key when you change the data? You can use std::swap() to move the data from one vector to another without copying.

Don't forget that vector does not reduce its capacity() when you erase elements. Also, vector will usually allocate more capacity() than you need, so that push_back() takes amortized constant time. For very large vectors, these behaviors may significantly increase your program's memory usage, if you're not careful.

If you're using vector for the map's key_type and the map has extremely large keys, then pointers or smart pointers might help. However, if this is the case, you must make sure not to modify the contents of a key that is pointed to by one of the map values, because std::map is not designed to handle that.

As for the custom allocator idea, get your code working with the standard allocator first, and then see if it's fast enough. It might be fine using the standard allocator. If your code is not fast enough with the standard allocator, profile to see where the time is really being spent (it may be somewhere else, like the database code). If you write a custom allocator and you never compare it against the standard allocator, you will not know whether your custom allocator is actually an improvement; it could be much slower than the standard allocator.

小情绪 2024-08-29 22:42:15

关于 #1,GCC/Linux (ptmalloc) 中的默认堆实现将为小对象使用空闲列表(又名内存池)(我上次检查时默认情况下 <= 64 字节)。如果您仍然想使用自定义分配器,Boost.Pool 库具有 满足标准分配器要求的分配器。正如 bk1e 所建议的,在前后进行基准测试,看看是否值得。

从数据库填充向量时,如果可能/实用,请尝试使用 vector::reserve() 使向量从一开始就分配足够的内存并避免重新分配。

希望这有帮助。

Wrt #1, the default heap implementation in GCC/Linux (ptmalloc) will use a free list (aka memory pool) for small objects (<=64 bytes by default last time I checked). If you still want to use a custom allocator, the Boost.Pool library has allocators that satisfy the Standard Allocator requirements. As bk1e suggested, benchmark before and after to see if it's even worth it.

When populating your vectors from the database, if possible/practical, try to use vector<T>::reserve() to make vector allocate enough memory from the start and avoid reallocations.

Hope this helps.

温柔女人霸气范 2024-08-29 22:42:15

回答问题 2:

using namespace std;

map<string, vector<int>* > foo;
vector<int>* pointer= foo["thekey"];

如果需要使用智能(引用计数)指针:

#include<tr1/shared_ptr.h>
using namespace std::tr1;
using namespace std;

map<string, shared_ptr<vector<int> > > foo;
shared_ptr<vector<int> > pointer= foo["thekey"];

要回答问题 1,您可以编写一个新的分配器模板类并声明您的向量以使用该分配器,但我对此一无所知编写分配器:

map<string, vector<int, myallocator<int> >* > foo;

特别是我不知道如何设计一个分配器来避免内存池碎片。 (但是如果您已经回答了这部分问题,那么编写自定义分配器将是正确的选择。)

To answer question 2:

using namespace std;

map<string, vector<int>* > foo;
vector<int>* pointer= foo["thekey"];

If using smart (reference-counted) pointers is a requirement:

#include<tr1/shared_ptr.h>
using namespace std::tr1;
using namespace std;

map<string, shared_ptr<vector<int> > > foo;
shared_ptr<vector<int> > pointer= foo["thekey"];

To answer question #1, you can write a new allocator template class and declare your vectors to use that allocator, but I don't really know anything about writing allocators:

map<string, vector<int, myallocator<int> >* > foo;

In particular I don't know how to design an allocator that will avoid fragmenting your memory pool. (But if you have that part answered, then writing a custom allocator would be the way to go.)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文