std::vector、std::map 和内存问题

发布于 2024-08-22 22:42:15 字数 926 浏览 3 评论 0原文

我正在编写将数据库中的行插入向量的代码。然后向量存储在 std::map 中。这种架构允许我根据映射键对数据集（向量）进行逻辑分区。

在我的代码中，我将从 std::map 检索数据集（即向量），向其中添加/删除行或对其执行一些其他逻辑，然后将向量粘回地图中（所有这些都在while() 循环）。

我有两个问题，两个问题都与向量中存储的（可能）大量元素有关。该向量可以容纳从几十到几万个元素的任何元素。我无法事先知道将从数据库中检索多少条记录。因此，为了有效地使用内存并避免不必要的（内存）碎片，std::vector 的内存管理方案（即 alloc/dealloc）变得非常重要：

我的问题是：

考虑到潜在的大量内存行可以存储的元素，理想情况下我想从内存池分配/释放。但我不知道如何将 std::vector 与内存池一起使用，而且我不知道这是否会变得不必要的复杂。如果这太过分了（或者太复杂），那么我的另一个想法是在首次创建向量时预先分配固定的内存块。但这也可能过于简单化，因为一个向量实例与另一个向量实例的元素数量可能相差很大，导致（内存）碎片等，更不用说内存使用效率低下了。
这里推荐的方法是什么？
考虑到 std::map（所有 STL 容器 IIRC）存储值元素的副本，因此可以制作包含数十个向量的副本数千个元素，这是完全错误的。因此，我正在考虑围绕 stl::map 编写一个 SmartPointerMap 包装器，然后存储指向向量而不是实际向量的指针。
我走在正确的道路上吗？如果否，什么是更好的解决方案？如果是，是否有一个我可以使用的 boost 库（而不是编写我的 SmartPointerMap 类模板）？

[编辑]

我正在使用 gcc 4.4.1 在 Linux (Ubuntu 9.10) 上构建

原文

I am writing code that inserts rows from a database into a vector. The vectors are then stored in a std::map. This architecture allows me to logically partition the datasets (vectors), based on the map key.

In my code, I will be retrieving a dataset (i.e. vector) from the std::map, adding/removing rows to it or performing some other logic on it, and then sticking the vector back into the map (all this goes on in a while() loop).

I have two questions, both of which are related to the (potentially) large number of elements stored in a vector. The vector may hold anything from a few tens to several tens of thousands of elements. There is no way of me to know beforehand, how many records will be retrieved from the database. Therefore, the memory management scheme of std::vector (i.e. alloc/dealloc) becomes of great importance in order to use memory effeciently, and to avoid unecessary (memory) fragmentation:

My questions are:

Given the potentially large number of elements that a row can store, ideally I would like to alloc/dealloc from a memory pool. But I dont know how to use std::vector with a memory pool, and I dont know if that is going to be unnecessarily complicated. If that is overkill (or too complicated), then my other idea is to preallocate a fixed slab of memory when the vector is first created. But this is also likely to be too simplistic, as the number of elemnts are likely to vary widely from one vector instance to another - leading to (memory) fragmentation etc, not to mention inefficient use of memory.
What is the recommended approach here?
Given the fact that std::map (all STL containers IIRC), stores a COPY of the value element, the prospect of making copies of vectors containing several tens of thousands of elements, is just plain wrong. I am thinking therefore, of writing a SmartPointerMap wrapper arround stl::map and then store pointers to the vectors instead of the actual vectors.
Am I on the right track?. If No, what is a better solution.?. If yes, is there a boost library I can use (instead of writing my SmartPointerMap class template)?

[Edit]

I am building on Linux (Ubuntu 9.10) with gcc 4.4.1

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

瘫痪情歌 2024-08-29 22:42:16

我将建议一种更通用的方法来使用 C/C++ 中的数据库查询集。

执行数据库查询并将行放入结构数组中。如果没有足够的物理内存，请使用内存映射文件，该文件将返回一个指向结构数组基址的指针，处理器芯片上的 MMU 将在需要时将其驻留在内存中。

现在，您可以在一个键上对这个结构数组进行排序，这将为您提供 ISAM 访问权限，这在关系宗教中是异端邪说，但正是游标为您提供的。这是数据访问方法 #1

现在，如果您仍然需要以另一种方式查看该数据，只需从数组中创建 1 个或多个索引/映射，其中映射的键是适当的（列值/结构成员）结构体，映射的数据值是指向该结构体在内存映射文件中的位置的指针。这提供了与数据库索引完全相同的功能。通过任何给定行的 struct.member 值对映射进行下标将使您立即返回到该行。

您可以有多个映射，用于多个索引，指向数组中的相同结构/行，对不同索引的映射键使用不同的 struct.member 数据。

我忽略了您声明的添加新行/结构的要求，因为您似乎正在从数据库中读取行，并且实际上不需要添加新行/结构。

但是，如果我错了，并且您确实需要添加新的结构/行，那么不必分配一大堆结构，只需使用 malloc() 和 realloc() 一个适当大小的结构数组并将它们挂在指针上那是地图的“数据”。通过这种方式，您将失去真正的 ISAM 访问权限，因为这些结构在内存中物理上不是相邻存储的，但您获得了 realloc() 功能。这是一个权衡。 ISAM 访问可以通过迭代映射来模拟，但它会比从内存中顺序读取结构体数组慢。

回复收藏 0 原文

君勿笑 2024-08-29 22:42:15

假设您使用 vector 作为地图的 data_type 而不是 key_type，您可以就地修改数据而无需复制数据。 std::map::operator[]() 返回对非常量 data_type 的引用，以及从 std 的非常量版本返回的迭代器::map::find() 允许您修改数据。

如果更改数据时需要更改密钥怎么办？您可以使用 std::swap() 将数据从一个向量移动到另一个向量，而无需复制。

不要忘记，当您删除元素时，vector 不会减少其capacity()。此外，vector 通常会分配比您需要的更多的 capacity()，因此 push_back() 需要摊销常量时间。对于非常大的向量，如果您不小心，这些行为可能会显着增加程序的内存使用量。

如果您使用 vector 作为地图的 key_type 并且地图具有非常大的键，那么指针或智能指针可能会有所帮助。但是，如果是这种情况，您必须确保不要修改映射值之一所指向的键的内容，因为 std::map 并非旨在处理这种情况。

至于自定义分配器的想法，首先让您的代码与标准分配器一起工作，然后看看它是否足够快。使用标准分配器可能没问题。如果您的代码使用标准分配器不够快，请分析以查看时间真正花在哪里（可能在其他地方，例如数据库代码）。如果您编写自定义分配器并且从未将其与标准分配器进行比较，您将不知道您的自定义分配器是否实际上是一种改进；它可能比标准分配器慢得多。

回复收藏 0 原文

小情绪 2024-08-29 22:42:15

关于 #1，GCC/Linux (ptmalloc) 中的默认堆实现将为小对象使用空闲列表（又名内存池）（我上次检查时默认情况下 <= 64 字节）。如果您仍然想使用自定义分配器，Boost.Pool 库具有满足标准分配器要求的分配器。正如 bk1e 所建议的，在前后进行基准测试，看看是否值得。

从数据库填充向量时，如果可能/实用，请尝试使用 vector::reserve() 使向量从一开始就分配足够的内存并避免重新分配。

希望这有帮助。

回复收藏 0 原文

温柔女人霸气范 2024-08-29 22:42:15

回答问题 2：

using namespace std;

map<string, vector<int>* > foo;
vector<int>* pointer= foo["thekey"];

如果需要使用智能（引用计数）指针：

#include<tr1/shared_ptr.h>
using namespace std::tr1;
using namespace std;

map<string, shared_ptr<vector<int> > > foo;
shared_ptr<vector<int> > pointer= foo["thekey"];

要回答问题 1，您可以编写一个新的分配器模板类并声明您的向量以使用该分配器，但我对此一无所知编写分配器：

map<string, vector<int, myallocator<int> >* > foo;

特别是我不知道如何设计一个分配器来避免内存池碎片。（但是如果您已经回答了这部分问题，那么编写自定义分配器将是正确的选择。）

To answer question 2:

using namespace std;

map<string, vector<int>* > foo;
vector<int>* pointer= foo["thekey"];

If using smart (reference-counted) pointers is a requirement:

#include<tr1/shared_ptr.h>
using namespace std::tr1;
using namespace std;

map<string, shared_ptr<vector<int> > > foo;
shared_ptr<vector<int> > pointer= foo["thekey"];

To answer question #1, you can write a new allocator template class and declare your vectors to use that allocator, but I don't really know anything about writing allocators:

map<string, vector<int, myallocator<int> >* > foo;

In particular I don't know how to design an allocator that will avoid fragmenting your memory pool. (But if you have that part answered, then writing a custom allocator would be the way to go.)

回复收藏 0 原文

~没有更多了~