在 C++ 中预分配存储桶std::unordered_map

发布于 2024-11-05 12:33:06 字数 501 浏览 4 评论 0原文

我正在使用 gnu++0x 中的 std::unordered_map 来存储大量数据。我想为大量元素预先分配空间，因为我可以限制所使用的总空间。

我希望能够做的是调用：

std::unordered_map m;
m.resize(pow(2,x));

其中 x 已知。

std::unordered_map 不支持此功能。如果可能的话，我宁愿使用 std::unordered_map ，因为它最终将成为标准的一部分。

其他一些约束：

需要可靠的 O(1) 访问和映射的变异。所需的散列和比较函数已经是非标准的并且有些昂贵。 O(log n) 突变（与 std::map 一样）成本太高。

->昂贵的哈希和比较也使得基于摊销的增长变得过于昂贵。每个额外的插入都需要这些函数进行 O(n) 次操作，这会导致算法运行时间中出现额外的二次项，因为指数存储需求需要 O(n) 增长。

原文

I'm using the std::unordered_map from gnu++0x to store a huge amount of data. I want to pre-allocate space for the large number of elements, since I can bound the total space used.

What I would like to be able to do is call:

std::unordered_map m;
m.resize(pow(2,x));

where x is known.

std::unordered_map doesn't support this. I would rather use std::unordered_map if possible, since it will eventually be part of the standard.

Some other constraints:

Need reliable O(1) access and mutation of the map. The desired hash and comparison functions are already non-standard and somewhat expensive. O(log n) mutation (as with std::map) is too expensive.

-> The expensive hash and comparison also make amortization-based growth way too expensive. Each extra insert requires O(n) operations from those functions, which results in an extra quadratic term in the algorithm's run time, since the exponential storage requirements need O(n) growths.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

悲歌长辞 2024-11-12 12:33:06

m.rehash(pow(2,x));

if pow(2, x) 是您想要预分配的存储桶数量。您还可以：

m.reserve(pow(2,x));

但现在 pow(2, x) 是您计划插入的元素数量。这两个函数除了预分配存储桶之外什么都不做。他们不插入任何元素。它们都旨在完全适合您的用例。

注意：无法保证您获得准确的 pow(2, x) 存储桶。一些实现将仅使用2的幂数的桶。其他实现将仅使用质数个桶。还有一些人将仅使用素数的子集来表示桶的数量。但无论如何，实现都应该接受您对所需存储桶数量的提示，然后在内部舍入到下一个可接受的存储桶数量。

以下是最新版本 (N4660) 用于指定 rehash 参数的精确措辞：

a.rehash(n) : 后置条件： <代码>a.bucket_count() >= a.size() / a.max_load_factor() 和 a.bucket_count() >= n。

此后置条件确保 bucket()_count() >= n，并且 load_factor() 保持小于或等于 max_load_factor() 。

随后 reserve(n) 根据 rehash(n) 进行定义：

a.reserve(n) ：与 a 相同。重新散列（ceil（n / a.max_load_factor（）））。

m.rehash(pow(2,x));

if pow(2, x) is the number of buckets you want preallocated. You can also:

m.reserve(pow(2,x));

but now pow(2, x) is the number of elements you are planning on inserting. Both functions do nothing but preallocate buckets. They don't insert any elements. And they are both meant to be used exactly for your use case.

Note: You aren't guaranteed to get exactly pow(2, x) buckets. Some implementations will use only a number of buckets which is a power of 2. Other implementations will use only a prime number of buckets. Still others will use only a subset of primes for the number of buckets. But in any case, the implementation should accept your hint at the number of buckets you desire, and then internally round up to its next acceptable number of buckets.

Here is the precise wording that the latest (N4660) uses to specify the argument to rehash:

a.rehash(n) : Postconditions: a.bucket_count() >= a.size() / a.max_load_factor() and a.bucket_count() >= n.

This postcondition ensures that bucket()_count() >= n, and that load_factor() remains less than or equal to max_load_factor().

Subsequently reserve(n) is defined in terms of rehash(n):

a.reserve(n) : Same as a.rehash(ceil(n / a.max_load_factor())).

回复收藏 0 原文

从﹋此江山别 2024-11-12 12:33:06

我建议为 std::unordered_map 编写自己的分配器，它完全按照您想要的方式分配内存。

回复收藏 0 原文

谎言月老 2024-11-12 12:33:06

我认为无序映射具有预先分配的内存并不重要。 STL 的摊销插入时间预计为 O(n)。在我看来，在您知道这是代码的瓶颈之前，您可以省去编写自己的分配器的麻烦。

回复收藏 0 原文

情绪失控 2024-11-12 12:33:06

构造函数根据 http://en.cppreference 采用参数“size_type bucket_count”。 com/w/cpp/container/unordered_map/unordered_map

因此，执行示例代码所说的最简单的方法是：

std::unordered_map m{ pow(2,x) };

这会更有效，因为未定义有多少存储桶将在构造时保留，否则，它可能必须分配，然后在您调用保留时取消分配。

The constructor takes a parameter "size_type bucket_count" according to http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map

so the simplest way to do what your example code says is:

std::unordered_map m{ pow(2,x) };

This will be more efficient since it's undefined how many buckets will be reserved on construction otherwise, it may have to allocate and then deallocate when you call reserve afterwards.

回复收藏 0 原文