在 C++ 中预分配存储桶std::unordered_map
我正在使用 gnu++0x 中的 std::unordered_map
来存储大量数据。我想为大量元素预先分配空间,因为我可以限制所使用的总空间。
我希望能够做的是调用:
std::unordered_map m;
m.resize(pow(2,x));
其中 x 已知。
std::unordered_map
不支持此功能。如果可能的话,我宁愿使用 std::unordered_map ,因为它最终将成为标准的一部分。
其他一些约束:
需要可靠的 O(1) 访问和映射的变异。所需的散列和比较函数已经是非标准的并且有些昂贵。 O(log n) 突变(与 std::map 一样)成本太高。
->昂贵的哈希和比较也使得基于摊销的增长变得过于昂贵。每个额外的插入都需要这些函数进行 O(n) 次操作,这会导致算法运行时间中出现额外的二次项,因为指数存储需求需要 O(n) 增长。
I'm using the std::unordered_map
from gnu++0x to store a huge amount of data. I want to pre-allocate space for the large number of elements, since I can bound the total space used.
What I would like to be able to do is call:
std::unordered_map m;
m.resize(pow(2,x));
where x is known.
std::unordered_map
doesn't support this. I would rather use std::unordered_map
if possible, since it will eventually be part of the standard.
Some other constraints:
Need reliable O(1) access and mutation of the map. The desired hash and comparison functions are already non-standard and somewhat expensive. O(log n) mutation (as with std::map
) is too expensive.
-> The expensive hash and comparison also make amortization-based growth way too expensive. Each extra insert requires O(n) operations from those functions, which results in an extra quadratic term in the algorithm's run time, since the exponential storage requirements need O(n) growths.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
if
pow(2, x)
是您想要预分配的存储桶数量。您还可以:但现在
pow(2, x)
是您计划插入的元素数量。这两个函数除了预分配存储桶之外什么都不做。他们不插入任何元素。它们都旨在完全适合您的用例。注意:无法保证您获得准确的
pow(2, x)
存储桶。一些实现将仅使用2的幂数的桶。其他实现将仅使用质数个桶。还有一些人将仅使用素数的子集来表示桶的数量。但无论如何,实现都应该接受您对所需存储桶数量的提示,然后在内部舍入到下一个可接受的存储桶数量。以下是最新版本 (N4660) 用于指定
rehash
参数的精确措辞:a.rehash(n)
: 后置条件: <代码>a.bucket_count() >= a.size() / a.max_load_factor() 和 a.bucket_count() >= n。此后置条件确保
bucket()_count() >= n
,并且load_factor()
保持小于或等于max_load_factor()
。随后
reserve(n)
根据rehash(n)
进行定义:a.reserve(n)
:与a 相同。重新散列(ceil(n / a.max_load_factor()))
。if
pow(2, x)
is the number of buckets you want preallocated. You can also:but now
pow(2, x)
is the number of elements you are planning on inserting. Both functions do nothing but preallocate buckets. They don't insert any elements. And they are both meant to be used exactly for your use case.Note: You aren't guaranteed to get exactly
pow(2, x)
buckets. Some implementations will use only a number of buckets which is a power of 2. Other implementations will use only a prime number of buckets. Still others will use only a subset of primes for the number of buckets. But in any case, the implementation should accept your hint at the number of buckets you desire, and then internally round up to its next acceptable number of buckets.Here is the precise wording that the latest (N4660) uses to specify the argument to
rehash
:a.rehash(n)
: Postconditions:a.bucket_count() >= a.size() / a.max_load_factor() and a.bucket_count() >= n
.This postcondition ensures that
bucket()_count() >= n
, and thatload_factor()
remains less than or equal tomax_load_factor()
.Subsequently
reserve(n)
is defined in terms ofrehash(n)
:a.reserve(n)
: Same asa.rehash(ceil(n / a.max_load_factor()))
.我建议为
std::unordered_map
编写自己的分配器,它完全按照您想要的方式分配内存。I would suggest writing your own allocator for the
std::unordered_map
that allocates memory exactly in the way you want.我认为无序映射具有预先分配的内存并不重要。 STL 的摊销插入时间预计为 O(n)。在我看来,在您知道这是代码的瓶颈之前,您可以省去编写自己的分配器的麻烦。
I don't think it matters for an unordered map to have pre-allocated memory. The STL is expected to be O(n) amortized insertion time. Save yourself the hassle of writing your own allocator until you know this is the bottle neck of your code, in my opinion.
构造函数根据 http://en.cppreference 采用参数“size_type bucket_count”。 com/w/cpp/container/unordered_map/unordered_map
因此,执行示例代码所说的最简单的方法是:
这会更有效,因为未定义有多少存储桶将在构造时保留,否则,它可能必须分配,然后在您调用保留时取消分配。
The constructor takes a parameter "size_type bucket_count" according to http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map
so the simplest way to do what your example code says is:
This will be more efficient since it's undefined how many buckets will be reserved on construction otherwise, it may have to allocate and then deallocate when you call reserve afterwards.
我认为只有当您事先知道映射值将占用多少内存时,rehash 和 reserve 才有效。如果映射的值很复杂或大小动态变化(例如向量),您将需要自己的实现。例如,如果您的内存大小允许,您可以保留可能存在的最大容器。
I think rehash and reserve both work only if you know in advance how much memory your mapped value will take. If the mapped value is complicated or dynamically changes in size (e.g. a vector), you will need your own implementation. For example, if your memory size allows, you may reserve the biggest container that may ever happen to exist.