在 C++ 中使用 HashMap 的最佳方法是什么?
我知道 STL 有 HashMap API,但我找不到任何好的、完整的文档以及与此相关的好示例。
任何好的例子将不胜感激。
I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.
Any good examples will be appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
标准库包括有序和无序映射(
std::map
和std::unordered_map
)容器。在有序映射 (std::map
) 中,元素按键排序,插入和访问位于 O(log n)。通常标准库内部使用 红黑树 来表示有序映射。但这只是一个实现细节。在无序映射 (std::unordered_map
) 中,插入和访问的时间复杂度为 O(1)。它只是哈希表的另一个名称。(有序)
std::map
的示例:输出:
如果您需要在容器中进行排序并且适合 O(log n) 运行时,那么只需使用
std::map.
否则,如果您确实需要哈希表(O(1) 插入/访问),请查看
std::unordered_map
,它与std::map
类似API(例如,在上面的示例中,您只需搜索map
并将其替换为unordered_map
)。unordered_map
容器是随 C++11 标准 修订。因此,根据您的编译器,您必须启用 C++11 功能(例如,当使用 GCC 4.8 时,您必须将-std=c++11
添加到 CXXFLAGS)。即使在 C++11 版本之前,GCC 也支持
unordered_map
- 在命名空间std::tr1
中。因此,对于旧的 GCC 编译器,您可以尝试像这样使用它:它也是 boost 的一部分,即您可以使用相应的 boost-header 以获得更好的可移植性。
The standard library includes the ordered and the unordered map (
std::map
andstd::unordered_map
) containers. In an ordered map (std::map
) the elements are sorted by the key, insert and access is in O(log n). Usually the standard library internally uses red black trees for ordered maps. But this is just an implementation detail. In an unordered map (std::unordered_map
) insert and access is in O(1). It is just another name for a hashtable.An example with (ordered)
std::map
:Output:
If you need ordering in your container and are fine with the O(log n) runtime then just use
std::map
.Otherwise, if you really need a hash-table (O(1) insert/access), check out
std::unordered_map
, which has a similar tostd::map
API (e.g. in the above example you just have to search and replacemap
withunordered_map
).The
unordered_map
container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add-std=c++11
to the CXXFLAGS).Even before the C++11 release GCC supported
unordered_map
- in the namespacestd::tr1
. Thus, for old GCC compilers you can try to use it like this:It is also part of boost, i.e. you can use the corresponding boost-header for better portability.
hash_map
是出于标准化目的而称为unordered_map
的较旧的非标准化版本(最初在 TR1 中,自 C++11 起包含在标准中)。顾名思义,它与std::map
的不同之处主要在于它是无序的——例如,如果您迭代从begin()
到的映射end()
中,您可以按键 1 按顺序获取项目,但如果您从begin()
迭代一个unordered_map
到end()
,您以或多或少任意的顺序获得项目。unordered_map
通常预计具有恒定的复杂性。也就是说,插入、查找等通常基本上花费固定量的时间,而不管表中有多少项。std::map
的复杂性与所存储的项目数量成对数 - 这意味着插入或检索项目的时间会增加,但速度相当缓慢,因为地图变大。例如,如果查找 100 万个项目之一需要 1 微秒,那么您可以预期查找 200 万个项目之一需要大约 2 微秒,查找 400 万个项目之一需要 3 微秒,查找 800 万个项目之一需要 4 微秒但从实际角度来看,这并不是故事的全部。从本质上讲,简单的哈希表具有固定的大小。使其适应通用容器的可变大小要求有些不简单。因此,(可能)增加表的操作(例如,插入)可能相对较慢(也就是说,大多数操作相当快,但有时会慢得多)。查找不能改变表的大小,通常要快得多。因此,与插入次数相比,当您进行大量查找时,大多数基于哈希的表往往会处于最佳状态。对于插入大量数据,然后遍历表一次以检索结果的情况(例如,计算文件中唯一单词的数量),
std::map
可能只是一样快,而且很可能甚至更快(但是,同样,计算复杂性是不同的,因此这也可能取决于文件中唯一单词的数量)。1 其中顺序由创建地图时的第三个模板参数定义,默认情况下为
std::less
。A
hash_map
is an older, unstandardized version of what for standardization purposes is called anunordered_map
(originally in TR1, and included in the standard since C++11). As the name implies, it's different fromstd::map
primarily in being unordered -- if, for example, you iterate through a map frombegin()
toend()
, you get items in order by key1, but if you iterate through anunordered_map
frombegin()
toend()
, you get items in a more or less arbitrary order.An
unordered_map
is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. Anstd::map
has complexity that's logarithmic on the number of items being stored -- which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger. For example, if it takes 1 microsecond to lookup one of 1 million items, then you can expect it to take around 2 microseconds to lookup one of 2 million items, 3 microseconds for one of 4 million items, 4 microseconds for one of 8 million items, etc.From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements for a general purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file) chances are that an
std::map
will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).1 Where the order is defined by the third template parameter when you create the map,
std::less<T>
by default.这是一个更完整和灵活的示例,它没有省略生成编译错误所需的包含:
对于键仍然不是特别有用,除非它们被预定义为指针,因为匹配值不会这样做! (但是,由于我通常使用字符串作为键,因此在键的声明中用“string”替换“const void *”应该可以解决这个问题。)
Here's a more complete and flexible example that doesn't omit necessary includes to generate compilation errors:
Still not particularly useful for keys, unless they are predefined as pointers, because a matching value won't do! (However, since I normally use strings for keys, substituting "string" for "const void *" in the declaration of the key should resolve this problem.)
std::unordered_map
在 GCC stdlibc++ 6.4 中使用哈希映射的证据这在以下位置提到:https://stackoverflow.com/a/3578247/895245,但在以下答案中:C++ 中的 std::map 内部有什么数据结构? 我已经通过以下方式为 GCC stdlibc++ 6.4 实现提供了进一步的证据:
以下是该答案中描述的性能特征图的预览:
因此我们可以清楚地看到
std::unordered_set
具有几乎恒定的时间访问,除了当大小加倍并且底层动态数组必须在内存中重新定位时,会出现一些非常慢的点。如何将自定义类和哈希函数与
unordered_map
结合使用这个答案说明了这一点:使用自定义类类型作为键的 C++ unordered_map
摘录:相等:
哈希函数:
Evidence that
std::unordered_map
uses a hash map in GCC stdlibc++ 6.4This was mentioned at: https://stackoverflow.com/a/3578247/895245 but in the following answer: What data structure is inside std::map in C++? I have given further evidence of such for the GCC stdlibc++ 6.4 implementation by:
Here is a preview of the performance characteristic graph described in that answer:
So we can clearly see that
std::unordered_set
has almost constant time access, except for a few extremely slow points when the size doubles and the underlying dynamic array has to be relocated in memory.How to use a custom class and hash function with
unordered_map
This answer nails it: C++ unordered_map using a custom class type as the key
Excerpt: equality:
Hash function:
对于我们这些试图弄清楚如何在仍然使用标准模板的同时对我们自己的类进行哈希处理的人来说,有一个简单的解决方案:
在您的类中,您需要定义一个相等运算符重载
==
。如果您不知道如何执行此操作,GeeksforGeeks 有一个很棒的教程 https://www .geeksforgeeks.org/operator-overloading-c/在标准命名空间下,声明一个名为 hash 的模板结构,并将您的类名作为类型(见下文)。我发现了一篇很棒的博文,其中还展示了使用 XOR 和位移位计算哈希值的示例,但这超出了这个问题的范围,但它还包含有关如何使用哈希函数完成的详细说明 https://prateekvjoshi.com/2014/06/05 /using-hash-function-in-c-for-user-defined-classes/
std::map 或
std::unordered_map
就像您通常所做的那样并使用my_type
作为键,标准库将自动使用您之前定义的哈希函数(在步骤 2 中) )来散列你的密钥。For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:
In your class you need to define an equality operator overload
==
. If you don't know how to do this, GeeksforGeeks has a great tutorial https://www.geeksforgeeks.org/operator-overloading-c/Under the standard namespace, declare a template struct called hash with your classname as the type (see below). I found a great blogpost that also shows an example of calculating hashes using XOR and bitshifting, but that's outside the scope of this question, but it also includes detailed instructions on how to accomplish using hash functions as well https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/
std::map
orstd::unordered_map
just like you would normally do and usemy_type
as the key, the standard library will automatically use the hash function you defined before (in step 2) to hash your keys.