C++ hash table where keys are strings and values are vectors of strings
I have a large collection of unique strings (about 500k). Each string is associated with a vector of strings. I'm currently storing this data in a
map<string, vector<string> >
and it's working fine. However, I'd like the look-up into the map to be faster than O(log n). Under these constrained circumstances, how can I create a hashtable that supports O(1) look-up? Seems like this should be possible since I know all the keys ahead of time... and all the keys are unique (so I don't have to account for collisions).
Cheers!
Comments (4)
You can create a hashtable with boost::unordered_map, std::tr1::unordered_map or (on C++0x compilers) std::unordered_map. That takes almost zero effort. Google sparsehash may be faster still and tends to take less memory. (Deletion can be a pain, but it seems you won't need that.)
If the code is still not fast enough, you can exploit prior knowledge of the keys with a minimal perfect hash, as suggested by others, to obtain guaranteed O(1) performance. Whether the code-generating effort that takes is worth it depends on you; putting 500k keys into a tool like gperf may take a code generator generator.
You may also want to look at CMPH, which generates a perfect hash function at run-time, though through a C API.
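As a rough illustration of the "almost zero effort" route, here is a minimal sketch that copies an existing std::map into a std::unordered_map and looks keys up in it. It assumes a C++11 compiler; the names Values, Index, build_index and find_values are made up for this example and are not from the question.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using Values = std::vector<std::string>;
using Index = std::unordered_map<std::string, Values>;

// Copy an existing std::map into a hash table (hypothetical helper).
Index build_index(const std::map<std::string, Values>& source) {
    Index index;
    // Size the bucket array once so the bulk load does not trigger rehashing.
    index.reserve(source.size());
    for (const auto& entry : source) {
        index.emplace(entry.first, entry.second);
    }
    // std::map keys are unique, so every entry must have been inserted.
    assert(index.size() == source.size());
    return index;
}

// Average-case O(1) lookup; returns nullptr when the key is absent.
const Values* find_values(const Index& index, const std::string& key) {
    auto it = index.find(key);
    return it == index.end() ? nullptr : &it->second;
}
```

Note that the O(1) here is an average: distinct keys can still share a bucket, and the worst case for a single lookup is linear in the table size. Whether this beats the std::map in practice is worth measuring on the real data.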
I would look into creating a perfect hash function for your table. It guarantees that no two keys collide, and collisions are expensive to resolve. Perfect hash function generators are also available.
What you're looking for is a Perfect Hash. gperf is often used to generate these, but I don't know how well it works with such a large collection of strings.
If you want no collisions for a known collection of keys, you're looking for a perfect hash. The CMPH library (my apologies, as it is for C rather than C++) is mature and can generate minimal perfect hashes for rather large data sets.
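To make the idea of a minimal perfect hash concrete, here is a toy sketch that brute-forces a seed until every key in a small, fixed set lands in its own slot. This is not how gperf or CMPH work internally and it will not scale to 500k keys; the hash function (seeded FNV-1a with an extra mixing step) and the names seeded_hash, ToyPerfectHash and build_toy_perfect_hash are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Seeded FNV-1a followed by a mixing step so that the seed influences
// the low bits of the result as well as the high ones.
std::uint64_t seeded_hash(const std::string& s, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return h;
}

struct ToyPerfectHash {
    std::uint64_t seed = 0;
    std::vector<std::string> slots;  // slots[i] holds the one key that maps to i

    std::size_t slot_of(const std::string& key) const {
        return seeded_hash(key, seed) % slots.size();
    }
    // Membership test: one hash and one string comparison, no probing.
    bool contains(const std::string& key) const {
        return slots[slot_of(key)] == key;
    }
};

// Try seeds until all keys occupy distinct slots in a table of exactly
// keys.size() entries. Keys must be distinct; throws if no seed is found.
ToyPerfectHash build_toy_perfect_hash(const std::vector<std::string>& keys,
                                      std::uint64_t max_seeds = 1000000) {
    assert(!keys.empty());
    for (std::uint64_t seed = 0; seed < max_seeds; ++seed) {
        std::vector<std::string> slots(keys.size());
        std::vector<bool> used(keys.size(), false);
        bool collision = false;
        for (const auto& key : keys) {
            std::size_t i = seeded_hash(key, seed) % keys.size();
            if (used[i]) { collision = true; break; }
            used[i] = true;
            slots[i] = key;
        }
        if (!collision) {
            ToyPerfectHash table;
            table.seed = seed;
            table.slots = std::move(slots);
            return table;
        }
    }
    throw std::runtime_error("no collision-free seed found");
}
```

With a table like this, the vector of strings for each key could live in a parallel array indexed by slot_of(key). The stored key still has to be compared on lookup, because a string outside the original set hashes to some valid slot too.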