Hash map optimized for lookups
I am looking for a map that has fixed keys (fixed during initialization) and offers faster look-ups. It need not support adding or updating elements later. Is there an algorithm that examines the list of keys and formulates a function so that later look-ups are faster? In my case, the keys are strings.
Update:
The keys are not known at compile time, only during the initialization of the application. There won't be any further insertions later, but there will be lots of look-ups, so I want look-ups to be optimized.
Comments (4)
CMPH may be what you're looking for. Basically, this is gperf without requiring the key set at compile time. Of course, std::unordered_map as of C++11 might just do too, though possibly with a few collisions.
Since you look up strings, a trie (any of the different trie flavours, crit-bit or whatever funky names they have) may also be worth looking into, especially if you have many strings. There are a lot of free trie implementations available.
The advantage of tries is that they can index-compress strings, so they use less memory, which makes it more likely that the data is in cache. The access pattern is also less random, which is cache-friendly as well. A hash table must store the value plus the hash, and it indexes more or less randomly (not randomly, but unpredictably) into memory. A trie or trie-like structure ideally needs only one extra bit in each node to distinguish a key from its common prefix.
(Note by the way that O(log(N)) may quite possibly be faster than O(1) in such a case, because big-O does not consider things like that.)
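As a rough illustration of the trie idea above, here is a minimal, uncompressed trie in C++ that is built once from a fixed key set and then only queried. The class name FrozenTrie and its interface are made up for this sketch and do not come from any library. It uses std::map for the child links to stay short, so it does not show the memory savings that crit-bit or radix variants get from compression.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a plain (uncompressed) trie over bytes,
// built once at initialization and read-only afterwards.
class FrozenTrie {
public:
    FrozenTrie(const std::vector<std::string>& keys,
               const std::vector<int>& values) {
        assert(keys.size() == values.size());
        nodes_.emplace_back();  // node 0 is the root
        for (std::size_t i = 0; i < keys.size(); ++i) {
            std::size_t cur = 0;
            for (unsigned char c : keys[i]) {
                auto it = nodes_[cur].next.find(c);
                if (it != nodes_[cur].next.end()) {
                    cur = it->second;  // follow the existing edge
                } else {
                    std::size_t child = nodes_.size();
                    nodes_.emplace_back();  // may reallocate; we index, not point
                    nodes_[cur].next.emplace(c, child);
                    cur = child;
                }
            }
            nodes_[cur].terminal = true;
            nodes_[cur].value = values[i];
        }
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        std::size_t cur = 0;
        for (unsigned char c : key) {
            auto it = nodes_[cur].next.find(c);
            if (it == nodes_[cur].next.end()) return nullptr;
            cur = it->second;
        }
        return nodes_[cur].terminal ? &nodes_[cur].value : nullptr;
    }

private:
    struct Node {
        std::map<unsigned char, std::size_t> next;  // child links by byte
        bool terminal = false;                      // true if a key ends here
        int value = 0;
    };
    std::vector<Node> nodes_;
};
```

A lookup walks one node per byte of the key, so its cost depends on the key length rather than on the number of keys. Whether this beats a hash table in practice depends on the data and should be measured.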
Note that these are distinct things: do you need an upper limit, do you need a fast typical rate, or do you need the fastest lookup ever, no questions asked? The last one will cost you, and the first two may be conflicting goals.
You could attempt to create a perfect hash function based on the input (i.e. one that has no collisions on the input set). This is a more or less solved problem (e.g. this, this). However, such tools usually generate source code and may spend significant time generating the hash function.
A modification of this would be to use a generic hash function (e.g. shift-multiply-add) and do a brute-force search over suitable parameters.
This has to be traded off with the cost of a few string comparisons (which aren't that terribly expensive if you don't have to collate).
Another option is to use two distinct hash functions. This increases the cost of a single lookup but makes degradation slightly less likely than aliens stealing your clock cycles. It is rather unlikely that this would be a problem with typical strings and a decent hash function.
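The brute-force parameter search mentioned above could look roughly like the following C++ sketch. Everything here is an assumption made for illustration: the names param_hash and FixedKeyMap, the multiplicative hash, the table size of at least twice the key count, and the retry limit. With a table this small, the search is only realistic for small key sets; larger sets would need a much larger table or a dedicated tool such as CMPH.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parameterized string hash: multiply-add per byte with a
// tunable multiplier, followed by a multiplicative mixing step.
inline std::uint64_t param_hash(const std::string& s, std::uint64_t mult) {
    std::uint64_t h = 0;
    for (unsigned char c : s) h = h * mult + c;
    h *= 0x9E3779B97F4A7C15ULL;  // spread the bits before masking
    return h >> 32;
}

// Hypothetical read-only map: searches for a multiplier under which
// no two keys share a slot, so a lookup is one hash and one compare.
class FixedKeyMap {
public:
    FixedKeyMap(const std::vector<std::string>& keys,
                const std::vector<int>& values) {
        assert(keys.size() == values.size());
        std::size_t size = 1;
        while (size < 2 * keys.size()) size *= 2;  // power of two, >= 2n
        mask_ = size - 1;
        // Brute force over odd multipliers until one is collision-free.
        for (std::uint64_t m = 31; m < 31 + 2 * kMaxTries; m += 2) {
            if (try_build(keys, values, m, size)) {
                mult_ = m;
                return;
            }
        }
        throw std::runtime_error("no collision-free multiplier found");
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const Slot& s = slots_[param_hash(key, mult_) & mask_];
        return (s.used && s.key == key) ? &s.value : nullptr;
    }

private:
    struct Slot {
        std::string key;
        int value = 0;
        bool used = false;
    };
    static constexpr std::uint64_t kMaxTries = 100000;

    // Fills a fresh table; gives up on the first collision.
    bool try_build(const std::vector<std::string>& keys,
                   const std::vector<int>& values,
                   std::uint64_t mult, std::size_t size) {
        std::vector<Slot> slots(size);
        for (std::size_t i = 0; i < keys.size(); ++i) {
            Slot& s = slots[param_hash(keys[i], mult) & mask_];
            if (s.used) return false;
            s.key = keys[i];
            s.value = values[i];
            s.used = true;
        }
        slots_ = std::move(slots);
        return true;
    }

    std::vector<Slot> slots_;
    std::uint64_t mask_ = 0;
    std::uint64_t mult_ = 0;
};
```

The final string comparison in find is still needed to reject keys that were never inserted, which is the comparison cost the answer refers to. Duplicate keys can never be separated, so the constructor throws after exhausting its retries in that case.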
Try google-sparsehash: http://code.google.com/p/google-sparsehash/
In a similar topic (where the number of items is known at compile time), I produced this one: Lookups on known set of integer keys. Low overhead, no need for a perfect hash. Fortunately, it is in C ;-)