Container for fast name lookup
I want to store strings and assign each one a unique ID number (an index would be fine). I only need one copy of each string, and I need quick lookup. I check whether a string exists in the table often enough that I notice a performance hit. What is the best container to use for this, and how do I look up whether a string exists?
Comments (8)
I would suggest tr1::unordered_map. It is implemented as a hash map, so lookups have an expected complexity of O(1) and a worst case of O(n). There is also a Boost implementation if your compiler doesn't support TR1.
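As a rough illustration of the hash-map suggestion, here is a minimal sketch using std::unordered_map, the C++11 standardized form of tr1::unordered_map. The StringTable class and its member names are hypothetical, invented for this example. For simplicity it stores each string twice (once as the map key, once in a vector for ID-to-string lookup); drop the vector if you only ever go from string to ID.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: gives every distinct string a stable index (its ID).
class StringTable {
public:
    // Returns the ID of s, inserting it first if it is not present yet.
    std::size_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;  // already known
        std::size_t id = names_.size();           // next free index
        names_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // True if s has been interned; a hash lookup, O(1) on average.
    bool contains(const std::string& s) const {
        return ids_.find(s) != ids_.end();
    }

    // Reverse lookup: ID back to string.
    const std::string& name(std::size_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;  // string -> ID
    std::vector<std::string> names_;                    // ID -> string
};
```

Calling intern("alpha") twice returns the same ID both times, and contains() answers the "does it exist?" question without inserting anything.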
Try this:
(image; source: adrinael.net)
Try std::map.
First and foremost, you must be able to quantify your options. You have also told us that the main usage pattern you are interested in is lookup, not insertion.
Let N be the number of strings you expect to have in the table, and let C be the average number of characters in any given string in the table (or in the strings that are checked against the table).
In the case of a hash-based approach, for each lookup you pay the following costs:
- O(C) for calculating the hash of the string you are about to look up
- between O(1 x C) and O(N x C) for traversing the bucket selected by the hash key, where 1..N is the number of entries you may have to visit, multiplied by C to re-check the characters of each candidate string against the lookup key
- total time: between O(2 x C) and O((N + 1) x C)
In the case of a std::map-based approach (typically implemented as a red-black tree), for each lookup you pay the following costs:
- total time: between O(1 x C) and O(log(N) x C), where O(log(N)) is the maximal tree traversal cost and O(C) is the time that std::map's generic less<> implementation takes to compare your lookup key against a node's key during tree traversal
For large values of N and in the absence of a hash function that guarantees fewer than log(N) collisions, or if you just want to play it safe, you are better off with a tree-based (std::map) approach. If N is small, by all means use a hash-based approach (while still making sure that hash collisions are rare).
Before making any decision, though, you should also check:
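One way to put numbers on the trade-off described in this answer, for your own N and C, is to time both containers on representative data. The harness below is only a rough sketch: count_hits, time_lookups_us and make_keys are hypothetical helper names, the "name_<i>" keys are made-up test data, and a single timed pass like this is noisy, so repeat it and use real strings before drawing conclusions.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Counts how many queries are present in the table. Works for both
// std::map and std::unordered_map, since both provide find() and end().
template <typename Table>
std::size_t count_hits(const Table& table,
                       const std::vector<std::string>& queries) {
    std::size_t hits = 0;
    for (const std::string& q : queries) {
        if (table.find(q) != table.end()) ++hits;
    }
    return hits;
}

// Runs one pass over the queries and returns the elapsed wall-clock
// time in microseconds; the number of hits is written to hits_out.
template <typename Table>
long long time_lookups_us(const Table& table,
                          const std::vector<std::string>& queries,
                          std::size_t& hits_out) {
    auto start = std::chrono::steady_clock::now();
    hits_out = count_hits(table, queries);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               stop - start).count();
}

// Builds n made-up keys "name_0" .. "name_<n-1>" as stand-in test data.
std::vector<std::string> make_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        keys.push_back("name_" + std::to_string(i));
    }
    return keys;
}
```

To use it, fill a std::map<std::string, int> and a std::unordered_map<std::string, int> with the same keys, call time_lookups_us on each with the same query list, and compare the two timings at the table sizes you actually expect.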
Are the strings to be searched known statically? If so, you might want to look at a perfect hashing function.
Sounds like an array would work just fine, where the ID is the index into the array. To check whether an ID exists, just make sure the index is within the bounds of the array and that its entry isn't NULL.
EDIT: If you sort the list, you can always use a binary search, which gives O(log N) lookup.
EDIT: Also, if you want to search by string, you can always use a std::map<std::string, int> as well. That should give decent lookup speed.
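The sorted-array idea can be sketched with std::sort and std::lower_bound. This assumes the set of strings is fixed before lookups begin, because inserting a new string later would shift the indices of everything sorted after it; prepare and find_index are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts the names and drops duplicates so each string appears once.
// After this call, a string's position in the vector serves as its ID.
void prepare(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end());
    names.erase(std::unique(names.begin(), names.end()), names.end());
}

// Returns the index of s in the sorted vector, or -1 if it is absent.
// std::lower_bound needs O(log N) string comparisons.
std::ptrdiff_t find_index(const std::vector<std::string>& sorted_names,
                          const std::string& s) {
    auto it = std::lower_bound(sorted_names.begin(), sorted_names.end(), s);
    if (it != sorted_names.end() && *it == s) {
        return it - sorted_names.begin();
    }
    return -1;
}
```

A contiguous sorted vector is also compact in memory, which can matter when the table is large and read-mostly.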
The easiest option is to use std::map.
It works like this:
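The code that accompanied this answer is not preserved here. As a minimal sketch, and not the answerer's original code, the usual pattern with a std::map<std::string, int> might look like the following; get_or_assign_id and exists are hypothetical helper names.

```cpp
#include <map>
#include <string>
#include <utility>

// Returns the ID stored for name, assigning the next free index if the
// name is new. insert() leaves the map unchanged when the key already
// exists and returns an iterator to the element in either case.
int get_or_assign_id(std::map<std::string, int>& table,
                     const std::string& name) {
    auto result =
        table.insert(std::make_pair(name, static_cast<int>(table.size())));
    return result.first->second;
}

// True if name is already in the table; an O(log N) tree lookup.
bool exists(const std::map<std::string, int>& table,
            const std::string& name) {
    return table.find(name) != table.end();
}
```

Note that checking with find() rather than operator[] matters here: table[name] would silently insert a missing key with value 0.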
Google sparse hash, maybe.