I'm using a std::map (VC++ implementation) and it's a little slow for lookups via the map's find method. The key type is std::string.
Can I increase the performance of this std::map lookup via a custom key compare override for the map? For example, maybe std::string < compare doesn't take into consideration a simple string::size() compare before comparing its data?
Any other ideas to speed up the compare?
In my situation the map will always contain < 15 elements, but it is being queried non stop and performance is critical. Maybe there is a better data structure that I can use that would be faster?
Update: The map contains file paths.
Update2: The map's elements are changing often.
As Even said, the operator used in a set is <, not ==. If you don't care about the order of the strings in your set, you can pass the set a custom comparator that performs better than the regular less-than. For example, if a lot of your strings have similar prefixes (but they vary in length), you can sort by string length (since string::length() is constant time). If you do so, beware a common mistake: a comparator that does not maintain a strict weak ordering, because it can treat two strings as each less than the other. Follow the logic and you'll see that comp(a, b) == true and comp(b, a) == true. The correct implementation is:
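The code listing was lost here. A reconstruction of both the broken and the correct comparator, with struct names of my own invention, might look like:

```cpp
#include <cassert>
#include <string>

// Broken version (do NOT use): for a = "z", b = "aa" it yields
// comp(a, b) == true (length 1 < 2) and comp(b, a) == true ("aa" < "z"),
// so it is not a strict weak ordering:
//
//     bool operator()(const std::string& lhs, const std::string& rhs) const {
//         if (lhs.length() < rhs.length()) return true;
//         return lhs < rhs;
//     }

// Correct version: compare lengths first, and fall back to the normal
// lexicographic compare only when the lengths are equal.
struct LengthFirstLess {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        if (lhs.length() != rhs.length())
            return lhs.length() < rhs.length();
        return lhs < rhs;
    }
};
```

Used as std::map<std::string, int, LengthFirstLess>, this still yields a valid map, just with a different (length-first) iteration order.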
First, turn off all the profiling and DEBUG switches. These can slow down STL immensely.
If that's not it, part of the problem may be that your strings are identical for the first 80-90% of the string. This isn't bad for map, necessarily, but it is for string comparisons. If this is the case, your search can take much longer.
For example, in this code find() will likely result in a couple of string compares, but each will return after comparing the first character until "david", and then the first three characters will be checked. So at most, 5 characters will be checked per call.
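The code sample referred to here was lost in extraction; a hypothetical reconstruction consistent with the description (keys that mostly diverge at the first character, plus one key "david") might be:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical keys: all but "david" diverge at the first character,
// so most comparisons made by find() stop after one character.
std::map<std::string, int> make_fast_map() {
    std::map<std::string, int> m;
    m["alice"] = 1;
    m["bob"]   = 2;
    m["carol"] = 3;
    m["david"] = 4;
    m["ed"]    = 5;
    return m;
}

// Returns the mapped value, or -1 when the key is absent.
int fast_lookup(const std::string& key) {
    std::map<std::string, int> m = make_fast_map();
    std::map<std::string, int>::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

Looking up a probe such as "dave" compares only a handful of characters: one against most keys, and a few against "david" before the strings diverge.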
On the other hand, in the following code, find() will likely check 135+ characters:
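Again the original listing is missing; a reconstruction in the same spirit (invented paths sharing a long common prefix) could be:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical keys: every path shares the same ~55-character prefix,
// so each comparison made by find() must scan deep into the string
// before hitting the first differing character.
std::map<std::string, int> make_slow_map() {
    std::map<std::string, int> m;
    m["/usr/local/share/sample-app/data/current/module/file1.txt"] = 1;
    m["/usr/local/share/sample-app/data/current/module/file2.txt"] = 2;
    m["/usr/local/share/sample-app/data/current/module/file3.txt"] = 3;
    return m;
}

// Returns the mapped value, or -1 when the key is absent.
int slow_lookup(const std::string& key) {
    std::map<std::string, int> m = make_slow_map();
    std::map<std::string, int>::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```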
That's because the string comparisons have to search deeper to find a match since the beginning of each string is the same.
Using size() in your comparison for equality won't help you much here since your data set is so small. A std::map is kept sorted so its elements can be searched with a binary search. Each call to find should result in less than 5 string comparisons for a miss, and an average of 2 comparisons for a hit. But it does depend on your data. If most of your path strings are of different lengths, then a size check like Motti describes could help a lot.
Something to consider when thinking of alternative algorithms is how many "hits" you get. Are most of your find() calls returning end() or a hit? If most of your find()s return end() (misses) then you are searching the entire map every time (2logn string compares).
Hash_map is a good idea; it should cut your search time in about half for hits; more for misses.
A custom algorithm may be called for because of the nature of path strings, especially if your data set has common ancestry like in the above code.
Another thing to consider is how you get your search strings. If you are reusing them, it may help to encode them into something that is easier to compare. If you use them once and discard them, then this encoding step is probably too expensive.
I used something like a Huffman coding tree once (a long time ago) to optimize string searches. A binary string search tree like that may be more efficient in some cases, but it's pretty expensive for small sets like yours.
Finally, look into alternative std::map implementations. I've heard bad things about some of VC's stl code performance. The DEBUG library in particular is bad about checking you on every call. StlPort used to be a good alternative, but I haven't tried it in a few years. I've always loved Boost too.
The first thing is to try using a hash_map if that's possible - you are right that the standard string compare doesn't first check for size (since it compares lexicographically), but writing your own map code is something you'd be better off avoiding. From your question it sounds like you do not need to iterate over ranges; in that case map doesn't have anything hash_map doesn't.
It also depends on what sort of keys you have in your map. Are they typically very long? Also what does "a little slow" mean? If you have not profiled the code it's quite possible that it's a different part taking time.
Update: Hmm, the bottleneck in your program is a map::find, but the map always has less than 15 elements. This makes me suspect that the profile was somehow misleading, because a find on a map this small should not be slow, at all. In fact, a map::find should be so fast, just the overhead of profiling could be more than the find call itself. I have to ask again, are you sure this is really the bottleneck in your program? You say the strings are paths, but you're not doing any sort of OS calls, file system access, disk access in this loop? Any of those should be orders of magnitude slower than a map::find on a small map. Really any way of getting a string should be slower than the map::find.
You can try to use a sorted vector (here's one sample); this may turn out to be faster (you'll have to profile it to make sure, of course).
Reasons to think it'll be faster: better memory locality, since the elements sit in contiguous storage.
Reasons to think it'll be slower: insertion and removal have to shift elements; but string's swap is efficient, and since the size of the data set is small this may not be an issue.
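A minimal sketch of the sorted-vector approach, using std::lower_bound for the binary search (all names here are mine, not from the linked sample):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;

// lower_bound needs comp(element, value): order entries by key.
bool entry_less(const Entry& e, const std::string& key) {
    return e.first < key;
}

// Binary search in a vector kept sorted by key; contiguous storage
// gives better cache locality than a node-based std::map.
int* find_value(std::vector<Entry>& v, const std::string& key) {
    std::vector<Entry>::iterator it =
        std::lower_bound(v.begin(), v.end(), key, entry_less);
    if (it != v.end() && it->first == key)
        return &it->second;
    return 0;  // not found
}

// Small self-check helper: builds {"one"->1, "two"->2} and looks up key.
int sorted_vector_demo(const std::string& key) {
    std::vector<Entry> v;
    v.push_back(Entry("one", 1));
    v.push_back(Entry("two", 2));
    std::sort(v.begin(), v.end());  // pair's operator< sorts by key first
    int* p = find_value(v, key);
    return p ? *p : -1;
}
```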
std::map's comparator isn't std::equal_to, it's std::less; I'm not sure of the best way to short-circuit a < compare so that it would be faster than the built-in one.
If there are always < 15 elements, perhaps you could use a key besides std::string?
Motti has a good solution. However, I'm pretty sure that for your < 15 elements a map isn't the right way because its overhead will always be greater than that of a simple lookup table with an appropriate hashing scheme. In your case, it might even be enough to hash by length alone, and if that still produces collisions, use a linear search through all entries of the same length.
To establish if I'm right, a benchmark is of course required but I'm quite sure of its outcome.
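A sketch of that lookup table, hashing by length alone and falling back to a linear scan within a bucket (names and sizes are invented):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bucket entries by key length (a trivial "hash"); collisions within a
// bucket are resolved by a linear scan, which is cheap for < 15 keys.
struct LengthBucketTable {
    struct Entry { std::string key; int value; };
    std::vector<Entry> buckets[64];  // % 64 guards against longer keys

    void insert(const std::string& key, int value) {
        Entry e;
        e.key = key;
        e.value = value;
        buckets[key.size() % 64].push_back(e);
    }

    int find(const std::string& key) const {  // -1 means "not found"
        const std::vector<Entry>& b = buckets[key.size() % 64];
        for (std::size_t i = 0; i < b.size(); ++i)
            if (b[i].key == key)
                return b[i].value;
        return -1;
    }
};

// Self-check helper: two keys of different lengths plus a probe.
int length_table_demo(const std::string& key) {
    LengthBucketTable t;
    t.insert("a.txt", 1);
    t.insert("bb.txt", 2);
    return t.find(key);
}
```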
您可能会考虑预先计算字符串的哈希值,并将其保存在映射中。 这样做可以在通过 std::map 树搜索期间提供散列比较而不是字符串比较的优势。
这样做的好处是在构造时计算一次字符串的哈希值。 之后,您可以实现一个比较函数:
由于散列现在是在
HashedString
构造上计算的,因此它们以这种方式存储在 std::map 中,因此比较可以非常快地进行(整数比较)在天文数字般高的时间比例中,当哈希值相等时,会使用标准字符串进行比较。You might consider pre-computing a hash for a string, and saving that in your map. Doing so gives the advantage of hash compares instead of string compares during the search through the std::map tree.
This has the benefits of computing a hash of the string once, on construction. After this, you could implement a comparison function:
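A sketch of what such a pre-hashed key might look like; the class name and the multiplicative hash are my own choices, not from the original answer:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Key that carries its hash, computed once at construction.
struct HashedString {
    std::string str;
    std::size_t hash;

    explicit HashedString(const std::string& s) : str(s), hash(0) {
        for (std::size_t i = 0; i < s.size(); ++i)  // simple multiplicative hash
            hash = hash * 131 + static_cast<unsigned char>(s[i]);
    }

    // Order by hash first; only on a hash collision fall back to the
    // full string compare, which keeps the ordering strict-weak.
    bool operator<(const HashedString& rhs) const {
        if (hash != rhs.hash)
            return hash < rhs.hash;
        return str < rhs.str;
    }
};

// Self-check helper: a tiny map keyed by HashedString.
int hashed_map_demo(const std::string& key) {
    std::map<HashedString, int> m;
    m.insert(std::make_pair(HashedString(std::string("/tmp/a")), 1));
    m.insert(std::make_pair(HashedString(std::string("/tmp/b")), 2));
    std::map<HashedString, int>::const_iterator it = m.find(HashedString(key));
    return it == m.end() ? -1 : it->second;
}
```

Note that the map is then ordered by hash value rather than alphabetically, which is fine if you only care about lookups.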
Since hashes are now computed on
HashedString
construction, they are stored that way in the std::map, and so the compare can happen very quickly (an integer compare) in an astronomically high percentage of the time, falling back on standard string compares when the hashes are equal.也许您可以在将字符串用作地图中的键之前反转字符串? 如果每个字符串的前几个字母相同,这可能会有所帮助。
Maybe you could reverse the strings prior to using them as keys in the map? That could help if the first few letters of each string are identical.
Here are some things you can consider:
0) Are you sure this is where the performance bottleneck is? Like the results from Quantify, Cachegrind, gprof or something like that? Because lookups on such a small map should be fairly fast...
1) You can override the functor used to compare the keys in std::map<>, there is a second template parameter to do that. I doubt you can do much better than operator<, however.
2) Are the contents of the map changing a lot? If not, and given the very small size of your map, maybe using a sorted vector and binary search could yield better results (for example because you can exploit memory locality better).
3) Are the elements known at compile time? You could use a perfect hash function to improve lookup times if that is the case. Search for gperf on the web.
4) Do you have a lot of lookups that fail to find anything? If so, maybe comparing with the first and last elements in the collection may eliminate many mismatches quicker than a full search every time.
These have been suggested already, but in more detail:
5) Since you have so few strings, maybe you could use a different key. For example, are your keys all the same size? Can you use a class containing a fixed-length array of characters? Can you convert your strings to numbers or some data structure with only numbers?
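As an illustration of point 5, a fixed-length character-array key compared with memcmp might look like this (the 32-byte size is an arbitrary assumption):

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>
#include <utility>

// POD key of fixed size: no heap allocation, and comparison is a
// single memcmp over the zero-padded buffer.
struct FixedKey {
    char data[32];  // assumes keys shorter than 32 characters

    explicit FixedKey(const char* s) {
        std::memset(data, 0, sizeof(data));
        std::strncpy(data, s, sizeof(data) - 1);
    }

    bool operator<(const FixedKey& rhs) const {
        return std::memcmp(data, rhs.data, sizeof(data)) < 0;
    }
};

// Self-check helper: a tiny map keyed by FixedKey.
int fixed_key_demo(const char* key) {
    std::map<FixedKey, int> m;
    m.insert(std::make_pair(FixedKey("alpha"), 1));
    m.insert(std::make_pair(FixedKey("beta"), 2));
    std::map<FixedKey, int>::const_iterator it = m.find(FixedKey(key));
    return it == m.end() ? -1 : it->second;
}
```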
Depending on the usage cases, there are some other techniques you can use. For example, we had an application that needed to keep up with over a million different file paths. The problem was that there were thousands of objects that needed to keep small maps of these file paths.
Since adding new file paths to the data set was an infrequent operation, when a path was added to the system, a master map was searched. If the path was not found, then it was added and a new sequenced integer (starting at 1) was returned. If the path already existed, then the previously assigned integer was returned. Then each map maintained by each object was converted from a string-based map to an integer map. Not only did this greatly improve performance, it reduced memory usage by not having so many duplicate copies of the strings.
Sure, this is a very specific optimization. But when it comes to performance improvements, you often find yourself having to make tailored solutions to specific problems.
And I hate strings :) Not only are they slow to compare, but they can really trash your CPU caches on high-performance software.
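The interning scheme described above could be sketched like this (class and method names are invented):

```cpp
#include <cassert>
#include <map>
#include <string>

// Master table: maps each distinct path to a small sequential id
// (starting at 1, as in the description). Per-object maps can then be
// keyed by cheap ints instead of strings.
class PathInterner {
    std::map<std::string, int> ids_;
public:
    int intern(const std::string& path) {
        std::map<std::string, int>::const_iterator it = ids_.find(path);
        if (it != ids_.end())
            return it->second;  // already known: return the old id
        int id = static_cast<int>(ids_.size()) + 1;
        ids_[path] = id;
        return id;
    }
};

// Self-check helper: interning the same path twice yields the same id.
int intern_demo() {
    PathInterner interner;
    int a = interner.intern("/var/log/app.log");
    int b = interner.intern("/etc/hosts");
    int c = interner.intern("/var/log/app.log");
    return (a == 1 && b == 2 && c == a) ? 1 : 0;
}
```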
Try std::tr1::unordered_map (found in the header <tr1/unordered_map>). This is a hash map, and, while it doesn't maintain a sorted order of elements, will likely be far faster than a regular map.
If your compiler doesn't support TR1, get a newer version. MSVC and gcc both support TR1, and I believe the newest versions of most other compilers also have support. Unfortunately, a lot of the library reference sites haven't been updated, so TR1 remains a largely-unknown piece of technology.
I hope C++0x isn't the same way.
EDIT: Note that the default hashing method for tr1::unordered_map is tr1::hash, which needs to be specialized to work on a UDT, probably.
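A minimal usage sketch; shown here with the C++11 spelling std::unordered_map, which standardized the tr1 interface essentially unchanged:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hash-based lookup: O(1) on average, no ordering maintained.
int unordered_demo(const std::string& key) {
    std::unordered_map<std::string, int> m;
    m["/bin/sh"] = 1;
    m["/bin/ls"] = 2;
    std::unordered_map<std::string, int>::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

For a user-defined key type you would also supply a hash functor as the map's third template parameter, as the EDIT above notes.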
Where you have long common substrings, a trie might be a better data structure than a map or a hash_map. I said "might", though - a hash_map already only traverses the key once per lookup, so should be fairly fast. I won't discuss it further since others already have.
You could also consider a splay tree if some keys are more frequently looked up than others, but of course this makes the worst-case lookup worse than a balanced tree, and lookups are mutating operations, which may matter to you if you're using e.g. a reader-writer lock.
If you care about the performance of lookups more than modifications, you might do better with an AVL tree than a red-black, which I think is what STL implementations generally use for map. An AVL tree is typically better balanced and so will on average require fewer comparisons per lookup, but the difference is marginal.
Finding an implementation of these that you're happy with might be an issue. A search on the Boost main page suggests they have a splay and AVL tree but not a trie.
You mentioned in a comment that you never have a lookup that fails to find anything. So you could in theory skip the final comparison, which in a tree of 15 < 2^4 elements could give you something like a 20-25% speedup without doing anything else. In fact, maybe more than that, since equal strings are the slowest to compare. Whether it's worth writing your own container just for this optimisation is another question.
You might also consider locality of reference - I don't know whether you could avoid the occasional page miss by allocating the keys and the nodes out of a small heap. If you only need about 15 entries at a time, then assuming a file name limit below 256 bytes you could ensure that everything accessed during a lookup fits into a single 4k page (apart from the key being looked up, of course). It may be that comparing the strings is insignificant compared with a couple of page loads. However, if this is your bottleneck there must be an enormous number of lookups going on, so I'd guess that everything is reasonably close to the CPU. Worth checking, maybe.
Another thought: if you are using pessimistic locking on a structure where there's a lot of contention (you said in a comment the program is massively multi-threaded) then regardless of what the profiler tells you (what code the CPU cycles are spent in), it might be costing you more than you think by effectively limiting you to 1 core. Try a reader-writer lock?
hash_map is not standard; try using unordered_map, available in TR1 (and in Boost if your tool chain doesn't already have it).
For small numbers of strings you might be better off using vector, as map is typically implemented as a tree.
Why don't you use a hashtable instead? boost::unordered_map could do. Or you can roll your own solution and store the CRC of a string instead of the string itself. Or better yet, put #defines for the strings, and use those for lookup, e.g.,
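The example that followed "e.g.," was lost; a hypothetical reconstruction of the #define approach (paths and payload values invented) might be:

```cpp
#include <cassert>
#include <string>

// Each known path gets a small integer constant...
#define PATH_CONFIG 0
#define PATH_CACHE  1
#define PATH_LOG    2

// Translate a string to its id once, at the edges of the system.
int path_id(const std::string& s) {
    if (s == "/etc/app.conf")    return PATH_CONFIG;
    if (s == "/var/cache/app")   return PATH_CACHE;
    if (s == "/var/log/app.log") return PATH_LOG;
    return -1;
}

// ...so the hot-path lookup is just an array index, no string compares.
int lookup_by_id(int id) {
    static const int values[] = { 10, 20, 30 };  // invented payloads
    return (id >= 0 && id < 3) ? values[id] : -1;
}
```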