Which is faster, hash_map or map? (fewer than 10000 items)

Posted 2024-07-27 02:43:27


VS2005 supports both ::stdext::hash_map and ::std::map.

However, in my test, ::stdext::hash_map's insert and remove operations seem to be slower than ::std::map's (fewer than 10000 items).

Interesting....

Can anyone offer a comparison article about them?


Comments (6)

溺ぐ爱和你が 2024-08-03 02:43:27


Normally you look to the complexities of the various operations, and that's a good guide: amortized O(1) insert, O(1) lookup, delete for a hashmap as against O(log N) insert, lookup, delete for a tree-based map.

However, there are certain situations where the complexities are misleading because the constant terms involved are extreme. For example, suppose that your 10k items are keyed off strings. Suppose further that those strings are each 100k characters long. Suppose that different strings typically differ near the beginning of the string (for example if they're essentially random, pairs will differ in the first byte with probability 255/256).

Then to do a lookup the hashmap has to hash a 100k string. This is O(1) in the size of the collection, but might take quite a long time since it's probably O(M) in the length of the string. A balanced tree has to do log N <= 14 comparisons, but each one only needs to look at a few bytes. This might not take very long at all.

In terms of memory access, with a 64 byte cache line size, the hashmap loads over 1500 sequential lines, and does 100k byte operations, whereas the tree loads 15 random lines (actually probably 30 due to the indirection through the string) and does 14 * (some small number) byte operations. You can see that the former might well be slower than the latter. Or it might be faster: how good are your architecture's FSB bandwidth, stall time, and speculative read caching?

If the lookup finds a match, then of course in addition to this both structures need to perform a single full-length string comparison. Also the hashmap might do additional failed comparisons if there happens to be a collision in the bucket.

So assuming that failed comparisons are so fast as to be negligible, while successful comparisons and hashing ops are slow, the tree might be roughly 1.5-2 times as fast as the hash. If those assumptions don't hold, then it won't be.

An extreme example, of course, but it's pretty easy to see that on your data, a particular O(log N) operation might be considerably faster than a particular O(1) operation. You are of course right to want to test, but if your test data is not representative of the real world, then your test results may not be representative either. Comparisons of data structures based on complexity refer to behaviour in the limit as N approaches infinity. But N doesn't approach infinity. It's 10000.

深居我梦 2024-08-03 02:43:27


It is not just about insertion and removal. You must consider that memory is allocated differently in a hash_map vs. a map, and that the hash of the value being searched has to be calculated every time.

I think this Dr. Dobb's article will answer your question best:

C++ STL Hash Containers and Performance

傲娇萝莉攻 2024-08-03 02:43:27


It depends upon your usage and your hash collisions. One is a binary tree and the other is a hash table.

Ideally the hash map will have O(1) insertion and lookup, and the map O(log n), but that presumes non-colliding hashes.

濫情▎り 2024-08-03 02:43:27


hash_map uses a hash table, which offers almost constant-time O(1) operations, assuming a good hash function.

map uses a BST, which offers O(lg(n)) operations; for 10000 elements that's about 13 comparisons, which is very acceptable.

I'd say stay with map; it's safer.

伴随着你 2024-08-03 02:43:27


Hash tables are supposed to be faster than binary trees (i.e. std::map) for lookup. Nobody has ever suggested that they are faster for insert and delete.

纵情客 2024-08-03 02:43:27


A hash map will create a hash of the string/key for indexing. Though its complexity is quoted as O(1), a hash_map performs collision detection on every insert, since the hash of one string can produce the same index as the hash of another. A hash map therefore carries extra cost for managing these collisions, and those collisions depend on the input data.

However, if you are going to perform a lot of lookups on the structure, opt for hash_map.
