Which is faster, hash_map or map? (fewer than 10000 items)

Posted 2024-07-27 02:43:27


VS2005 supports both ::stdext::hash_map and ::std::map.

However, in my test, ::stdext::hash_map's insert and remove operations seem to be slower than ::std::map's (fewer than 10000 items).

Interesting....

Can anyone offer a comparison article about them?


Comments (6)

溺ぐ爱和你が 2024-08-03 02:43:27


Normally you look to the complexities of the various operations, and that's a good guide: amortized O(1) insert, O(1) lookup, delete for a hashmap as against O(log N) insert, lookup, delete for a tree-based map.

However, there are certain situations where the complexities are misleading because the constant terms involved are extreme. For example, suppose that your 10k items are keyed off strings. Suppose further that those strings are each 100k characters long. Suppose that different strings typically differ near the beginning of the string (for example if they're essentially random, pairs will differ in the first byte with probability 255/256).

Then to do a lookup the hashmap has to hash a 100k string. This is O(1) in the size of the collection, but might take quite a long time since it's probably O(M) in the length of the string. A balanced tree has to do log N <= 14 comparisons, but each one only needs to look at a few bytes. This might not take very long at all.

In terms of memory access, with a 64 byte cache line size, the hashmap loads over 1500 sequential lines, and does 100k byte operations, whereas the tree loads 15 random lines (actually probably 30 due to the indirection through the string) and does 14 * (some small number) byte operations. You can see that the former might well be slower than the latter. Or it might be faster: how good are your architecture's FSB bandwidth, stall time, and speculative read caching?

If the lookup finds a match, then of course in addition to this both structures need to perform a single full-length string comparison. Also the hashmap might do additional failed comparisons if there happens to be a collision in the bucket.

So assuming that failed comparisons are so fast as to be negligible, while successful comparisons and hashing ops are slow, the tree might be roughly 1.5-2 times as fast as the hash. If those assumptions don't hold, then it won't be.

An extreme example, of course, but it's pretty easy to see that on your data, a particular O(log N) operation might be considerably faster than a particular O(1) operation. You are of course right to want to test, but if your test data is not representative of the real world, then your test results may not be representative either. Comparisons of data structures based on complexity refer to behaviour in the limit as N approaches infinity. But N doesn't approach infinity. It's 10000.

深居我梦 2024-08-03 02:43:27


It is not just about insertion and removal. You must consider that memory is allocated differently in a hash_map vs. a map, and that the hash of the value being searched has to be calculated every time.

I think this Dr. Dobb's article will answer your question best:

C++ STL Hash Containers and Performance

傲娇萝莉攻 2024-08-03 02:43:27


It depends upon your usage and your hash collisions. One is a binary tree and the other is a hash table.

Ideally the hash map will have O(1) insertion and lookup, and the map O(log n), but that presumes non-colliding hashes.

濫情▎り 2024-08-03 02:43:27


hash_map uses a hash table, which offers almost constant-time O(1) operations, assuming a good hash function.

map uses a BST, which offers O(lg(n)) operations; for 10000 elements that's about 13 comparisons, which is very acceptable.

I'd say stay with map; it's safer.

伴随着你 2024-08-03 02:43:27


Hash tables are supposed to be faster than binary trees (i.e. std::map) for lookup. Nobody has ever suggested that they are faster for insert and delete.

纵情客 2024-08-03 02:43:27


A hash map will create a hash of the string/key for indexing. Though its complexity is quoted as O(1), a hash_map performs collision detection on every insert, since the hash of one string can produce the same index as the hash of another. A hash map therefore carries extra cost for managing these collisions, and those collisions depend on the input data.

However, if you are going to perform a lot of lookups on the structure, opt for hash_map.
