Performance difference between map and unordered_map in C++

Posted 2024-08-22 21:00:41

I have a simple requirement: I need a map of type <int, int>, but I need the theoretically fastest possible retrieval time.

I used both map and the newly proposed unordered_map from TR1. I found that, at least while parsing a file and building the map by inserting one element at a time, map took only 2 minutes while unordered_map took 5 minutes.

Since this is going to be part of code executed on a Hadoop cluster and will contain ~100 million entries, I need the smallest possible retrieval time.

Another piece of possibly helpful information: the keys currently being inserted are integers ranging from 1, 2, ... up to ~10 million.

I could also require the user to specify a maximum value and to insert in the increasing order above; would that significantly affect my implementation? (I have heard that map is based on red-black trees, and that inserting in increasing order leads to better (or worse?) performance.)

Here is the code:

map<int,int> Label;  // this is being changed to unordered_map  
fstream LabelFile("Labels.txt");  


// Creating the map from the Label.txt  
if (LabelFile.is_open())  
{  
    while (! LabelFile.eof() )  
    {             
        getline (LabelFile,inputLine);  
        try  
        {  
            curnode=inputLine.substr(0,inputLine.find_first_of("\t"));  
            nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);  
            Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());  
        }  
        catch (const std::out_of_range&)  // substr throws std::out_of_range, not char*  
        {  
            failed=true;  
            break;  
        }  
    }  
    LabelFile.close(); 
}

Tentative solution: after reviewing the comments and answers, I believe a dynamic C++ array (a std::vector) would be the best option, since the implementation will use dense keys. Thanks.
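
For illustration, here is a minimal sketch of that dense-key approach (not the poster's actual code; the -1 sentinel and the ~10 million maximum key are assumptions taken from the question):

#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

int main() {
    // Assumed maximum key, ~10 million per the question; one slot per key.
    const std::size_t maxKey = 10000000;
    std::vector<int> Label(maxKey + 1, -1);  // -1 marks "no label yet"

    std::ifstream LabelFile("Labels.txt");
    std::string inputLine;
    while (std::getline(LabelFile, inputLine)) {
        const std::string::size_type tab = inputLine.find_first_of('\t');
        if (tab == std::string::npos)
            continue;  // skip malformed lines
        const long curnode = std::atol(inputLine.substr(0, tab).c_str());
        const int nodelabel = std::atoi(inputLine.substr(tab + 1).c_str());
        if (curnode < 0 || static_cast<std::size_t>(curnode) > maxKey)
            continue;  // key outside the assumed range
        Label[curnode] = nodelabel;  // O(1) "insert": plain array indexing
    }

    // Retrieval is likewise a single array index, e.g. int v = Label[42];
    return 0;
}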

Comments (3)

绿光 2024-08-29 21:00:41

Insertion into unordered_map should be O(1), and retrieval should be roughly O(1) as well (it's essentially a hash table).

Your timings are therefore way off, or there is something wrong with your implementation or usage of unordered_map.

You need to provide some more information, and possibly show how you are using the container.

Per section 6.3 of n1836, the complexities for insertion and retrieval are given as O(1) in the average case and O(n) in the worst case.

One issue you should consider is that your implementation may need to continually rehash the structure, since you say you have 100 million+ items. In that case, when instantiating the container, if you have a rough idea of how many "unique" elements will be inserted, you can pass that as a parameter to the constructor and the container will be instantiated with a bucket table of appropriate size, as the sketch below shows.
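
A minimal sketch of that suggestion (the element count is an assumption based on the question's figures, and std::unordered_map is the C++11 spelling of the TR1 container):

#include <cstddef>
#include <unordered_map>  // <tr1/unordered_map> and std::tr1:: in TR1-era code

int main() {
    // Assumed count of unique keys, per the question (~10 million).
    const std::size_t expectedElements = 10000000;
    // The first constructor argument is a minimum bucket count, so the
    // table starts large enough that insertions never trigger a rehash
    // (given the default max_load_factor of 1.0).
    std::unordered_map<int, int> Label(expectedElements);
    return 0;
}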

是你 2024-08-29 21:00:41

The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule doubles the number of cells each time the table exceeds its load factor. So, starting from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically:

Label.rehash(expected_number_of_entries / Label.max_load_factor());

Dividing by max_load_factor accounts for the empty cells the hash table needs in order to operate. (Note that rehash takes a bucket count, whereas C++11's reserve takes an element count and performs this division internally.)
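
To put rough numbers on it (assuming a typical implementation that starts with a handful of buckets and doubles, and the question's ~100 million entries): reaching 2^27 ≈ 134 million buckets takes on the order of 25 doublings, each of which rehashes every element inserted so far, while a single upfront rehash call avoids all of them.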

黎歌 2024-08-29 21:00:41

unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).

Given that there are ~10 million total entries, you could just allocate a large enough array and get really fast lookups -- assuming enough physical memory that it doesn't cause thrashing, but that's not a huge amount of memory by modern standards.

Edit: yes, a vector is basically a dynamic array.

Edit 2: The code you've added has some problems. Your while (! LabelFile.eof() ) is broken. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you're apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:

while (LabelFile >> node >> label)
    Label[node] = label;
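
For completeness, a self-contained sketch of that corrected loader (assuming the same tab-separated Labels.txt format from the question):

#include <fstream>
#include <map>

int main() {
    std::map<int, int> Label;  // or std::unordered_map<int, int>, as discussed above
    std::ifstream LabelFile("Labels.txt");

    int node, label;
    // Extraction in the loop condition stops cleanly at end-of-file or on
    // malformed input, avoiding the eof() pitfall in the original loop;
    // operator>> skips whitespace, including the tab between the two numbers.
    while (LabelFile >> node >> label)
        Label[node] = label;

    return 0;
}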