为什么在C＆＃x2B;＆＃x2B;中找到最有效的元素？

发布于 2025-01-31 15:46:29 字数 1303 浏览 3 评论 0原文

我需要一个快速的STL容器来查找其中是否存在元素，因此我测试了数组，向量，集合和无序集。我认为，由于独特和有序的值，该集合被优化用于查找元素，但是1000万次迭代的最快是：阵列（0.3秒）向量（1.7秒）无序集（1.9秒）集合（3秒）

这是代码：

#include <algorithm>
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>

int main() {
    using std::cout, std::endl, std::set, std::unordered_set, std::vector, std::find;
    
    int i;
    
    const long ITERATIONS = 10000000;
    
    int a[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(a, a + 16, rand() % 64) == a + 16) {}
        else {}
    }
    
    vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (i = 0; i < ITERATIONS; i++) {
        if (find(v.begin(), v.end(), rand() % 64) == v.end()) {}
        else {}
    }
    
    set<int> s({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (i = 0; i < ITERATIONS; i++) {
        if (find(s.begin(), s.end(), rand() % 64) == s.end()) {}
        else {}
    }
    
    unordered_set<int> us({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (i = 0; i < ITERATIONS; i++) {
        if (find(us.begin(), us.end(), rand() % 64) == us.end()) {}
        else {}
    }
}

原文

I need a fast STL container for finding if an element exists in it, so I tested arrays, vectors, sets, and unordered sets. I thought that sets were optimized for finding elements, because of unique and ordered values, but the fastest for 10 million iterations are:
arrays (0.3 secs)
vectors (1.7 secs)
unordered sets (1.9 secs)
sets (3 secs)

Here is the code:

#include <algorithm>
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>

int main() {
    using std::cout, std::endl, std::set, std::unordered_set, std::vector, std::find;
    
    int i;
    
    const long ITERATIONS = 10000000;
    
    int a[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(a, a + 16, rand() % 64) == a + 16) {}
        else {}
    }
    
    vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (i = 0; i < ITERATIONS; i++) {
        if (find(v.begin(), v.end(), rand() % 64) == v.end()) {}
        else {}
    }
    
    set<int> s({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (i = 0; i < ITERATIONS; i++) {
        if (find(s.begin(), s.end(), rand() % 64) == s.end()) {}
        else {}
    }
    
    unordered_set<int> us({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (i = 0; i < ITERATIONS; i++) {
        if (find(us.begin(), us.end(), rand() % 64) == us.end()) {}
        else {}
    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

月下客 2025-02-07 15:46:29

请记住，在c和c ++中有就像规则！
这意味着，只要运行代码的可观察结果保持不变，编译器就可以通过任何方式（即使是删除代码）来转换代码。

这是 godbolt 您的代码。

现在请注意编译器对if（find（a，a + 16，rand（）％64）== a + 16）{}：

.L206:
        call    rand
        sub     ebx, 1
        jne     .L206

基本上是编译器未使用并删除的结果，一切都期望呼叫rand（）具有副作用（结果可见）。

std :: vector：

.L207:
        call    rand
        sub     ebx, 1
        jne     .L207

甚至对于std :: Set和std :: unordered_set编译器也能够执行相同的优化。您看到的差异（您没有指定您的工作方式）只是初始化所有这些变量的结果，这对于更复杂的容器来说很耗时。

编写良好的性能测试很难，应该谨慎进行。

您的问题也有第二个问题。给定代码的时间复杂性。
搜索数组和搜索std :: Set和std :: Unrodered_set缩放与数据尺寸不同。对于小数据集，简单的数组将是命运，因为它的简单实现和对内存的最佳访问。随着数据大小的增长std ::的时间复杂性::查找数组的将以o（n）在查找项目将以o（log n）和std :: unordered_set为o（log n） 它将是恒定的时间o（1）。因此，对于少量数据阵列将是最快的，对于中等大小std :: Set是赢家，如果数据数量将是大的std :: Unordered_set最好是最好的。

看看此基准示例示例使用Google-Benchmark。

Please remember that in C and C++ there is the as if rule!
This means compiler can transform code by any means (even by dropping code) as long as observable result of running code remains unchanged.

Here is godbolt of your code.

Now note what compiler did for if (find(a, a + 16, rand() % 64) == a + 16) {}:

.L206:
        call    rand
        sub     ebx, 1
        jne     .L206

Basically compiler noticed that result of it is not used and remove everything expect calling rand() which has side effects (visible changes in results).

The same happen for std::vector:

.L207:
        call    rand
        sub     ebx, 1
        jne     .L207

And even for std::set and std::unordered_set compiler was able to perform same optimization. The difference you are seeing (you didn't specified how you did that) is just result of initializing all of this variables which is time consuming for more complex containers.

Writing good performance test is hard and should be approached with caution.

There is also second problem with your question. Time complexity of given code.
Searching array and searching std::set and std::unrodered_set scales differently to size of data. For small data set simple array will be fates since its simple implementation and optimal access to memory. As data size grow time complexity of std::find for array will grow as O(n) on other hand slower std::set time to find item will grow as O(log n) and for std::unordered_set it will be constant time O(1). So for small amount of data array will be fastest, for medium size std::set is the winner and if amount of data will be large std::unordered_set will be best.

Take a look on this benchmark example which uses google benchmark.

回复收藏 0 原文