使用 vector::iterator 或 at() 迭代 STL 向量哪个更快?

发布于 2024-07-17 10:34:10 字数 417 浏览 2 评论 0原文

就性能而言,什么会运行得更快? 有区别吗? 它依赖于平台吗?

//1. Using vector<string>::iterator:
vector<string> vs = GetVector();

for(vector<string>::iterator it = vs.begin(); it != vs.end(); ++it)
{
   *it = "Am I faster?";
}

//2. Using size_t index:
for(size_t i = 0; i < vs.size(); ++i)
{
   //One option:
   vs.at(i) = "Am I faster?";
   //Another option:
   vs[i] = "Am I faster?";
}

In terms of performance, what would work faster? Is there a difference? Is it platform dependent?

//1. Using vector<string>::iterator:
vector<string> vs = GetVector();

for(vector<string>::iterator it = vs.begin(); it != vs.end(); ++it)
{
   *it = "Am I faster?";
}

//2. Using size_t index:
for(size_t i = 0; i < vs.size(); ++i)
{
   //One option:
   vs.at(i) = "Am I faster?";
   //Another option:
   vs[i] = "Am I faster?";
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(17

再可℃爱ぅ一点好了 2024-07-24 10:34:10

使用迭代器会导致指针递增(用于递增),并且取消引用会导致取消引用指针。
使用索引,递增应该同样快,但查找元素涉及加法(数据指针+索引)并取消引用该指针,但差异应该很小。
at() 还会检查索引是否在范围内,因此可能会更慢。

500M 迭代、向量大小 10、gcc 4.3.3 (-O3)、linux 2.6.29.1 x86_64 的基准测试结果:
at():9158ms
运算符[]:4269ms
迭代器:3914ms

YMMV,但是如果使用索引使代码更具可读性/可理解性,那么您应该这样做。

2021 更新

对于现代编译器,所有选项实际上都是免费的,但迭代器的迭代效果稍好一些,并且更易于与 range-for 循环一起使用 (for(auto& x: vs))。

代码:

#include <vector>

void iter(std::vector<int> &vs) {
    for(std::vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
        *it = 5;
}

void index(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs[i] = 5;
}

void at(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs.at(i) = 5;
}

index()at() 生成的程序集是相同的 ([godbolt])(https://godbolt.org/z/cv6Kv4b6f),但 iter() 的循环设置短了三个指令:

iter(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        cmp     rax, rdx
        je      .L1
.L3:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rax, rdx
        jne     .L3
.L1:
        ret
index(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        sub     rdx, rax
        mov     rcx, rdx
        shr     rcx, 2
        je      .L6
        add     rdx, rax
.L8:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rdx, rax
        jne     .L8
.L6:
        ret

Using an iterator results in incrementing a pointer (for incrementing) and for dereferencing into dereferencing a pointer.
With an index, incrementing should be equally fast, but looking up an element involves an addition (data pointer+index) and dereferencing that pointer, but the difference should be marginal.
at() also checks if the index is within the bounds, so it could be slower.

Benchmark results for 500M iterations, vector size 10, with gcc 4.3.3 (-O3), linux 2.6.29.1 x86_64:
at(): 9158ms
operator[]: 4269ms
iterator: 3914ms

YMMV, but if using an index makes the code more readable/understandable, you should do it.

2021 update

With modern compilers, all options are practically free, but iterators are very slightly better for iterating and easier to use with range-for loops (for(auto& x: vs)).

Code:

#include <vector>

void iter(std::vector<int> &vs) {
    for(std::vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
        *it = 5;
}

void index(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs[i] = 5;
}

void at(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs.at(i) = 5;
}

The generated assembly for index() and at() is identical ([godbolt])(https://godbolt.org/z/cv6Kv4b6f), but the loop setup for iter() is three instructions shorter:

iter(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        cmp     rax, rdx
        je      .L1
.L3:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rax, rdx
        jne     .L3
.L1:
        ret
index(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        sub     rdx, rax
        mov     rcx, rdx
        shr     rcx, 2
        je      .L6
        add     rdx, rax
.L8:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rdx, rax
        jne     .L8
.L6:
        ret
寄意 2024-07-24 10:34:10

为什么不写一个测试来找出答案呢?

编辑:我的错 - 我以为我正在计时优化版本,但事实并非如此。 在我的机器上,使用 g++ -O2 编译,迭代器版本比运算符[] 版本稍微,但可能并不明显。

#include <vector>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    const int BIG = 20000000;
    vector <int> v;
    for ( int i = 0; i < BIG; i++ ) {
        v.push_back( i );
    }

    int now = time(0);
    cout << "start" << endl;
    int n = 0;
    for(vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
        n += *it;
    }

    cout << time(0) - now << endl;
    now = time(0);
    for(size_t i = 0; i < v.size(); ++i) {
        n += v[i];
    }
    cout << time(0) - now << endl;

    return n != 0;
}

Why not write a test and find out?

Edit: My bad - I thought I was timing the optimised version but wasn't. On my machine, compiled with g++ -O2, the iterator version is slightly slower than the operator[] version, but probably not significantly so.

#include <vector>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    const int BIG = 20000000;
    vector <int> v;
    for ( int i = 0; i < BIG; i++ ) {
        v.push_back( i );
    }

    int now = time(0);
    cout << "start" << endl;
    int n = 0;
    for(vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
        n += *it;
    }

    cout << time(0) - now << endl;
    now = time(0);
    for(size_t i = 0; i < v.size(); ++i) {
        n += v[i];
    }
    cout << time(0) - now << endl;

    return n != 0;
}
这样的小城市 2024-07-24 10:34:10

由于您正在考虑效率,因此您应该意识到以下变体可能更有效:

//1. Using vector<string>::iterator:

vector<string> vs = GetVector();
for(vector<string>::iterator it = vs.begin(), end = vs.end(); it != end; ++it)
{
   //...
}

//2. Using size_t index:

vector<string> vs = GetVector();
for(size_t i = 0, size = vs.size(); i != size; ++i)
{
   //...
}

因为 end/size 函数仅调用一次,而不是每次循环时调用。 编译器很可能无论如何都会内联这些函数,但这种方式可以确保。

Since you're looking at efficiency, you should realise that the following variations are potentially more efficient:

//1. Using vector<string>::iterator:

vector<string> vs = GetVector();
for(vector<string>::iterator it = vs.begin(), end = vs.end(); it != end; ++it)
{
   //...
}

//2. Using size_t index:

vector<string> vs = GetVector();
for(size_t i = 0, size = vs.size(); i != size; ++i)
{
   //...
}

since the end/size function is only called once rather than every time through the loop. It's likely that the compiler will inline these functions anyway, but this way makes sure.

鱼窥荷 2024-07-24 10:34:10

如果不需要索引,请不要使用它。 迭代器概念可以为您提供最好的帮助。 迭代器非常容易优化,而直接访问则需要一些额外的知识。

索引用于直接访问。 括号和 at 方法可以执行此操作。 at[] 不同,会检查越界索引,因此速度会更慢。

信条是:不要索取你不需要的东西。 那么编译器就不会因为你不使用的东西向你收费。

If you don't need indexing, don't use it. The iterator concept is there for your best. Iterators are very easy to optimize, while direct access needs some extra knowledge.

Indexing is meant for direct access. The brackets and the at method do this. at will, unlike [], check for out of bounds indexing, so it will be slower.

The credo is: don't ask for what you don't need. Then the compiler won't charge you for what you don't use.

朮生 2024-07-24 10:34:10

正如这里其他人所说的那样,进行基准测试。

话虽如此,我认为迭代器更快,因为 at() 也进行范围检查,即如果索引越界,它会抛出 out_of_range 异常。 该检查本身可能会产生一些开销。

As everyone else here is saying, do benchmarks.

Having said that, I would argue that the iterator is faster since at() does range checking as well, i.e. it throws an out_of_range exception if the index is out of bounds. That check itself propbably incurrs some overhead.

坏尐絯℡ 2024-07-24 10:34:10

我猜第一个变体更快。

但这取决于实现。 为确保您应该分析您自己的代码。

为什么要分析您自己的代码?

因为这些因素都会改变结果:

  • 哪个操作系统
  • 哪个编译器
  • 正在使用哪种 STL 实现
  • 是否打开了优化?
  • ...(其他因素)

I would guess the first variant is faster.

But it's implementation dependent. To be sure you should profile your own code.

Why profile your own code?

Because these factors will all vary the results:

  • Which OS
  • Which compiler
  • Which implementation of STL was being used
  • Were optimizations turned on?
  • ... (other factors)
你爱我像她 2024-07-24 10:34:10

这实际上取决于您在做什么,但如果您必须不断重新声明迭代器,迭代器的速度会变得稍慢。 在我的测试中,最快的迭代可能是向向量数组声明一个简单的 * 并对其进行迭代。

例如:

向量迭代和每次传递两个函数。

vector<MyTpe> avector(128);
vector<MyTpe>::iterator B=avector.begin();
vector<MyTpe>::iterator E=avector.end()-1;
for(int i=0; i<1024; ++i){
 B=avector.begin();
   while(B!=E)
   {
       float t=B->GetVal(Val1,12,Val2); float h=B->GetVal(Val1,12,Val2);
    ++B;
  }}

Vector 花费了 90 次点击(0.090000 秒),

但是如果您使用指针进行操作...

for(int i=0; i<1024; ++i){
MyTpe *P=&(avector[0]);
   for(int i=0; i<avector.size(); ++i)
   {
   float t=P->GetVal(Val1,12,Val2); float h=P->GetVal(Val1,12,Val2);
   }}

Vector 花费了 18 次点击(0.018000 秒),

这大致相当于...

MyTpe Array[128];
for(int i=0; i<1024; ++i)
{
   for(int p=0; p<128; ++p){
    float t=Array[p].GetVal(Val1, 12, Val2); float h=Array[p].GetVal(Val2,12,Val2);
    }}

Array 花费了 15 次点击(0.015000 秒)。

如果消除对 avector.size() 的调用,时间就会变得相同。

最后,使用 [ ]

for(int i=0; i<1024; ++i){
   for(int i=0; i<avector.size(); ++i){
   float t=avector[i].GetVal(Val1,12,Val2); float h=avector[i].GetVal(Val1,12,Val2);
   }}

Vector 调用需要 33 次点击(0.033000 秒),

用 Clock() 计时

It really depends on what you are doing, but if you have to keep re-declaring the iterator, Iterators become MARGINALLY SLOWER. In my tests, the fastest possible iteration would be to declare a simple * to your vectors array and Iterate through that.

for example:

Vector Iteration and pulling two functions per pass.

vector<MyTpe> avector(128);
vector<MyTpe>::iterator B=avector.begin();
vector<MyTpe>::iterator E=avector.end()-1;
for(int i=0; i<1024; ++i){
 B=avector.begin();
   while(B!=E)
   {
       float t=B->GetVal(Val1,12,Val2); float h=B->GetVal(Val1,12,Val2);
    ++B;
  }}

Vector Took 90 clicks (0.090000 seconds)

But if you did it with pointers...

for(int i=0; i<1024; ++i){
MyTpe *P=&(avector[0]);
   for(int i=0; i<avector.size(); ++i)
   {
   float t=P->GetVal(Val1,12,Val2); float h=P->GetVal(Val1,12,Val2);
   }}

Vector Took 18 clicks (0.018000 Seconds)

Which is roughly equivalent to...

MyTpe Array[128];
for(int i=0; i<1024; ++i)
{
   for(int p=0; p<128; ++p){
    float t=Array[p].GetVal(Val1, 12, Val2); float h=Array[p].GetVal(Val2,12,Val2);
    }}

Array Took 15 clicks (0.015000 seconds).

If you eliminate the call to avector.size(), the time becomes the same.

Finally, calling with [ ]

for(int i=0; i<1024; ++i){
   for(int i=0; i<avector.size(); ++i){
   float t=avector[i].GetVal(Val1,12,Val2); float h=avector[i].GetVal(Val1,12,Val2);
   }}

Vector Took 33 clicks (0.033000 seconds)

Timed with clock()

分分钟 2024-07-24 10:34:10

这取决于。

答案比现有答案显示的要微妙得多。

at 总是比迭代器或 operator[] 慢。
但对于 operator[] 与迭代器,它取决于:

  1. 您使用 operator[] 的准确程度

  2. 您的特定 CPU 是否具有索引寄存器(x86 上的ESI/EDI)。

  3. 有多少其他代码也使用传递给operator[]的相同索引。
    (例如,您是否以锁步方式对多个数组进行索引?)

原因如下:

  1. 如果您执行类似的操作

    std::vector<无符号字符>   甲、乙; 
      for (size_t i = 0; i < n; ++i) 
      { 
          a[13 * i] = b[37 * i]; 
      } 
      

    那么这段代码可能会比迭代器版本慢得多,因为它在循环的每次迭代中执行乘法操作

    同样,如果您执行以下操作:

    struct T { unsigned char a[37];   }; 
      std::vector   A; 
      for (size_t i = 0; i < n; ++i) 
      { 
          a[i] = foo(i); 
      } 
      

    那么这可能比迭代器版本慢,因为sizeof(T)不是2的幂,因此每次循环时,您(再次)乘以 37

  2. 如果您的 CPU 有索引寄存器,那么您的代码使用索引的性能比使用迭代器的性能好,甚至更好,如果使用索引寄存器可以释放另一个寄存器以供循环使用。 这不是仅仅通过观察就能看出的; 你必须分析代码和/或反汇编它。

  3. 如果多个数组可以共享相同的索引,则代码只需递增一个索引,而不是递增多个迭代器,这会减少对内存的写入,从而通常会提高性能。 但是,如果您只迭代单个数组,那么迭代器可能会更快,因为它避免了向现有基指针添加偏移量的需要。

一般来说,您应该更喜欢迭代器而不是索引,更喜欢索引而不是指针,直到并且除非您遇到瓶颈,分析显示切换将是有益的,因为迭代器是通用的 并且可能已经是最快的方法; 它们不要求数据是随机寻址的,这允许您在必要时交换容器。 索引是下一个首选工具,因为它们仍然不需要直接访问数据——它们失效的频率较低,例如您可以用 deque 替换 vector 没有任何问题。 指针应该是最后的手段,只有当迭代器在发布模式下尚未退化为指针时,它们才会被证明是有益的。

It depends.

The answer is much more subtle than the existing answers show.

at is always slower than iterators or operator[].
But for operator[] vs. iterators, it depends on:

  1. How exactly you're using operator[].

  2. Whether your particular CPU has index registers (ESI/EDI on x86).

  3. How much other code also uses the same index passed to operator[].
    (e.g., are you indexing through multiple arrays in lockstep?)

Here's why:

  1. If you do something like

    std::vector<unsigned char> a, b;
    for (size_t i = 0; i < n; ++i)
    {
        a[13 * i] = b[37 * i];
    }
    

    Then this code will likely be much slower than the iterator version, since it performs a multiplication operation at each iteration of the loop!

    Similarly, if you do something like:

    struct T { unsigned char a[37]; };
    std::vector<T> a;
    for (size_t i = 0; i < n; ++i)
    {
        a[i] = foo(i);
    }
    

    Then this will probably also be slower than the iterator version, because sizeof(T) is not a power of 2, and therefore you are (again) multiplying by 37 each time you loop!

  2. If your CPU has index registers, then your code can perform as well or even better with indices rather than with iterators, if using the index register frees up another register for use in the loop. This is not something you can tell just by looking; you'd have to profile the code and/or disassemble it.

  3. If multiple arrays can share the same index, then the code only has to increment one index instead of incrementing multiple iterators, which reduces writes to memory and thus generally increases performance. However, if you're only iterating over a single array, then an iterator may very well be faster, since it avoids the need to add an offset to an existing base pointer.

In general, you should prefer iterators to indices, and indices to pointers, until and unless you face a bottleneck that profiling shows it will be beneficial to switch, because iterators are general-purpose and already likely to be the fastest approach; they don't require the data to be randomly-addressable, which allows you to swap containers if necessary. Indices are the next preferred tool, as they still don't require direct access to the data -- they are invalidated less frequently, and you can e.g. substitute a deque for a vector without any problems. Pointers should be a last resort, and they will only prove beneficial if iterators aren't already degenerating to potiners in release mode.

月朦胧 2024-07-24 10:34:10

第一个在调试模式下会更快,因为索引访问会在幕后创建迭代器,但在发布模式下,所有内容都应该内联,差异应该可以忽略不计或为空

The first one will be faster in debug mode because index access creates iterators behind the scene, but in release mode where everything should be inlined, the difference should be negligible or null

伏妖词 2024-07-24 10:34:10

您可以使用此测试代码并比较结果!
迪奥它!

#include <vector> 
#include <iostream> 
#include <ctime> 
using namespace std;; 


struct AAA{
    int n;
    string str;
};
int main() { 
    const int BIG = 5000000; 
    vector <AAA> v; 
    for ( int i = 0; i < BIG; i++ ) { 
        AAA a = {i, "aaa"};
        v.push_back( a ); 
    } 

    clock_t now;
    cout << "start" << endl; 
    int n = 0; 
    now = clock(); 
    for(vector<AAA>::iterator it = v.begin(); it != v.end(); ++it) { 
        n += it->n; 
    } 
   cout << clock() - now << endl; 

    n = 0;
    now = clock(); 
    for(size_t i = 0; i < v.size(); ++i) { 
        n += v[i].n; 
    } 
    cout << clock() - now << endl; 

    getchar();
    return n != 0; 
} 

You can use this test code and compare results!
Dio it!

#include <vector> 
#include <iostream> 
#include <ctime> 
using namespace std;; 


struct AAA{
    int n;
    string str;
};
int main() { 
    const int BIG = 5000000; 
    vector <AAA> v; 
    for ( int i = 0; i < BIG; i++ ) { 
        AAA a = {i, "aaa"};
        v.push_back( a ); 
    } 

    clock_t now;
    cout << "start" << endl; 
    int n = 0; 
    now = clock(); 
    for(vector<AAA>::iterator it = v.begin(); it != v.end(); ++it) { 
        n += it->n; 
    } 
   cout << clock() - now << endl; 

    n = 0;
    now = clock(); 
    for(size_t i = 0; i < v.size(); ++i) { 
        n += v[i].n; 
    } 
    cout << clock() - now << endl; 

    getchar();
    return n != 0; 
} 
尬尬 2024-07-24 10:34:10

如果您使用的是 VisualStudio 2005 或 2008,为了从向量中获得最佳性能,您需要定义
_SECURE_SCL=0

默​​认情况下,_SECURE_SCL 处于开启状态,这使得迭代包含的速度明显变慢。 也就是说,在调试版本中保留它,这将使跟踪任何错误变得更加容易。 需要注意的是,由于宏会更改迭代器和容器的大小,因此共享 stl 容器的所有编译单元必须保持一致。

If you are using VisualStudio 2005 or 2008, to get the best performance out of the vector you'll need to define
_SECURE_SCL=0

By default _SECURE_SCL is on which makes iterating over a contain significantly slower. That said leave it on in debug builds, it will make tracking down any errors much easier. One word of caution, since the macro changes the size of iterators and containers, you'll have to be consistent across all compilation units that share a stl container.

蘑菇王子 2024-07-24 10:34:10

我认为唯一的答案可能是在您的平台上进行测试。 一般来说,STL 中唯一标准化的是集合提供的迭代器类型和算法的复杂性。

我想说,这两个版本之间没有(没有太大区别)——我能想到的唯一区别是,当代码必须计算数组的长度时,它必须迭代整个集合(我不确定长度是否存储在向量内的变量中,那么开销并不重要)

使用“at”访问元素应该比使用 [] 直接访问元素花费更长的时间,因为它会检查您是否在向量的边界内,如果超出边界则抛出异常(似乎 [] 通常只是使用指针算术 - 所以它应该更快)

I think the only answer could be a test on your platform. Generally the only thing which is standardized in the STL is the type of iterators a collection offers and the complexity of algorithms.

I would say that there is no (not much of a difference) between those two versions- the only difference I could think of would be tjat the code has to iterate through the whole collection when it has to compute the length of an array (I'm not sure if the length is stored in a variable inside the vector, then the overhead wouldn't matter)

Accessing the elements with "at" should take a little bit longer than directly accessing it with [] because it checks if you are in the bounds of the vector and throws an exception if you are out of bounds (it seems [] is normally just using pointer arithmetic - so it should be faster)

征棹 2024-07-24 10:34:10

我现在在尝试优化我的 OpenGL 代码时发现了这个线程,并且想分享我的结果,即使该线程很旧。

背景:我有 4 个向量,大小从 6 到 12 不等。写入仅在代码开头发生一次,并且每 0.1 毫秒对向量中的每个元素进行读取

以下是剥离的内容首先使用的代码的向下版本:

for(vector<T>::iterator it = someVector.begin(); it < someVector.end(); it++)
{
    T a = *it;

    // Various other operations
}

使用此方法的帧速率约为每秒 7 帧 (fps)。

然而,当我将代码更改为以下内容时,帧速率几乎翻倍至 15fps。

for(size_t index = 0; index < someVector.size(); ++index)
{
    T a = someVector[index];

    // Various other operations
}

I found this thread now when trying to optimize my OpenGL code and wanted to share my results even though the thread is old.

Background: I have 4 vectors, sizes ranging from 6 to 12. Write happens only once at the beginning of the code and read occurs for each of the elements in the vectors every 0.1 milliseconds

The following is the stripped down version of the code used first:

for(vector<T>::iterator it = someVector.begin(); it < someVector.end(); it++)
{
    T a = *it;

    // Various other operations
}

The frame rate using this method was about 7 frames per second (fps).

However, when I changed the code to the following, the frame rate almost doubled to 15fps.

for(size_t index = 0; index < someVector.size(); ++index)
{
    T a = someVector[index];

    // Various other operations
}
找个人就嫁了吧 2024-07-24 10:34:10

差异应该可以忽略不计。 std::vector 保证其元素在内存中连续排列。 因此,大多数 stl 实现将迭代器实现为 std::vector 作为普通指针。 考虑到这一点,两个版本之间的唯一区别应该是第一个版本递增指针,第二个版本递增索引,然后将索引添加到指针。 所以我的猜测是第二个可能是一个极快(就周期而言)的机器指令。

尝试检查编译器生成的机器代码。

然而,一般来说,如果确实重要的话,建议是进行分析。 过早地思考这类问题通常不会给你带来太多回报。 通常,您的代码的热点将位于您乍一看不会怀疑的其他地方。

The difference should be negligible. std::vector guarantees that its elements are laid out consecutively in memory. Therefore, most stl implementations implement iterators into std::vector as a plain pointer. With this is mind, the only difference between the two versions should be that the first one increments a pointer, and in the second increments an index which is then added to a pointer. So my guess would be the second one is maybe one extremly fast (in terms of cycles) machine instruction more.

Try and check the machine code your compiler produces.

In general, however, the advice would be to profile if it really matters. Thinking about this kind of question prematurely usually does not give you too much back. Usually, your code's hotspots will be elsewhere where you might not suspect it at first sight.

千紇 2024-07-24 10:34:10

这是我编写的代码,使用默认的 mingw 编译器在 Code::Blocks v12.11 中编译。
这会创建一个巨大的向量,然后使用迭代器、at() 和索引来访问每个元素。
每个循环通过函数调用最后一个元素一次,以及通过将最后一个元素保存到临时内存中一次。

计时是使用 GetTickCount 完成的。

#include <iostream>
#include <windows.h>
#include <vector>
using namespace std;

int main()
{
    cout << "~~ Vector access speed test ~~" << endl << endl;
    cout << "~ Initialization ~" << endl;
    long long t;
    int a;
    vector <int> test (0);
    for (int i = 0; i < 100000000; i++)
    {
        test.push_back(i);
    }
    cout << "~ Initialization complete ~" << endl << endl;


    cout << "     iterator test: ";
    t = GetTickCount();
    for (vector<int>::iterator it = test.begin(); it < test.end(); it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "Optimised iterator: ";
    t=GetTickCount();
    vector<int>::iterator endofv = test.end();
    for (vector<int>::iterator it = test.begin(); it < endofv; it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "                At: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "      Optimised at: ";
    t = GetTickCount();
    int endof = test.size();
    for (int i = 0; i < endof; i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "             Index: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;



    cout << "   Optimised Index: ";
    t = GetTickCount();
    int endofvec = test.size();
    for (int i = 0; i < endofvec; i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;

    cin.ignore();
}

基于此,我个人认为“优化”版本比“非优化”版本更快,迭代器比 vector.at() 慢,而 vector.at() 比直接索引慢。

我建议您自己编译并运行代码。

编辑:这段代码是我在 C/C++ 经验较少时编写的。 进一步的测试用例应该是使用前缀增量运算符而不是后缀。 这应该会改善运行时间。

Here's a code I wrote, compiled in Code::Blocks v12.11, using the default mingw compiler.
This creates a huge vector, then accesses each element by using iterators, at(), and index.
Each is looped once by calling the last element by function, and once by saving the last element to temporary memory.

Timing is done using GetTickCount.

#include <iostream>
#include <windows.h>
#include <vector>
using namespace std;

int main()
{
    cout << "~~ Vector access speed test ~~" << endl << endl;
    cout << "~ Initialization ~" << endl;
    long long t;
    int a;
    vector <int> test (0);
    for (int i = 0; i < 100000000; i++)
    {
        test.push_back(i);
    }
    cout << "~ Initialization complete ~" << endl << endl;


    cout << "     iterator test: ";
    t = GetTickCount();
    for (vector<int>::iterator it = test.begin(); it < test.end(); it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "Optimised iterator: ";
    t=GetTickCount();
    vector<int>::iterator endofv = test.end();
    for (vector<int>::iterator it = test.begin(); it < endofv; it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "                At: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "      Optimised at: ";
    t = GetTickCount();
    int endof = test.size();
    for (int i = 0; i < endof; i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "             Index: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;



    cout << "   Optimised Index: ";
    t = GetTickCount();
    int endofvec = test.size();
    for (int i = 0; i < endofvec; i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;

    cin.ignore();
}

Based on this, I personally got that "optimised" versions are faster than "non-optimised" Iterators are slower than vector.at() which is slower than direct indices.

I suggest you compile and run the code for yourselves.

EDIT: This code was written back when I had less experience with C/C++. A further test case should be to use prefix increment operators instead of postfix. That should better the running time.

素年丶 2024-07-24 10:34:10

与原始问题略有不同,但最快的循环是

for( size_t i=size() ; i-- ; ) { ... }

这当然会倒计时。 如果循环中有大量迭代,但它只包含少量非常快的操作,这确实可以节省大量成本。

因此,通过 [] 运算符访问,这可能比已经发布的许多示例更快。

Only slightly tangential to the original question, but the fastest loop would be

for( size_t i=size() ; i-- ; ) { ... }

which would of course count down. This does give a substantial saving if you have a large number of iterations in your loop, but it contains only a small number of very fast operations.

So with the [] operator access, this might be faster than many of the examples already posted.

审判长 2024-07-24 10:34:10

我不想关注特定的生成代码,而是想概述索引方法之间更抽象的区别,以解释所涉及的实际复杂性:

at(): 9158ms
operator[]: 4269ms
iterator: 3914ms

数组本质上是内存中的起源。 对数组进行索引涉及将 n 个元素的大小添加到原点指针。 在这种情况下,我们使用逻辑索引,这意味着每个索引实际上是由编译器根据数据类型的大小进行缩放的。

因此,要获取数组的元素,机器的方法是origin + (index * sizeOfElement)

现在大多数数组都是以 3 个指针的形式实现的,即开始、结束和当前位置,因此一些操作(例如大小或容量)实际上涉及额外的操作。 其中一些操作甚至可能涉及除法,这在某些平台上可能相当慢。 如果元素的大小恰好是两个值的幂,这可能不会那么糟糕,因为在这种情况下,编译器将潜在昂贵的 mul 和 div 操作优化为高效的位移操作。

鉴于此,就可以理解为什么 at() 是最糟糕的选择,因为它确实涉及大量额外操作以确保每个索引的边界都是正确的。 在其他情况下,只要实际适用,您就会将边界保持委托给现有的、更粗糙的、因此更便宜的方法。

[] 明显更好,我们不再担心边界检查及其计算开销,我们只是盲目地使用 origin + (index * sizeOfElement),信任一些既定的约束使其保持在界限之内。

最后,迭代器是性能最佳的选项,原因很简单,我们现在可以避免索引缩放。 因此,我们将 origin + (index * sizeOfElement) 减少为 current + sizeOfElement,我们避免了索引缩放并将开销减少到单次添加,这在 99.9 中% 的情况是单个 cpu 时钟开销,并且循环可以限制为单个整数比较并忽略项目大小和索引。 1 个动态指针、1 个静态大小、1 个时钟周期增量和 1 个时钟周期循环约束,它不再需要运行循环、最小的指令和数据占用空间。

迭代器的一个缺点是您没有出于其他原因可能需要的辅助索引,并且从迭代器派生它会产生一定的成本,但将迭代器作为默认循环仍然是值得的,作为额外的索引可以轻松地根据上下文添加和更新,并且只需在实际需要时付费,并且它与 [] 索引的成本相同,只是可选的而不是默认的。

在实践中,编译器可能会在不同程度上理解您的意图并应用大量的优化。 例如,它可能能够检测到您的循环受到数组实际大小的约束,并且数组长度保持不变,因此它可能完全省略边界检查。 如果你的循环大小是一个常数,它可能会被推出,甚至根据数据类型和操作等自动矢量化......

Instead of focusing on particular generated code, I'd like to outline the more abstract difference between the indexing methods in order to explain the actual complexity involved:

at(): 9158ms
operator[]: 4269ms
iterator: 3914ms

An array is essentially an origin in memory. Indexing an array involves adding the size of n elements to the origin pointer. We work with logical indices in this context, meaning that each index is actually scaled by the size of the data type by the compiler for us.

So, to get the an element of an array, the machine's way to do that would be origin + (index * sizeOfElement).

Most arrays nowadays are implemented as 3 pointers, start, finish and current position, so some operations like size or capacity actually involve additional operations. Some of those operations may even involve division, which may be quite slow on some platforms. This may not be as bad if the size of the element happens to be a power of two value, because in this case the compiler optimizes potentially expensive mul and div operations to efficient bit shift operations.

In light of that, it becomes understandable why at() would be the worst possible option, as it does involve a significant amount of additional operations to ensure the bounds are correct on every indexing. In the other instances, you are delegating the bounds keeping, whenever practically applicable, to an existing and coarser and thus cheaper method.

[] is significantly better, we are no longer bothering with the bounds check and its computational overhead, we are only origin + (index * sizeOfElement) blindly, trusting some established constraint to keep it within bounds.

And finally, the iterator is the best performing option for the simple reason we can now avoid the index scaling. So we are reducing the origin + (index * sizeOfElement) to a current + sizeOfElement, we are avoiding the index scaling and reducing the overhead to a single addition, which in 99.9% of the cases is a single cpu clock overhead, and the loop can be constrained to a single integer comparison and disregard item sizes and indices. 1 dynamic pointer, 1 static size, 1 clock cycle increment and 1 clock cycle loop constraint, it doesn't get any more barebone to run a loop, minimal instruction and data footprint.

The one downside of iterators is you don't have the auxiliary index you may need for some other reason, and there's a certain cost of deriving it from the iterator, but it still pays to have the iterator be your default loop, as an additional index can easily be added and updated contextually and only pay for it when you actually need it, and it is the same cost as [] indexing, only optional rather than by default.

In practice, the compiler may understand your intent to a varying degree and apply a considerable amount of optimizations. For example, the it may be able to detect that your loop is constrained by the actual size of the array, and that the array length remains constant, so it may omit the bounds checking altogether. If your loop size is a constant, it may get rolled out or even auto vectorized depending on the data types and operations and so on...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文