在“i <”中矢量.size()”循环条件,每次迭代都会调用 size() 吗?

发布于 2024-09-27 00:48:31 字数 200 浏览 12 评论 0 原文

在以下代码中:

std::vector<int> var;
for (int i = 0; i < var.size(); i++);

size() 成员函数是为每次循环迭代调用,还是只调用一次?

In the following code:

std::vector<int> var;
for (int i = 0; i < var.size(); i++);

Is the size() member function called for each loop iteration, or only once?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

暮倦 2024-10-04 00:48:31

理论上,它每次都会被调用,因为 for 循环:

for(initialization; condition; increment)
    body;

被扩展为类似的东西

{
    initialization;
    while(condition)
    {
        body;
        increment;
    }
}

(注意大括号,因为初始化已经在内部作用域中)

实际上 ,如果编译器知道您的条件的一部分在整个循环期间保持不变并且它没有副作用,它可以足够聪明地将其移出。这通常是在未写入其参数的循环中使用 strlen 和类似的东西(编译器熟知的)来完成的。

然而必须注意的是,最后一个条件的证明并不总是微不足道的。一般来说,如果容器是函数的本地容器并且从不传递给外部函数,那就很容易;如果容器不是本地的(例如,它是通过引用传递的 - 即使它是 const)并且循环体包含对其他函数的调用,则编译器通常必须假设此类函数可能会更改它,从而阻塞吊装长度的计算。

如果您知道条件的一部分评估起来“昂贵”,那么手动进行优化是值得的(而这样的条件通常不是,因为它通常归结为指针减法,这几乎肯定是内联的)。


编辑:正如其他人所说,一般来说,对于容器来说,最好使用迭代器,但对于 vector 来说,它并不那么重要,因为通过 operator[ 随机访问元素] 保证为 O(1);实际上,对于向量,它通常是指针总和(向量基址+索引)和解引用与指针增量(前一个元素+1)和迭代器的解引用。由于目标地址仍然相同,我认为您无法从缓存局部性方面从迭代器中获得一些东西(即使是这样,如果您没有在紧密循环中遍历大数组,您甚至不应该注意到这样的情况)种改进)。

相反,对于列表和其他容器,使用迭代器而不是随机访问可能非常重要,因为使用随机访问可能意味着每次访问列表时都会遍历,而递增迭代器只是指针取消引用。

In theory, it is called each time, since a for loop:

for(initialization; condition; increment)
    body;

is expanded to something like

{
    initialization;
    while(condition)
    {
        body;
        increment;
    }
}

(notice the curly braces, because initialization is already in an inner scope)

In practice, if the compiler understands that a piece of your condition is invariant through all the duration of the loop and it does not have side-effects, it can be smart enough to move it out. This is routinely done with strlen and things like that (that the compiler knows well) in loops where its argument isn't written.

However it must be noted that this last condition isn't always trivial to prove; in general, it's easy if the container is local to the function and is never passed to external functions; if the container is not local (e.g. it's passed by reference - even if it's const) and the loop body contains calls to other functions, the compiler often has to assume that such functions may alter it, thus blocking the hoisting of the length calculation.

Doing that optimization by hand is worthy if you know that a part of your condition is "expensive" to evaluate (and such condition usually isn't, since it usually boils down to a pointer subtraction, which is almost surely inlined).


Edit: as others said, in general with containers it's better to use iterators, but for vectors it's not so important, because random access to elements via operator[] is guaranteed to be O(1); actually with vectors it usually is a pointer sum (vector base+index) and dereference vs the pointer increment (preceding element+1) and dereference of iterators. Since the target address is still the same, I don't think that you can gain something from iterators in terms of cache locality (and even if so, if you're not walking big arrays in tight loops you shouldn't even notice such kind of improvements).

For lists and other containers, instead, using iterators instead of random access can be really important, since using random access may mean walk every time the list, while incrementing an iterator is just a pointer dereference.

翻身的咸鱼 2024-10-04 00:48:31

每次都会调用 size() 成员函数,但这将是一个非常糟糕的实现,不会内联它,并且一个奇怪的实现,它不是对固定数据的简单访问或两个指针相减。
不管怎样,在你分析你的应用程序并发现这是一个瓶颈之前,你不应该担心这些琐事。

但是,您应该注意的是:

  1. 向量索引的正确类型是std::vector::size_type
  2. 在某些类型(例如某些迭代器)中,i++可能++i慢。

因此,循环应该是:

for(vector<int>::size_type i=0; i<var.size(); ++i)
  ...

The size() member function is called each time, but it would be a really bad implementation that wouldn't inline it, and a strange one where it wouldn't be a simple access of a fixed datum or a subtraction of two pointers.
Anyway, you shouldn't worry yourself with such trivialities until you have profiled your application and found out that this is a bottleneck.

However, what you should pay attention to is:

  1. The correct type for a vector's index is std::vector<T>::size_type.
  2. There are types (some iterators, for example) where i++ might be slower than ++i.

Therefore, the loop should be:

for(vector<int>::size_type i=0; i<var.size(); ++i)
  ...
倾其所爱 2024-10-04 00:48:31

它每次都会被“调用”,但我将调用放入引号中,因为它实际上可能只是一个内联方法调用,因此您不必担心它的性能。

为什么不使用 vector::iterator 来代替呢?

It's 'called' each time, but I put called into quotes because it really probably is just an inline method call, so you don't have to worry about its performance.

Why not use vector<int>::iterator instead?

活雷疯 2024-10-04 00:48:31

你的问题的问题在于它没有任何意义。 C++ 编译器将一些源代码翻译成二进制程序。要求是生成的程序必须根据 C++ 标准的规则保留代码的可观察效果。这段代码:

for (int i = 0; i < var.size(); i++); 

根本没有任何可观察到的效果。此外,它不会以任何方式与周围的代码交互,并且编译器可能会完全优化它;即没有生成对应的程序集。

为了使您的问题有意义,您需要指定循环内发生的情况。问题在于

for (int i = 0; i < var.size(); i++) { ... }

答案很大程度上取决于 ... 实际是什么。我相信@MatteoItalia 提供了一个非常好的答案,只需添加我所做的一些实验的描述。考虑以下代码:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

首先,即使调用 var.size() 几乎 100% 肯定会内联启用的优化,并且此内联通常会转换为两个指针的减法,但这仍然会带来循环一些开销。如果编译器无法证明向量大小被保留(这通常是非常困难的,甚至是不可行的,例如在我们的例子中),那么您最终将得到不必要的加载和< em>sub(可能还有shift)指令。使用 GCC 9.2、-O3 和 x64 生成的循环汇编为:

.L3:
    mov     rsi, rbx
    mov     rdi, rbp
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    mov     rax, QWORD PTR [rbp+8] // loads a pointer
    sub     rax, QWORD PTR [rbp+0] // subtracts another poniter
    sar     rax, 2                 // result * sizeof(int) => size()
    cmp     rbx, rax
    jb      .L3

如果我们按如下方式重写代码:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0, e = v.size(); i < e; i++)
      res += g(v, i);
   return res;
}

那么,生成的汇编更简单(因此更快):

.L3:
    mov     rsi, rbx
    mov     rdi, r13
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    cmp     rbx, rbp
    jne     .L3

的值向量的大小简单地保存在寄存器(rbp)中。

我什至尝试了一个不同的版本,其中向量被标记为 const:

int g(const std::vector<int>&, size_t);

int f(const std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

令人惊讶的是,即使当 v.size() 无法在此处更改时,生成的程序集与第一种情况(带有附加的 movsubsar 指令)。

现场演示位于此处

此外,当我将循环更改为:

for (size_t i = 0; i < v.size(); i++)
   res += v[i];

then 时,在汇编级别的循环内没有对 v.size() 进行评估(指针相减)。 GCC 能够在这里“看到”循环体不会以任何方式改变大小。

The problem with your question is that it does not make any sense. A C++ compiler translates some source code into a binary program. The requirement is that the resulting program must preserve observable effects of the code according to the rules of the C++ Standard. This code:

for (int i = 0; i < var.size(); i++); 

simply does not have any observable effect. Moreover, it does not interact with the surrounding code any way, and the compiler may optimize it completely away; that is to generate no corresponding assembly.

To make your question meaningful, you need to specify what happens inside the loop. The problem with

for (int i = 0; i < var.size(); i++) { ... }

is that the answer very much depends on what ... actually is. I believe @MatteoItalia provided a very nice answer, just would add a description of some experiments I made. Consider the following code:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

First, even if calling var.size() will almost 100% sure be inlined with enabled optimizations, and this inlining typically translates into a subtraction of two pointers, this still brings into the loop some overhead. If a compiler is not able to prove that the vector size is preserved (which, generally, is very difficult or even infeasible, such as in our case), then you will end up with unnecessary load and sub (and, possibly, shift) instructions. The generated assembly of the loop with GCC 9.2, -O3, and x64 was:

.L3:
    mov     rsi, rbx
    mov     rdi, rbp
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    mov     rax, QWORD PTR [rbp+8] // loads a pointer
    sub     rax, QWORD PTR [rbp+0] // subtracts another poniter
    sar     rax, 2                 // result * sizeof(int) => size()
    cmp     rbx, rax
    jb      .L3

If we rewrite the code as follows:

int g(std::vector<int>&, size_t);

int f(std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0, e = v.size(); i < e; i++)
      res += g(v, i);
   return res;
}

then, the generated assembly is simpler (and, therefore, faster):

.L3:
    mov     rsi, rbx
    mov     rdi, r13
    add     rbx, 1
    call    g(std::vector<int, std::allocator<int> >&, unsigned long)
    add     r12d, eax
    cmp     rbx, rbp
    jne     .L3

The value of the vector's size is simply kept in a register (rbp).

I even tried a different version where the vector is marked as being const:

int g(const std::vector<int>&, size_t);

int f(const std::vector<int>& v) {
   int res = 0;
   for (size_t i = 0; i < v.size(); i++)
      res += g(v, i);
   return res;
}

Surprisingly, even when v.size() cannot change here, the generated assembly was the same as in the first case (with additional mov, sub, and sar instructions).

Live demo is here.

Additionally, when I changed the loop into:

for (size_t i = 0; i < v.size(); i++)
   res += v[i];

then, there was no evaluation of v.size() (subtraction of pointers) within the loop on an assembly level. GCC was able to "see" here, that the body of the loop does not alter the size any way.

酷到爆炸 2024-10-04 00:48:31

每次都必须调用它,因为 size() 可能每次都会返回不同的值。

因此,没有什么大的选择,它只是必须的。

It must be called everytime because size() might return a different value everytime.

Therefore there's no big choice it simply must be.

羞稚 2024-10-04 00:48:31

正如其他人所说,

  • 语义必须就像每次
  • 可能内联时都调用它一样,并且可能是一个简单的函数,

在其之上,

  • 足够聪明的优化器可能能够推断出它是一个没有副作用的循环不变式,并且完全删除它(如果代码是内联的,这会更容易,但即使不是内联,如果编译器进行全局优化,这也是可能的)

As other have said

  • the semantics must be as if it were called each time
  • it is probably inlined, and is probably a simple function

on top of which

  • a smart enough optimizer may be able to deduce that it is a loop invariant with no side effects and elide it entirely (this is easier if the code is inlined, but may be possible even if it is not if the compiler does global optimization)
心碎的声音 2024-10-04 00:48:31

但它可以通过这种方式完成(前提是该循环仅打算读/写而不实际更改向量的大小):

for(vector<int>::size_type i=0, size = var.size(); i < size; ++i) 
{
//do something
}

在上面的循环中,您只需一次对 size 的调用,与内联或不内联的大小无关。

But it could be done in this way (providing that this loop intends to only read/write without actually changing the size of a vector):

for(vector<int>::size_type i=0, size = var.size(); i < size; ++i) 
{
//do something
}

In the loop above you have just one call to size independently from size being inlined or not.

谎言月老 2024-10-04 00:48:31

正如其他人所说,编译器应决定如何处理编写的实际代码。关键是每次都会被调用。但如果您想提高性能,最好在编写代码时考虑一些因素。你的情况就是其中之一,还有其他的情况,就像这两段代码之间的区别:

for (int i = 0 ; i < n ; ++i)
{
   for ( int j = 0 ; j < n ; ++j)
       printf("%d ", arr[i][j]);
   printf("\n");
}
for (int j = 0 ; j < n ; ++j)
{
   for ( int i = 0 ; i < n ; ++i)
       printf("%d ", arr[i][j]);
   printf("\n");
}

区别在于,第一个不会对每个引用改变太多的ram页面,但另一个会耗尽你的缓存和TLB和其他东西。

而且内联也没有多大帮助!因为调用函数的顺序将保持为 n(向量的大小) 次。虽然它在某些地方有帮助,但最好的办法是重写你的代码。

但!如果你想让编译器对你的代码进行优化,则永远不要放置 volatile,如下所示:

for(volatile int i = 0 ; i < 100; ++i)

它会阻止编译器优化。
如果您需要另一个性能提示,请使用寄存器而不是易失性。

for(register int i = 0 ; i < 100; ++i)

编译器将尝试不将 i 从 CPU 寄存器移动到 RAM。不能保证它能做到,但它会做到最好;)

as others said, The compiler shall decide what to do with the actual code written. The key figure is that it is called each time. But if you want to get a performance boost, it is best to write your code with some considerations. Your case is one of them, there are others as well, like the difference between these two pieces of code:

for (int i = 0 ; i < n ; ++i)
{
   for ( int j = 0 ; j < n ; ++j)
       printf("%d ", arr[i][j]);
   printf("\n");
}
for (int j = 0 ; j < n ; ++j)
{
   for ( int i = 0 ; i < n ; ++i)
       printf("%d ", arr[i][j]);
   printf("\n");
}

The difference is that the first one will not change the ram page too much per references, but the other will exhaust your cache and TLB and other stuff.

Also inline won't help that much! because the order of the calling function will remain as n(size of the vector) times. It helps in some places though, but the best thing is to rewrite your code.

But! if you want to let a compiler do it's optimizations over your code NEVER put volatile, like so:

for(volatile int i = 0 ; i < 100; ++i)

It prevents the compiler from optimizing.
If you need another hint for performance use register instead of volatile.

for(register int i = 0 ; i < 100; ++i)

The compiler will try not to move i from the CPU-registers to RAM. It is not ensured that it can do it, but it will do it's best ;)

凡尘雨 2024-10-04 00:48:31

认为如果编译器可以最终推断出变量var在“循环体”内没有被修改

for(int i=0; i< var.size();i++) { 
    // loop body
}

,那么上面的内容可能会被转置为等价的

const size_t var_size = var.size();
for( int i = 0; i < var_size; i++ ) { 
    // loop body
}

但是,我是不太确定,所以欢迎评论:)

另外,

  • 在大多数情况下,size() 成员函数是内联的,因此这个问题不必担心

  • 这个问题可能同样适用于始终用于迭代器的 end()基于循环,即 it != container.end()

  • 请考虑使用 size_tvector::size_type 作为<的类型code>i [请参阅下面 Steve Jessop 的评论。]

I think that if the compiler can conclusively deduce that the variable var is not modified inside the "loop body"

for(int i=0; i< var.size();i++) { 
    // loop body
}

then the above may be transposed to something equivalent of

const size_t var_size = var.size();
for( int i = 0; i < var_size; i++ ) { 
    // loop body
}

However, I am not absolutely sure, so comments are welcome :)

Also,

  • In most situations, the size() member function is inlined, so the issue does not warrant worrying

  • The concern is perhaps equally applicable to the end() which is always used for iterator based looping, i.e. it != container.end()

  • Please consider using size_t or vector<int>::size_type for the type of i [See Steve Jessop's comment below.]

倾听心声的旋律 2024-10-04 00:48:31

测试了 900k 次迭代
预先计算大小需要 43 秒,使用 size() 调用需要 42 秒。

如果保证向量大小在循环中不会改变,最好使用预先计算的大小,否则别无选择,必须使用 size()。

#include <iostream>
#include <vector>

using namespace std;

int main() {
vector<int> v;

for (int i = 0; i < 30000; i++)
        v.push_back(i);

const size_t v_size = v.size();
for(int i = 0; i < v_size; i++)
        for(int j = 0; j < v_size; j++)
                cout << "";

//for(int i = 0; i < v.size(); i++)
//      for(int j = 0; j < v.size(); j++)
//              cout << "";
}

Tested it for 900k iterations
Its taking time 43 seconds for pre-calculated size and 42 seconds for using the size() call.

If you guaranteed vector size doesn't change in the loop, better to use pre-calculated size otherwise there is no choice and must use size().

#include <iostream>
#include <vector>

using namespace std;

int main() {
vector<int> v;

for (int i = 0; i < 30000; i++)
        v.push_back(i);

const size_t v_size = v.size();
for(int i = 0; i < v_size; i++)
        for(int j = 0; j < v_size; j++)
                cout << "";

//for(int i = 0; i < v.size(); i++)
//      for(int j = 0; j < v.size(); j++)
//              cout << "";
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文