Performance difference between iterating once and iterating twice?

Posted on 2024-09-29 12:53:54

Consider something like...

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
        test[i].bar();
}

Now consider...

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
}
for (int i = 0; i < test.size(); ++i) {
        test[i].bar();
}

Is there a large difference in time spent between these two? I.e. what is the cost of the actual iteration? It seems like the only real operations you are repeating are an increment and a comparison (though I suppose this would become significant for a very large n). Am I missing something?

定格我的天空 2024-10-06 12:53:54

First, if your compiler can't optimize the size() call so that it happens only once, or reduce it to nothing more than a single read (no function call overhead), then evaluating it on every iteration of two loops instead of one will hurt.
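
If size() turns out not to be hoisted by the compiler, it can be read once by hand. A minimal sketch, assuming the container is a std::vector; the Item type and the bodies of foo() and bar() are hypothetical stand-ins, not code from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type; foo() and bar() stand in for the real work.
struct Item {
    int value = 0;
    void foo() { value += 1; }
    void bar() { value *= 2; }
};

// Reads size() once before the loop instead of on every iteration.
void process(std::vector<Item>& test) {
    const std::size_t n = test.size();
    for (std::size_t i = 0; i < n; ++i) {
        test[i].foo();
        test[i].bar();
    }
    // Hoisting the bound is only valid if the loop body never resizes the container.
    assert(test.size() == n);
}
```

For std::vector with optimizations enabled, size() is usually inlined to a couple of loads anyway, so this mostly matters for containers whose size() is genuinely expensive.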

There is a second effect you may want to be concerned with, though. If your container is large enough that it no longer fits in cache, the first case will run faster. This is because, when execution reaches test[i].bar(), test[i] is still cached from the foo() call. The second case, with split loops, will thrash the cache: by the time the second loop starts, the early elements have been evicted, so each test[i] has to be reloaded from main memory for each function.

Worse, if your container (std::vector, I'm guessing) has so many items that it won't all fit in memory, and some of it has to live in swap on your disk, then the difference will be huge as you have to load things in from disk twice.
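
The cache effect is easier to measure than to predict. A rough sketch of such a measurement, where the Item layout, the function names run_fused, run_split, time_ms and check_same, and the 64-byte padding are all assumptions made for illustration:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical element, padded to roughly one cache line on common
// 64-bit platforms so that a large vector does not fit in cache.
struct Item {
    long a = 0;
    long b = 0;
    char padding[48] = {};
    void foo() { a += 1; }
    void bar() { b += a; }  // depends only on this element's own foo()
};

// First case: one pass, both calls per element.
void run_fused(std::vector<Item>& test) {
    for (std::size_t i = 0; i < test.size(); ++i) {
        test[i].foo();
        test[i].bar();
    }
}

// Second case: two passes over the whole container.
void run_split(std::vector<Item>& test) {
    for (std::size_t i = 0; i < test.size(); ++i) test[i].foo();
    for (std::size_t i = 0; i < test.size(); ++i) test[i].bar();
}

// Returns the wall-clock milliseconds taken by one call of fn(test).
template <typename Fn>
double time_ms(Fn fn, std::vector<Item>& test) {
    const auto start = std::chrono::steady_clock::now();
    fn(test);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Sanity check that both variants computed the same values.
void check_same(const std::vector<Item>& x, const std::vector<Item>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i].a == y[i].a && x[i].b == y[i].b);
    }
}
```

Calling time_ms(run_fused, x) and time_ms(run_split, y) on two vectors of a few million elements, built with optimizations, gives a first impression. The numbers vary with the machine, the compiler and the element size, so several runs are needed before drawing conclusions.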

However, there is one final thing that you have to consider: all this only makes a difference if there is no order dependency between the function calls (really, between different objects in the container). Because, if you work it out, the first case does:

test[0].foo();
test[0].bar();
test[1].foo();
test[1].bar();
test[2].foo();
test[2].bar();
// ...
test[test.size()-1].foo();
test[test.size()-1].bar();

while the second does:

test[0].foo();
test[1].foo();
test[2].foo();
// ...
test[test.size()-1].foo();
test[0].bar();
test[1].bar();
test[2].bar();
// ...
test[test.size()-1].bar();

So if your bar() assumes that all foo() calls have run, you will break it if you change the second case to the first. Likewise, if bar() assumes that foo() has not yet been run on later objects, then moving from the first case to the second will break your code.

So be careful and document what you do.
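
A small sketch of such an order dependency. The Item type, its shared counter and the run() helper are hypothetical, invented only to make the two orderings observable:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type: foo() registers the element in a shared
// counter, bar() records how many elements were registered when it ran.
struct Item {
    int* registered = nullptr;  // shared counter, owned by the caller
    int seen = 0;
    void foo() { ++*registered; }
    void bar() { seen = *registered; }
};

// Runs either the single-loop (fused) or the two-loop (split) variant
// on n fresh elements and returns what each bar() observed.
std::vector<int> run(std::size_t n, bool fused) {
    int registered = 0;
    std::vector<Item> test(n);
    for (auto& item : test) item.registered = &registered;

    if (fused) {
        for (std::size_t i = 0; i < test.size(); ++i) {
            test[i].foo();
            test[i].bar();
        }
    } else {
        for (std::size_t i = 0; i < test.size(); ++i) test[i].foo();
        for (std::size_t i = 0; i < test.size(); ++i) test[i].bar();
    }

    std::vector<int> seen;
    for (const auto& item : test) seen.push_back(item.seen);
    return seen;
}

// Self-check documenting the difference for n = 3.
void check_order_dependency() {
    assert((run(3, true) == std::vector<int>{1, 2, 3}));   // fused
    assert((run(3, false) == std::vector<int>{3, 3, 3}));  // split
}
```

Here bar() sees a different counter value depending on the loop structure, so the two forms are not interchangeable for this type.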

甜是你 2024-10-06 12:53:54

There are many aspects to such a comparison.

First, the complexity of both options is O(n), so the difference isn't very big anyway. I mean, you need not care about it if you are writing a fairly big and complex program with a large n and "heavy" foo() and bar() operations. So you need to care about it only for very small, simple programs (programs for embedded devices, for example).

Second, it depends on the programming language and the compiler. I believe, for instance, that an optimizing C++ compiler may be able to turn your second option into the same code as the first, though only when it can prove that foo() and bar() have no ordering dependency, and this is not guaranteed.

Third, if the compiler hasn't optimized your code, the performance difference will depend heavily on the target processor. Consider the loop in terms of assembly instructions; it will look something like this (pseudo-assembly):

LABEL L1:
          do this    ;; some commands
          call that
          IF condition
          goto L1
          ;; some more instructions, ELSE part

That is, every loop iteration ends in an IF statement. But modern processors don't like IFs. This is because processors may reorder instructions to execute them ahead of time or simply to avoid idling. With IF instructions (in fact, conditional gotos or jumps), the processor does not know whether it may reorder operations or not.

There is also a mechanism called the branch predictor. From Wikipedia:

A branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure.

This softens the effect of the IFs, though if the predictor's guess is wrong, no optimization is gained.

So you can see that a great many conditions apply to both of your options: the target language and compiler, the target machine, its processor, and the branch predictor. All of this makes for a very complex system, and you cannot foresee exactly what result you will get. I believe that if you aren't dealing with embedded systems or something similar, the best solution is simply to use the form you are more comfortable with.
