迭代一次和迭代两次的性能差异?

发布于 2024-09-29 12:53:54 字数 399 浏览 4 评论 0原文

考虑诸如...

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
        test[i].bar();
}

现在考虑..

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
}
for (int i = 0; i < test.size(); ++i) {
        test[i].bar();
}

这两者之间花费的时间是否有很大差异?即实际迭代的成本是多少?似乎您重复的唯一实际操作是增量和比较(尽管我认为这对于非常大的 n 来说会变得很重要)。我错过了什么吗?

Consider something like...

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
        test[i].bar();
}

Now consider..

for (int i = 0; i < test.size(); ++i) {
        test[i].foo();
}
for (int i = 0; i < test.size(); ++i) {
        test[i].bar();
}

Is there a large difference in time spent between these two? I.e. what is the cost of the actual iteration? It seems like the only real operations you are repeating are an increment and a comparison (though I suppose this would become significant for a very large n). Am I missing something?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

定格我的天空 2024-10-06 12:53:54

首先,如上所述,如果您的编译器无法优化 size() 方法,因此它仅被调用一次,或者只不过是一次读取(没有函数调用开销),那么它将伤害。

不过,您可能还需要关心第二个影响。如果您的容器大小足够大,那么第一种情况执行速度会更快。这是因为,当它到达 test[i].bar() 时,test[i] 将被缓存。第二种情况,使用分割循环,将会破坏缓存,因为每个函数总是需要从主内存重新加载 test[i]

更糟糕的是,如果您的容器(我猜是 std::vector )有太多项目,以至于无法全部放入内存,并且其中一些必须位于磁盘上的交换中,那么差异将会很大,因为您必须从磁盘加载内容两次

但是,您必须考虑最后一件事:只有在函数调用之间(实际上,容器中的不同对象之间)不存在顺序依赖性时,所有这一切才会产生影响。因为,如果你计算出来,第一种情况会:

test[0].foo();
test[0].bar();
test[1].foo();
test[1].bar();
test[2].foo();
test[2].bar();
// ...
test[test.size()-1].foo();
test[test.size()-1].bar();

而第二种情况会:

test[0].foo();
test[1].foo();
test[2].foo();
// ...
test[test.size()-1].foo();
test[0].bar();
test[1].bar();
test[2].bar();
// ...
test[test.size()-1].bar();

因此,如果您的 bar() 假定所有 foo() 都已运行,那么您如果将第二种情况更改为第一种情况,则会破坏它。同样,如果 bar() 假设 foo() 尚未在后面的对象上运行,那么从第二种情况转移到第一种情况将会破坏您的代码。

所以要小心并记录你所做的事情。

First, as noted above, if your compiler can't optimize the size() method out so it's just called once, or is nothing more than a single read (no function call overhead), then it will hurt.

There is a second effect you may want to be concerned with, though. If your container size is large enough, then the first case will perform faster. This is because, when it gets to test[i].bar(), test[i] will be cached. The second case, with split loops, will thrash the cache, since test[i] will always need to be reloaded from main memory for each function.

Worse, if your container (std::vector, I'm guessing) has so many items that it won't all fit in memory, and some of it has to live in swap on your disk, then the difference will be huge as you have to load things in from disk twice.

However, there is one final thing that you have to consider: all this only makes a difference if there is no order dependency between the function calls (really, between different objects in the container). Because, if you work it out, the first case does:

test[0].foo();
test[0].bar();
test[1].foo();
test[1].bar();
test[2].foo();
test[2].bar();
// ...
test[test.size()-1].foo();
test[test.size()-1].bar();

while the second does:

test[0].foo();
test[1].foo();
test[2].foo();
// ...
test[test.size()-1].foo();
test[0].bar();
test[1].bar();
test[2].bar();
// ...
test[test.size()-1].bar();

So if your bar() assumes that all foo()'s have run, you will break it if you change the second case to the first. Likewise, if bar() assumes that foo() has not been run on later objects, then moving from the second case to the first will break your code.

So be careful and document what you do.

甜是你 2024-10-06 12:53:54

这样的比较有很多方面。

首先,两个选项的复杂度都是O(n),所以有区别反正也不是很大。我的意思是,如果您编写具有大量 n 和“繁重”操作 .foo()bar( 的相当大且复杂的程序,您一定不必关心它)。因此,只有在非常小的简单程序(例如,这是一种用于嵌入式设备的程序)的情况下,您才必须关心它。

其次,这取决于编程语言和编译器。例如,我确信大多数 C++ 编译器都会优化您的第二个选项,以生成与第一个选项相同的代码。

第三,如果编译器没有优化您的代码,性能差异将在很大程度上取决于目标处理器。考虑汇编命令术语中的循环 - 它看起来像这样(伪汇编语言):

LABEL L1:
          do this    ;; some commands
          call that
          IF condition
          goto L1
          ;; some more instructions, ELSE part

即每个循环段落只是 IF 语句。但现代处理器不喜欢IF。这是因为处理器可能会重新排列指令以预先执行它们或只是为了避免空闲。使用IF(事实上,条件跳转或跳转)指令,处理器不知道它们是否可以重新安排操作。
还有一种称为分支预测器的机制。 来自维基百科的材料

分支预测器是一个数字电路,试图猜测分支的方向分支(例如 if-then-else 结构)将在确定之前进行。

如果预测器的猜测是错误的,则 IF 的这种“软化”效果,不会将进行优化。

因此,您可以看到这两个选项都有大量条件:目标语言和编译器、目标机器、处理器和分支预测器。这一切使得系统变得非常复杂,你无法预见会得到什么确切的结果。我相信,如果您不处理嵌入式系统或类似的东西,最好的解决方案就是使用您更喜欢的形式。

There are many aspects in such comparison.

First, complexity for both options is O(n), so difference isn't very big anyway. I mean, you must not care about it if you write quite big and complex program with a large n and "heavy" operations .foo() and bar(). So, you must care about it only in case of very small simple programs (this is kind of programs for embedded devices, for example).

Second, it will depend on programming language and compiler. I'm assured that, for instance, most of C++ compilers will optimize your second option to produce same code as for the first one.

Third, if compiler haven't optimized your code, performance difference will heavily depend on the target processor. Consider loop in a term of assembly commands - it will look something like this (pseudo assembly language):

LABEL L1:
          do this    ;; some commands
          call that
          IF condition
          goto L1
          ;; some more instructions, ELSE part

I.e. every loop passage is just IF statement. But modern processors don't like IF. This is because processors may rearrange instructions to execute them beforehand or just to avoid idles. With the IF (in fact, conditional goto or jump) instructions, processors do not know if they may rearrange operation or not.
There's also a mechanism called branch predictor. From material of Wikipedia:

branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure.

This "soften" effect of IF's, through if the predictor's guess is wrong, no optimization will be performed.

So, you can see that there's a big amount of conditions for both your options: target language and compiler, target machine, it's processor and branch predictor. This all makes very complex system, and you cannot foresee what exact result you will get. I believe, that if you don't deal with embedded systems or something like that, the best solution is just to use the form which your are more comfortable with.

不打扰别人 2024-10-06 12:53:54

对于您的示例,您还需要额外关注 .size() 的成本,因为在大多数语言中,都会对每次 i 递增进行比较。

有多贵?好吧,这取决于情况,这当然都是相对的。如果 .foo().bar() 很昂贵,那么相比之下,实际迭代的成本可能微不足道。如果它们非常轻量级,那么它将占您执行时间的较大比例。如果您想了解特定案例测试它,这是确定您的具体场景的唯一方法。

就我个人而言,我肯定会选择单次迭代来降低成本(除非您需要在 .bar() 之前进行 .foo() 调用来电)。

For your examples you have the additional concern of how expensive .size() is, since it's compared for each time i increments in most languages.

How expensive is it? Well that depends, it's certainly all relative. If .foo() and .bar() are expensive, the cost of the actual iteration is probably minuscule in comparison. If they're pretty lightweight, then it'll be a larger percentage of your execution time. If you want to know about a particular case test it, this is the only way to be sure about your specific scenario.

Personally, I'd go with the single iteration to be on the cheap side for sure (unless you need the .foo() calls to happen before the .bar() calls).

属性 2024-10-06 12:53:54

我假设 .size() 将是不变的。否则,第一个代码示例可能与第二个代码示例不同。

大多数编译器可能会在循环开始之前将 .size() 存储在变量中,因此 .size() 时间将会减少。

因此,两个 for 循环内的内容的时间将是相同的,但其他部分将是两倍。

I assume .size() will be constant. Otherwise, the first code example might not give the same as the second one.

Most compilers would probably store .size() in a variable before the loop starts, so the .size() time will be cut down.

Therefore the time of the stuff inside the two for loops will be the same, but the other part will be twice as much.

将军与妓 2024-10-06 12:53:54

性能标签,对。

只要您专注于这个或那个次要代码段的“成本”,您就不会注意到更大的图景(隔离);你的目的是证明某些在更高层次上(在你孤立的背景之外)的做法是正确的,这只是不好的做法,并且违反了指导方针。这个问题的级别太低,因此太孤立。由一组集成组件组成的系统或程序比一组独立组件的性能要好得多。

当循环本身不必要地重复时,这个或那个隔离组件(在循环内工作)的速度或更快是无关紧要的,因此会花费两倍的时间。

假设你有一辆家用车(CPU),你到底为什么要:

  • 坐在家里让你的妻子出去买东西
  • 等她回来
  • 自己开车,出去买东西
  • 让她等你回来
    如果需要说明的话,您将花费 (a) 几乎一半来之不易的资源来同时执行一次旅行和购物,以及 (b) 回家后可以利用这些资源一起享受乐趣。

它与周六 9:00 的汽油价格、咖啡馆磨咖啡的时间或每次迭代的成本无关。

是的,时间和使用的资源存在很大差异。但成本不仅仅在于每次迭代的开销;还在于每次迭代的开销。这是一次有组织的旅行与两次连续旅行的总成本。

性能与架构有关;永远不要做任何事情两次(你可以做一次),这是更高层次的组织;将组成整体的部分整合起来。这并不是要在 Bowser 上数几便士或每次迭代的周期;而是要计算每个迭代的周期数。这些是较低级别的组织;它调整了零散部分的集合(而不是系统性的整体)。

Masseratis 无法比旅行车更快地通过交通拥堵。

Performance tag, right.

As long as you are concentrating on the "cost" of this or that minor code segment, you are oblivious to the bigger picture (isolation); and your intention is to justify something that, at a higher level (outside your isolated context), is simply bad practice, and breaks guidelines. The question is too low level and therefore too isolated. A system or program which is set of integrated components will perform much better that a collection of isolated components.

The fact that this or that isolated component (work inside the loop) is fast or faster is irrelevant when the loop itself is repeated unnecessarily, and which would therefore take twice the time.

Given that you have one family car (CPU), why on Earth would you:

  • sit at home and send your wife out to do her shopping
  • wait until she returns
  • take the car, go out and do your shopping
  • leaving her to wait until you return
    If it needs to be stated, you would spend (a) almost half of your hard-earned resources executing one trip and shopping at the same time and (b) have those resources available to have fun together when you get home.

It has nothing to do with the price of petrol at 9:00 on a Saturday, or the time it takes to grind coffee at the café, or cost of each iteration.

Yes, there is a large diff in the time and the resources used. But the cost is not merely in the overhead per iteration; it is in the overall cost of the one organised trip vs the two serial trips.

Performance is about architecture; never doing anything twice (that you can do once), which are the higher levels of organisation; integrated of the parts that make up the whole. It is not about counting pennies at the bowser or cycles per iteration; those are lower orders of organisation; which ajust a collection of fragmented parts (not a systemic whole).

Masseratis cannot get through traffic jams any faster than station wagons.

白衬杉格子梦 2024-10-06 12:53:54

它确实加起来。这是 Javascript 运行时两种情况之间的区别。花费了几乎两倍的时间。

然而,有时我仍然会循环两次,以提高可读性和可维护性,因为必须循环遍历少量元素。

输入图片此处描述

It does add up. Here is the difference between two cases for the Javascript runtime. It took nearly double amount of time.

However, sometimes I would still loop twice in an interest of better readability and maintainability given that have to loop through small number of elements.

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文