Will gcc automatically "unroll" if-statements?
Say I have a loop that looks like this:
for(int i = 0; i < 10000; i++) {
    /* Do something computationally expensive */
    if (i < 200 && !(i%20)) {
        /* Do something else */
    }
}
wherein some trivial task sits behind an if-statement whose body runs only a handful of times (10 of the 10,000 iterations).
I've always heard that "if-statements in loops are slow!" So, in the hope of (marginally) better performance, I split the loop into two:
for(int i = 0; i < 200; i++) {
    /* Do something computationally expensive */
    if (!(i%20)) {
        /* Do something else */
    }
}
for(int i = 200; i < 10000; i++) {
    /* Do something computationally expensive */
}
Will gcc (with the appropriate flags, like -O3) automatically break the one loop into two, or does it only unroll to decrease the number of iterations?
Why not just disassemble the program and see for yourself? But here we go. This is the test program:
And this is the interesting part of the disassembled code, compiled with gcc 4.3.3 and -O3:
So, as we can see, for this particular example: no, it does not. There is only one loop, starting at main+32 and ending at main+85. If you have trouble reading the assembly: ecx = i and ebx = sum.
Still, your mileage may vary: it is hard to know which heuristics apply in a particular case, so you'll have to compile the code you actually have in mind and see how longer or more complicated computations influence the optimizer.
That said, on any modern CPU the branch predictor will do very well on such simple code, so you won't see much of a performance loss in either case. What does a handful of mispredictions cost if your computation-intensive code needs billions of cycles?