Branching in the ?: operator?
For a typical modern compiler on modern hardware, will the ?: operator result in a branch that affects the instruction pipeline?
In other words, which is faster? Calling both cases to avoid a possible branch:
bool testVar = someValue(); // Used later.
purge(white);
purge(black);
Or picking the one that actually needs to be purged and calling purge() only for it, using the ?: operator:
bool testVar = someValue();
purge(testVar ? white : black);
I realize you have no idea how long purge() will take, but I'm just asking a general question here about whether I would ever want to call purge() twice to avoid a possible branch in the code.
I realize this is a very tiny optimization and may make no real difference, but I would still like to know. I expect that ?: does not result in branching, but I want to make sure my understanding is correct.
Comments (5)
It depends on the platform. Specifically, it depends on the size of the CPU's jump prediction table and on whether the CPU allows conditional operations (as on ARM).
CPUs with conditional operations will strongly favor the second case. CPUs with bigger jump prediction tables will favor the first case.
The real answer (as with any other performance question): measure and compare. Sometimes the rest of the code throws a curve ball, and it is usually impossible to predict the effects of a given change.
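As a rough illustration of "measure and compare", here is a minimal timing sketch. Everything in it is an assumption made for illustration: the `Color` enum, the trivial body of `purge()`, the `sink` variable, and the helper names `time_ms` and `run_benchmark` are hypothetical and not from the question. The real `purge()` will almost certainly cost more than this stand-in, which would tilt the result further toward calling it once.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

enum Color { white, black };  // hypothetical stand-in for the question's types

// Hypothetical purge(): deliberately cheap. The volatile sink keeps the
// compiler from optimizing the calls away entirely.
static volatile std::uint64_t sink = 0;
void purge(Color c) { sink = sink + (c == white ? 1u : 2u); }

struct Timings {
    double both_ms;  // time for "call purge() on both"
    double one_ms;   // time for "pick one with ?: and call once"
};

// Runs body(flag) once per pre-generated flag and returns elapsed milliseconds.
template <typename F>
double time_ms(const std::vector<char>& flags, F body) {
    auto start = std::chrono::steady_clock::now();
    for (char f : flags) body(f != 0);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

Timings run_benchmark(std::size_t iterations) {
    // Random flags, so a branch on testVar (if one is emitted) is hard to predict.
    std::mt19937 rng(42);
    std::bernoulli_distribution coin(0.5);
    std::vector<char> flags(iterations);
    for (char& f : flags) f = coin(rng) ? 1 : 0;

    Timings t;
    t.both_ms = time_ms(flags, [](bool) { purge(white); purge(black); });
    t.one_ms  = time_ms(flags, [](bool testVar) { purge(testVar ? white : black); });
    return t;
}
```

Calling `run_benchmark` with a few million iterations, at the optimization level used for the real build, and printing both fields gives a first impression. The numbers only mean something for the compiler, flags, and CPU they were measured on.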
The CMOV (Conditional MOVe) instruction has been part of the x86 instruction set since the Pentium Pro. It is rarely generated automatically by GCC because of commonly used compiler options and restrictions imposed by the C language. A SETCC/CMOV sequence can be inserted into your C program with inline assembly. This should only be done in cases where the condition variable is a randomly oscillating value in the inner loop (millions of executions) of a program. In non-oscillating cases, and in cases with simple patterns of oscillation, modern processors can predict branches with a very high degree of accuracy. In 2007, Linus Torvalds suggested avoiding CMOV in most situations.
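For comparison, here is a small sketch of a value select written without inline assembly. This is a swapped-in technique, not the SETCC/CMOV sequence described above: the function names `select_ternary` and `select_mask` are hypothetical, and whether either one compiles to CMOV, to a conditional select, or to a branch is up to the compiler and target.

```cpp
#include <cstdint>

// Plain ternary on two already-computed values. With optimization enabled a
// compiler may lower this to CMOV on x86 or to a conditional select on ARM,
// but the language does not guarantee it.
std::int32_t select_ternary(bool cond, std::int32_t a, std::int32_t b) {
    return cond ? a : b;
}

// Arithmetic select: turn the condition into an all-ones or all-zeros mask
// and blend the two values, so the source logic contains no conditional jump.
std::int32_t select_mask(bool cond, std::int32_t a, std::int32_t b) {
    std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);  // true -> 0xFFFFFFFF, false -> 0
    std::uint32_t ua = static_cast<std::uint32_t>(a);
    std::uint32_t ub = static_cast<std::uint32_t>(b);
    // Converting back to a signed type is well defined from C++20 on and
    // implementation-defined (wrapping in practice) in earlier standards.
    return static_cast<std::int32_t>((ua & mask) | (ub & ~mask));
}
```

Compiling with `g++ -O2 -S` and reading the generated assembly shows what a particular compiler actually emits for each version. Note that this only applies to selecting between values: in the question, `purge()` still has to be called either way.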
Intel describes the conditional move in the Intel(R) Architecture Software Developer's Manual, Volume 2: Instruction Set Reference Manual:
I can't imagine the first method would ever be faster.
With the first method you may avoid a branch, but you replace it with a function call, which would usually involve a branch plus a lot more (unless it was inlined). Even if inlined, unless the functionality inside the purge() function was absolutely trivial it would almost certainly be slower.
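To make the trade-off concrete, here is a minimal sketch of the two approaches with a call counter. The `Color` enum, the counting body of `purge()`, and the wrapper names `purge_both` and `purge_selected` are assumptions for illustration only.

```cpp
enum Color { white, black };  // hypothetical stand-in for the question's types

// Hypothetical purge(): only counts calls, so the amount of work is visible.
static int purge_calls = 0;
void purge(Color) { ++purge_calls; }

// First approach: no decision to make, but always two calls.
void purge_both() {
    purge(white);
    purge(black);
}

// Second approach: one decision, one call.
void purge_selected(bool testVar) {
    purge(testVar ? white : black);
}
```

Whatever the compiler does with the `?:`, the first version always pays for a second call, including whatever branches and memory traffic a real `purge()` contains.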
Calling a function is at least as expensive as doing a logic test + jump (and yes, the ?: ternary operator would require a jump).
In the first case purge is called twice. In the second case purge is called once.
It's hard to answer the question about branching because it depends so much on the compiler and the instruction set. For example, on ARM (which has conditional instruction execution) it might not branch. On x86 it almost certainly will.