Branching in the ?: operator?

Posted on 2024-11-30 21:31:55

For a typical modern compiler on modern hardware, will the ?: operator result in a branch that affects the instruction pipeline?

In other words, which is faster: calling purge() for both cases to avoid a possible branch:

bool testVar = someValue(); // Used later.
purge(white);
purge(black);

or picking the one that actually needs to be purged and calling purge() only once, using the ?: operator:

bool testVar = someValue();
purge(testVar ? white : black);

I realize you have no idea how long purge() will take, but I'm just asking a general question here about whether I would ever want to call purge() twice to avoid a possible branch in the code.

I realize this is a very tiny optimization and may make no real difference, but would still like to know. I expect the ?: does not result in branching, but want to make sure my understanding is correct.

Comments (5)

·深蓝 2024-12-07 21:31:55

It depends on the platform. Specifically, it depends on the size of the CPU's branch prediction table and on whether the CPU supports conditional operations (as ARM does).

CPUs with conditional operations will strongly favor the second case. CPUs with bigger branch prediction tables will favor the first case.

The real answer (as with any performance question): measure and compare. Sometimes the rest of the code throws a curveball, and it is usually impossible to predict the effect of a given change.
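To make "measure and compare" a little more concrete, here is a minimal timing sketch. The names time_calls, sample_work and sink are made up for illustration and are not part of the question; sample_work only stands in for whichever variant you want to time.

```c
#include <time.h>

/* Hypothetical workload: a volatile counter so the call is not optimized away. */
volatile unsigned long sink = 0;

void sample_work(void)
{
    sink++;
}

/* Calls fn() `iterations` times and returns the elapsed CPU time in seconds,
   measured with the standard clock() function. */
double time_calls(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

You would wrap each variant (two purge() calls versus one call with ?:) in its own function, time both with the same iteration count and the same compiler flags, and repeat the runs a few times, since a single measurement can be noisy.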

比忠 2024-12-07 21:31:55

The CMOV (Conditional MOVe) instruction has been part of the x86 instruction set since the Pentium Pro. It is rarely generated automatically by GCC because of commonly used compiler options and restrictions imposed by the C language. A SETCC/CMOV sequence can be inserted into your C program with inline assembly. This should only be done in cases where the condition variable is a randomly oscillating value in the inner loop (millions of executions) of a program. In non-oscillating cases, and in cases with simple patterns of oscillation, modern processors can predict branches with a very high degree of accuracy. In 2007, Linus Torvalds suggested avoiding CMOV in most situations.

Intel describes the conditional move in the Intel(R) Architecture Software Developer's Manual, Volume 2: Instruction Set Reference Manual:

The CMOVcc instructions check the state of one or more of the status
flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a
move operation if the flags are in a specified state (or condition). A
condition code (cc) is associated with each instruction to indicate
the condition being tested for. If the condition is not satisfied, a
move is not performed and execution continues with the instruction
following the CMOVcc instruction.

These instructions can move a 16- or 32-bit value from memory to a
general-purpose register or from one general-purpose register to
another. Conditional moves of 8-bit register operands are not
supported.

The conditions for each CMOVcc mnemonic is given in the description
column of the above table. The terms “less” and “greater” are used for
comparisons of signed integers and the terms “above” and “below” are
used for unsigned integers.

Because a particular state of the status flags can sometimes be
interpreted in two ways, two mnemonics are defined for some opcodes.
For example, the CMOVA (conditional move if above) instruction and the
CMOVNBE (conditional move if not below or equal) instruction are
alternate mnemonics for the opcode 0F 47H.
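As a rough illustration of a value select in C, here are two ways to write it. The function names are hypothetical. For the ternary form the compiler is free to emit either a conditional branch or a conditional move; which one you get depends on the compiler, the optimization flags and the target, so inspect the generated assembly (for example with gcc -O2 -S) rather than assuming. The mask form has no conditional in the source at all.

```c
#include <stdint.h>

/* Ternary form: may compile to a branch or to a conditional move. */
uint32_t select_ternary(int cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}

/* Mask form: picks a or b using only arithmetic and bitwise operations. */
uint32_t select_mask(int cond, uint32_t a, uint32_t b)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for the same inputs. Note that this only selects a value; it says nothing about the cost of whatever function is later called with that value.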

场罚期间 2024-12-07 21:31:55

I can't imagine the first method would ever be faster.

With the first method you may avoid a branch, but you replace it with a function call, which would usually involve a branch plus a lot more (unless it was inlined). Even if inlined, unless the functionality inside the purge() function was absolutely trivial it would almost certainly be slower.
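A small sketch of the two variants from the question, with a stand-in purge() that only counts calls. The color_t type, the counter array and the wrapper names are assumptions made for illustration; the real purge() is unknown.

```c
#include <stdbool.h>

/* Hypothetical type for the question's white and black values. */
typedef enum { white, black } color_t;

/* Number of times purge() has been called for each color. */
int purge_calls[2] = { 0, 0 };

/* Stand-in for the real purge(): it only records the call. */
void purge(color_t c)
{
    purge_calls[c]++;
}

/* First variant: no ?:, but two function calls. */
void purge_both(void)
{
    purge(white);
    purge(black);
}

/* Second variant: one ?: selection and one function call. */
void purge_selected(bool testVar)
{
    purge(testVar ? white : black);
}
```

The first variant always pays for two calls and also purges a color that may not have needed it, so the two are not equivalent in behavior, only in whether the source contains a conditional.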

夏花。依旧 2024-12-07 21:31:55

Calling a function is at least as expensive as doing a logic test + jump (and yes, the ? : ternary operator would require a jump).

嘿看小鸭子会跑 2024-12-07 21:31:55

In the first case purge() is called twice. In the second case purge() is called once.

It's hard to answer the question about branching because it depends so heavily on the compiler and the instruction set. For example, on ARM (which has conditional instruction execution) it might not branch. On x86 it almost certainly will.
