Difference between n = 0 and n = n - n
When I read this question, I remembered someone once telling me (many years ago) that from an assembler point of view, these two operations are very different:
n = 0;
n = n - n;
Is this true, and if it is, why is it so?
EDIT: As pointed out by some replies, I guess this would be fairly easy for a compiler to optimize into the same thing. But what I find interesting is why they would differ if the compiler had a completely general approach.
When writing assembler code, you often used:
instead of
That is because with the first statement you have only the opcode and no operand. Your CPU will do that in 1 cycle (instead of 2). I think your case is something similar (although using sub).
Compiler VC++ 6.0, without optimisations:
In the early days, memory and CPU cycles were scarce. That led to a lot of so-called "peephole optimizations". Let's look at the code:
The first instruction would need two bytes for the op-code and then four bytes for the value (0). That meant four bytes wasted plus you'd need to access the memory twice (once for the opcode and once for the data). Sloooow.
moveq.l was better since it would merge the data into the opcode, but it only allowed writing small values into a register (an 8-bit signed immediate, -128 to 127, sign-extended to 32 bits). And you were limited to data registers only; there was no quick way to clear an address register. You'd have to clear a data register and then load the data register into an address register (two opcodes. Bad.).

That led to the last operation, which works on any register, needs only two bytes and a single memory read. Translated into C, you'd get
which would work for the most commonly used types of n (integer or pointer).
An optimizing compiler will produce the same assembly code for the two.
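A minimal sketch for checking this yourself, assuming a mainstream compiler such as GCC or Clang: the two functions below (zero_assign and zero_subtract are hypothetical names used only for this illustration) perform the two assignments on a plain int. Compiling the file with optimization enabled (for example with -O2 -S) and comparing the assembly of the two functions is a quick way to see whether your compiler emits the same code for both; the exact instructions will vary by compiler and target.

```c
/* Hypothetical helper: plain assignment of the constant 0. */
int zero_assign(int n)
{
    n = 0;
    return n;
}

/* Hypothetical helper: n - n is exactly 0 for every int value,
   so the subtraction can never overflow. An optimizing compiler
   is free to fold it to the constant 0. */
int zero_subtract(int n)
{
    n = n - n;
    return n;
}
```

Both functions return 0 for any argument, so the only possible difference between them is in the generated code, not in the result.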
It may depend on whether n is declared as volatile or not.
The assembly-language technique of zeroing a register by subtracting it from itself or XORing it with itself is an interesting one, but it doesn't really translate to C.
Any optimising C compiler will use this technique if it makes sense, and trying to write it out explicitly is unlikely to achieve anything.
In C they only differ (for integer types) if your compiler sucks (or you disabled optimization, as an MSVC answer shows).
Perhaps the person who told you this was trying to describe an asm instruction like sub reg,reg using C syntax, not talking about how such a statement would actually compile with a modern optimizing compiler? In that case I wouldn't say "very different" for most x86 CPUs; most special-case sub same,same as a zeroing idiom, like xor same,same (see What is the best way to set a register to zero in x86 assembly: xor, mov or and?). That makes an asm sub reg,reg similar to mov reg,0, with somewhat better code size. (But yes, there are some unique benefits with respect to partial-register renaming on Intel P6-family that you can only get from zeroing idioms, not mov.)

They could differ in C if your compiler is trying to implement the mostly-deprecated memory_order_consume semantics from <stdatomic.h> on a weakly-ordered ISA like ARM or PowerPC, where n = 0 breaks the dependency on the old value but n = n - n; still "carries a dependency", so a load like array[n] will be dependency-ordered after n = atomic_load_explicit(&shared_var, memory_order_consume). See Memory order consume usage in C11 for more details.

In practice, compilers gave up on trying to get that dependency tracking right and promote consume loads to acquire. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0371r1.html and When should you not use [[carries_dependency]]?

But in asm for weakly-ordered ISAs, sub dst, same, same is required to still carry a dependency on the input register, just like in C. (Most weakly-ordered ISAs are RISCs with fixed-width instructions, so avoiding an immediate operand doesn't make the machine code any smaller. Thus there is no historical use of shorter zeroing idioms like sub r1, r1, r1, even on ISAs like ARM that don't have an architectural zero register; mov r1, #0 is the same size and at least as efficient as any other way. On MIPS you'd just move $v0, $zero.) So yes, for those non-x86 ISAs, they are very different in asm.
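To make the consume case concrete, here is a minimal single-threaded sketch using C11 <stdatomic.h>. shared_var and array follow the names used in the text; read_dependent and read_independent are hypothetical names, and the table contents are made up. It only marks where a dependency exists in the C11 model; since mainstream compilers currently treat memory_order_consume as memory_order_acquire, both functions get at least acquire ordering in practice.

```c
#include <stdatomic.h>

/* Hypothetical shared index and table, for illustration only.
   Static atomics are zero-initialized to a valid state. */
static atomic_int shared_var;
static int array[4] = { 10, 20, 30, 40 };

/* n - n is always 0, but in the C11 model the subtraction still
   "carries a dependency" from the consume load, so the read of
   array[n] is dependency-ordered after that load. */
int read_dependent(void)
{
    int n = atomic_load_explicit(&shared_var, memory_order_consume);
    n = n - n;
    return array[n];
}

/* Assigning the constant 0 breaks the dependency: the read of
   array[n] no longer depends on the loaded value at all. */
int read_independent(void)
{
    int n = atomic_load_explicit(&shared_var, memory_order_consume);
    n = 0;
    return array[n];
}
```

Both functions return array[0]; the difference is purely in which ordering guarantees the abstract C11 memory model attaches to the second load.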
n = 0 avoids any false dependency on the old value of the variable (register), while n = n - n can't execute until the old value of n is ready.

Only x86 special-cases sub same,same and xor same,same as dependency-breaking zeroing idioms like mov eax, imm32, because mov eax, 0 is 5 bytes but xor eax,eax is only 2. So there was a long history of using this peephole optimization before out-of-order execution CPUs, and such CPUs needed to run existing code efficiently. What is the best way to set a register to zero in x86 assembly: xor, mov or and? explains the details.

Unless you're writing x86 asm by hand, write 0 like a normal person instead of n-n or n^n, and let the compiler use xor-zeroing as a peephole optimization.

Asm for other ISAs might have other peepholes; e.g. another answer mentions m68k. But again, if you're writing in C, this is the compiler's job. Write 0 when you mean 0. Trying to "hand hold" the compiler into using an asm peephole is very unlikely to work with optimization disabled, and with optimization enabled the compiler will efficiently zero a register if it needs to.
Not sure about assembly and such, but generally,
isn't always equal if n is floating point; see here:
http://www.codinghorror.com/blog/archives/001266.html
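A small sketch of the floating-point case, assuming IEEE 754 double arithmetic with the default rounding mode and no fast-math flags; sub_self is a hypothetical helper name. It shows that n - n is NaN when n is an infinity or NaN, so n = n - n and n = 0 are not interchangeable for floating-point n.

```c
#include <math.h>

/* Hypothetical helper: returns n - n, which is 0.0 for finite n
   but NaN for +Infinity, -Infinity and NaN. */
double sub_self(double n)
{
    return n - n;
}
```

For example, sub_self(1.5) returns 0.0, while sub_self(INFINITY) and sub_self(NAN) both return NaN.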
Here are some corner cases where the behavior is different for n = 0 and n = n - n:
- if n has a floating-point type, the result will differ from 0 for specific values: Infinity, -Infinity and NaN all give NaN. (For n = -0.0, n - n is +0.0 under the default rounding mode, the same as n = 0; only under round-toward-negative does n - n yield -0.0, and then for any finite n.)
- if n is defined as volatile: the first expression will generate a single store into the corresponding memory location, while the second will generate two loads and a store. Furthermore, if n is the location of a hardware register, the two loads might yield different values, causing the write to store a non-0 value.
- if optimisations are disabled, the compiler might generate different code for these two expressions even for plain int n, which might or might not execute at the same speed.
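A minimal sketch of the volatile case; the variable n and the two function names are hypothetical. Because each access to a volatile object must be performed as written, the compiler has to keep both reads in n = n - n, which you can confirm by compiling with optimization and inspecting the assembly. In an ordinary program with no other writer the result is still 0; the difference only matters when something outside the program (such as hardware behind a memory-mapped register) can change n between the two loads.

```c
/* Hypothetical volatile variable; in real code this might be a
   memory-mapped hardware register. */
volatile int n = 5;

void clear_with_store(void)
{
    n = 0;      /* a single volatile store */
}

void clear_with_subtract(void)
{
    /* Two volatile loads (their order is unspecified), then one
       volatile store. If the two loads observed different values,
       the stored result would be non-zero. */
    n = n - n;
}
```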