Difference between n = 0 and n = n - n

Posted on 2024-07-20 12:20:45

When I read this question I remembered someone once telling me (many years ago) that from an assembler point of view, these two operations are very different:

n = 0;

n = n - n;

Is this true, and if it is, why is it so?

EDIT: As pointed out by some replies, I guess this would be fairly easy for a compiler to optimize into the same thing. But what I find interesting is why they would differ if the compiler had a completely general approach.


Comments (9)

简单气质女生网名 2024-07-27 12:20:45

When writing assembler code, you would often use:

xor eax, eax

instead of

mov eax, 0

That is because the first statement consists only of the opcode, with no immediate operand to encode. Your CPU will do that in 1 cycle (instead of 2). I think your case is something similar (although using sub).

多孤肩上扛 2024-07-27 12:20:45

Compiler VC++ 6.0, without optimisations:

4:        n = 0;
0040102F   mov         dword ptr [ebp-4],0
5:
6:        n = n - n;
00401036   mov         eax,dword ptr [ebp-4]
00401039   sub         eax,dword ptr [ebp-4]
0040103C   mov         dword ptr [ebp-4],eax

揽月 2024-07-27 12:20:45

In the early days, memory and CPU cycles were scarce. That led to a lot of so-called "peephole optimizations". Let's look at the code:

    move.l #0,d0

    moveq.l #0,d0

    sub.l a0,a0

The first instruction would need two bytes for the op-code and then four bytes for the value (0). That meant four bytes wasted plus you'd need to access the memory twice (once for the opcode and once for the data). Sloooow.

moveq.l was better since it would merge the data into the op-code, but it only allowed writing small values (an 8-bit signed immediate, -128 to 127) into a register. And you were limited to data registers only; there was no quick way to clear an address register. You'd have to clear a data register and then load the data register into an address register (two op-codes. Bad.).

Which led to the last operation, which works on any register and needs only two bytes and a single memory read. Translated into C, you'd get

n = n - n;

which would work for the most commonly used types of n (integer or pointer).

苏大泽ㄣ 2024-07-27 12:20:45

An optimizing compiler will produce the same assembly code for the two.
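
One way to check this claim yourself is to put the two statements into separate functions and compare the generated assembly, for example with gcc -O2 -S. The following is a minimal sketch; the function names are made up for illustration and nothing here is specific to any one compiler.

```c
/* Hypothetical function names, for illustration only. */

/* n = 0; expressed as a function so the generated code is easy to find. */
int zero_by_assignment(int n) {
    n = 0;
    return n;
}

/* n = n - n; for an int this is always 0 and can never overflow. */
int zero_by_subtraction(int n) {
    n = n - n;
    return n;
}
```

Both functions return 0 for every int argument, so an optimizer is free to emit the same instructions for each of them.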

国产ˉ祖宗 2024-07-27 12:20:45

It may depend on whether n is declared as volatile or not.

梦断已成空 2024-07-27 12:20:45

The assembly-language technique of zeroing a register by subtracting it from itself or XORing it with itself is an interesting one, but it doesn't really translate to C.

Any optimising C compiler will use this technique if it makes sense, and trying to write it out explicitly is unlikely to achieve anything.

ㄖ落Θ余辉 2024-07-27 12:20:45

In C they only differ (for integer types) if your compiler sucks (or you disabled optimization like an MSVC answer shows).

Perhaps the person who told you this was trying to describe an asm instruction like sub reg,reg using C syntax, not talking about how such a statement would actually compile with a modern optimizing compiler? In which case I wouldn't say "very different" for most x86 CPUs; most do special-case sub same,same as a zeroing idiom, like xor same,same. See "What is the best way to set a register to zero in x86 assembly: xor, mov or and?"

That makes an asm sub reg,reg similar to mov reg,0, with somewhat better code size. (But yes, there are some unique benefits with respect to partial-register renaming on Intel P6-family CPUs that you can only get from zeroing idioms, not from mov.)


They could differ in C if your compiler is trying to implement the mostly-deprecated memory_order_consume semantics from <stdatomic.h> on a weakly-ordered ISA like ARM or PowerPC, where n=0 breaks the dependency on the old value but n = n-n; still "carries a dependency", so a load like array[n] will be dependency-ordered after n = atomic_load_explicit(&shared_var, memory_order_consume). See "Memory order consume usage in C11" for more details.

In practice, compilers gave up on trying to get that dependency-tracking right and promote consume loads to acquire. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0371r1.html and "When should you not use [[carries_dependency]]?"
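
As a rough sketch of the source-level difference described here, the two functions below use C11 <stdatomic.h>. The names table, shared_index and the two function names are made up for illustration. Both functions return the same value; the only difference is whether the load from table formally carries a dependency on the consume load, and a compiler that promotes consume to acquire treats them alike.

```c
#include <stdatomic.h>

/* Hypothetical data, for illustration only. */
static int table[4] = {10, 20, 30, 40};
static _Atomic int shared_index = 2;

/* n = n - n keeps the index formally dependent on the consume load. */
int load_with_dependency(void) {
    int n = atomic_load_explicit(&shared_index, memory_order_consume);
    n = n - n;          /* always 0, but still "carries a dependency" */
    return table[n];
}

/* n = 0 discards the loaded value, so nothing depends on it any more. */
int load_without_dependency(void) {
    int n = atomic_load_explicit(&shared_index, memory_order_consume);
    (void)n;            /* loaded value deliberately ignored */
    n = 0;
    return table[n];
}
```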

But in asm for weakly-ordered ISAs, sub dst, same, same is required to still carry a dependency on the input register, just like in C. (Most weakly-ordered ISAs are RISCs with fixed-width instructions, so avoiding an immediate operand doesn't make the machine code any smaller. Thus there is no historical use of shorter zeroing idioms like sub r1, r1, r1, even on ISAs like ARM that don't have an architectural zero register. mov r1, #0 is the same size and at least as efficient as any other way. On MIPS you'd just move $v0, $zero.)

So yes, for those non-x86 ISAs, they are very different in asm. n=0 avoids any false dependency on the old value of the variable (register), while n=n-n can't execute until the old value of n is ready.


Only x86 special-cases sub same,same and xor same,same as dependency-breaking zeroing idioms that behave like mov eax, imm32, because mov eax, 0 is 5 bytes but xor eax,eax is only 2. So there was a long history of using this peephole optimization before out-of-order execution CPUs existed, and such CPUs needed to run existing code efficiently. "What is the best way to set a register to zero in x86 assembly: xor, mov or and?" explains the details.

Unless you're writing by hand in x86 asm, write 0 like a normal person instead of n-n or n^n, and let the compiler use xor-zeroing as a peephole optimization.

Asm for other ISAs might have other peepholes, e.g. another answer mentions m68k. But again, if you're writing in C this is the compiler's job. Write 0 when you mean 0. Trying to "hand hold" the compiler into using an asm peephole is very unlikely to work with optimization disabled, and with optimization enabled the compiler will efficiently zero a register if it needs to.

梦在深巷 2024-07-27 12:20:45

Not sure about assembly and such, but generally,

n=0
n=n-n

aren't always equal if n is floating point; see here:
http://www.codinghorror.com/blog/archives/001266.html
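
A minimal C sketch of the floating-point case, assuming IEEE 754 doubles (the helper names are made up for illustration): subtracting an infinity or a NaN from itself yields NaN, not 0.

```c
#include <math.h>

/* Hypothetical helper names, for illustration only. */

/* Computes n - n; for an ordinary finite n this is zero. */
double subtract_self(double n) {
    return n - n;
}

/* Returns 1 when n - n is NaN, i.e. when n = n - n does not behave like n = 0. */
int subtract_self_is_nan(double n) {
    return isnan(subtract_self(n)) ? 1 : 0;
}
```

With these helpers, subtract_self_is_nan(1.5) is 0, while passing INFINITY, -INFINITY or NAN gives 1.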

嗫嚅 2024-07-27 12:20:45

Here are some corner cases where the behavior is different for n = 0 and n = n - n:

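  • if n has a floating point type, the result will differ from 0 for specific values: Infinity, -Infinity and NaN all yield NaN. (Under the default rounding mode, -0.0 - -0.0 is +0.0, so negative zero is not such a case.)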

  • if n is defined as volatile: the first expression will generate a single store into the corresponding memory location, while the second expression will generate two loads and a store. Furthermore, if n is the location of a hardware register, the two loads might yield different values, causing the write to store a nonzero value.

  • if optimisations are disabled, the compiler might generate different code for these two expressions even for plain int n, which might or might not execute at the same speed.
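
The volatile case in the list above can be sketched in C as follows. The function names are made up for illustration, and the subtraction is spelled out as two explicit reads so that the order of the accesses is well defined. With ordinary memory both functions leave 0 behind; with a memory-mapped hardware register the two reads could return different values.

```c
/* Hypothetical helper names, for illustration only. */

/* n = 0; one volatile store, no loads. */
void clear_by_store(volatile int *n) {
    *n = 0;
}

/* n = n - n; written out as two volatile loads followed by one store. */
void clear_by_subtract(volatile int *n) {
    int first = *n;      /* first volatile load */
    int second = *n;     /* second volatile load; a device register may have changed */
    *n = first - second; /* stores 0 only if both loads saw the same value */
}
```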
