Relative performance of x86 inc vs. add instruction

Posted on 2024-11-06 21:03:36

Quick question. Assuming beforehand

mov eax, 0

which is more efficient:

inc eax
inc eax

or

add eax, 2

Also, in case the two incs are faster, do compilers (say, GCC) commonly (i.e. without aggressive optimization flags) optimize var += 2 to them?

PS: Don't bother to answer with a variation of "don't prematurely optimize", this is merely academic interest.

小伙你站住 2024-11-13 21:03:36

Two inc instructions on the same register (or, more generally, two read-modify-write instructions) always form a dependency chain of at least two cycles. This assumes a one-clock latency for an inc, which has been the case since the 486. That means that if the surrounding instructions can't be interleaved with the two inc instructions to hide those latencies, the code will execute more slowly.

But no compiler will emit the instruction sequence you propose anyway (mov eax,0 would be replaced by xor eax,eax; see "What is the purpose of XORing a register with itself?"). The sequence

mov eax,0
inc eax
inc eax

will be optimized to

mov eax,2

梦冥 2024-11-13 21:03:36

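If you want raw performance statistics for x86 instructions, see Dr. Agner Fog's lists (volume 4, to be precise). As for the compiler part, that depends on the compiler's code generator, and it is not something you should rely on too heavily.

Side note: I find it funny/ironic that, in a question about performance, you use MOV EAX,0 to zero a register instead of XOR EAX,EAX :P (And if MOV EAX,0 is done beforehand, the fastest variant is to drop the inc's and add's and just use MOV EAX,2.)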
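
To put numbers on the code-size side of this, here is a small sketch in C that lists the 32-bit-mode machine-code encodings of the instructions discussed in this thread as byte arrays. The array names and print_sizes are made up for illustration. The bytes are the standard IA-32 encodings, and the one-byte inc eax (0x40) only exists in 32-bit mode, since 0x40 is a REX prefix in 64-bit mode.

```c
#include <stdio.h>

/* IA-32 (32-bit mode) encodings; the array names are illustrative only. */
static const unsigned char mov_eax_0[]   = {0xB8, 0x00, 0x00, 0x00, 0x00}; /* mov eax, 0   */
static const unsigned char xor_eax_eax[] = {0x31, 0xC0};                   /* xor eax, eax */
static const unsigned char inc_eax[]     = {0x40};                         /* inc eax (REX prefix in 64-bit mode) */
static const unsigned char add_eax_2[]   = {0x83, 0xC0, 0x02};             /* add eax, 2   */
static const unsigned char mov_eax_2[]   = {0xB8, 0x02, 0x00, 0x00, 0x00}; /* mov eax, 2   */

/* Print the encoded size of each zeroing idiom and each way to add 2. */
void print_sizes(void) {
    printf("mov eax,0   : %zu bytes\n", sizeof mov_eax_0);
    printf("xor eax,eax : %zu bytes\n", sizeof xor_eax_eax);
    printf("inc eax x2  : %zu bytes\n", 2 * sizeof inc_eax);
    printf("add eax,2   : %zu bytes\n", sizeof add_eax_2);
    printf("mov eax,2   : %zu bytes\n", sizeof mov_eax_2);
}
```

Size is only one factor: the two incs save a byte over add eax,2 in 32-bit code but still form a two-instruction dependency chain.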

陪你搞怪i 2024-11-13 21:03:36

From the Intel manual that you can find here, it looks like the ADD/SUB instructions are half a cycle cheaper on one particular architecture. But remember that Intel uses an out-of-order execution model for its (recent) processors. This primarily means that performance bottlenecks show up wherever the processor has to wait for data to come in (e.g. it ran out of things to do during an L1/L2/L3/RAM data fetch). So if your profiler tells you INC might be the problem, look at it from a data-throughput point of view instead of looking at raw cycle counts.

Instruction        Latency            Throughput         Execution Unit
CPUID signature    0F_3H    0F_2H     0F_3H    0F_2H     0F_2H

ADD/SUB            1        0.5       0.5      0.5       ALU
[...]
DEC/INC            1        1         0.5      0.5       ALU

凡尘雨 2024-11-13 21:03:36

For all practical purposes, it probably doesn't matter. But take into account that inc uses fewer bytes.

Consider the following code:

int x = 0;
x += 2;

Without using any optimization flags, GCC compiles this code into:

80483ed:       c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%esp)
80483f4:       00 
80483f5:       83 44 24 1c 02          addl   $0x2,0x1c(%esp)

Using -O1 and -O2, it becomes:

c7 44 24 08 02 00 00    movl   $0x2,0x8(%esp)

Funny, isn't it?
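
Because x is a compile-time constant in the snippet above, the optimizer folds everything into a single store, so it does not really show which instruction GCC would pick for a value it cannot know in advance. A minimal sketch for checking that yourself follows. The function names add_two and inc_twice are hypothetical, and taking x as a parameter is what keeps the addition from being folded away. Compiling the file with gcc -O2 -S writes the generated assembly to a .s file you can read.

```c
/* x arrives at run time, so the compiler cannot constant-fold the result. */
int add_two(int x) {
    x += 2;          /* the "var += 2" form from the question */
    return x;
}

/* The same computation written as two separate increments. */
int inc_twice(int x) {
    ++x;
    ++x;
    return x;
}
```

On x86 targets you will most likely find that both functions compile to the same single add or lea rather than a pair of inc instructions, but that depends on the compiler version and flags, so check the output on your own setup.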
