Dummy movups generated by GCC
A little curiosity I found; GCC seems to generate the following code when I have a lot of optimization flags on:
00000000004019ae: test %si,%si
00000000004019b1: movups %xmm0,%xmm0
00000000004019b4: je 0x401f40 <main(int, char**)+1904>
Question: what purpose does the second instruction serve? It doesn't look like it /does/ anything; so, is it some optimization to align the program in the instruction cache? Or is it something with out-of-order execution? (I'm compiling with -mtune=native on a Nehalem if that helps :D).
Nothing urgent, of course, just curious.
Comments (2)
Possibly xmm0 contains the result of some calculation done in the integer domain (with an integer SSE instruction), and the next instruction that uses xmm0 is expected to be in the floating-point domain (a floating-point SSE instruction). Nehalem may execute that next instruction faster if xmm0 has already been migrated to the floating-point domain with an instruction like movaps or movups. It may also be beneficial to perform this migration before the conditional jump, because then the migration is done only once. Without the movups, the migration may be done twice (automatically, by the first FP instruction that touches the register): once speculatively, on the mispredicted branch, and a second time on the correct branch.
It seems the compiler decided that it is better to optimize the calculation dependency chains than to optimize for code size and execution resources.
Adding to the hypothesis proposed by Evgeny Kluev, other possibilities (in no particular order) are that (a) it's a compiler optimiser bug, (b) the movups is inserted to break a dependency, or (c) it is inserted for the purpose of code alignment.