x86 assembly language
I've been trying to get a good grasp of the x86 assembly language, and was wondering if there was a quick-and-short equivalent of movl $1, %eax. That's when I thought that a list of idioms used frequently in the language would perhaps be a good idea.
This could include the preferred use of xorl %eax, %eax as opposed to movl $0, %eax, or testl %eax, %eax instead of cmpl $0, %eax.
Oh, and kindly post one example per post!
In loops...
is
faster and shorter.
You might as well ask how to optimize in assembly. Then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg:
Using SHL and SHR for multiplication/division by a power of 2.
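The arithmetic behind the shift idiom can be checked in C. This is only a sketch for unsigned 32-bit values; the function names mul8_shift and div8_shift are made up for illustration, and the shift count 3 (a factor of 8) is an arbitrary choice.

```c
#include <stdint.h>

/* Multiply by 8 with a left shift, as SHL by 3 would on a 32-bit register.
   The result wraps modulo 2^32, just like the register does. */
uint32_t mul8_shift(uint32_t x) {
    return x << 3;
}

/* Unsigned divide by 8 with a logical right shift, as SHR by 3 would. */
uint32_t div8_shift(uint32_t x) {
    return x >> 3;
}
```

For signed values the picture is less simple: an arithmetic right shift (SAR) rounds toward negative infinity, while idiv truncates toward zero, so the two disagree on negative operands that are not exact multiples.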
Another one (besides xor) for
is
Rationale: smaller opcode.
I don't know whether this counts as an idiom, but on most processors prior to the i7
or, if SSE3 is available,
are faster for reading from an unaligned memory location than
The earliest reference to division by invariant integers that is more than just an inverse multiply is here:
Torbjörn Granlund of the Royal Institute of Technology in Stockholm. Check out his publications.
Expanding on my comment:
To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors have additional logic to recognize that instruction as not having any dependencies.
The instructions incl and decl set some of the flags but leave others (the carry flag) unchanged. That's the worst situation if the flags are modeled as a single register for the purpose of instruction reordering: any instruction that reads a flag after an incl or decl must be considered as depending on the incl or decl (in case it's reading one of the flags that this instruction sets) and also on the previous instruction that set the flags (in case it's reading one of the flags that this instruction does not set). A solution would be to divide the flags register into two and to consider dependencies at this finer grain. Note that the 64-bit extension AMD proposed a few years back removed only the one-byte encodings of these instructions (those opcodes became REX prefixes); incl and decl themselves remain available in a longer encoding.
Regarding the links, I found this either in the Intel manuals, for which it's useless to provide a link because they are on a corporate website that's reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals
Here's another interesting "idiom". Hopefully everyone knows that division is a big time sink even compared to a multiplication. Using a little math, it's possible to multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:
Now eax has been divided by 5 without using the slow div opcode. Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx
For numbers not on the list, you might need to do a shift beforehand (to divide by 6, shr 1, then multiply by the inverse of 3).
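The reciprocal-multiply idea can be sanity-checked in C before committing it to assembly. The sketch below assumes unsigned 32-bit operands; the function names are invented for illustration, and the constants 0xCCCCCCCD (with a shift of 34) and 0xAAAAAAAB (with a shift of 33) are the commonly used rounded-up values of 2^34/5 and 2^33/3, not values taken from the list referenced above.

```c
#include <stdint.h>

/* x / 5 for unsigned 32-bit x: multiply by ceil(2^34 / 5) in 64 bits,
   then shift right by 34. In assembly this is a mul followed by a shift
   of the high half of the product. */
uint32_t div5_reciprocal(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 34);
}

/* x / 6: shift right by 1 first (divide by 2), then multiply by the
   inverse of 3, here ceil(2^33 / 3) with a shift of 33. */
uint32_t div6_reciprocal(uint32_t x) {
    uint32_t half = x >> 1;
    return (uint32_t)(((uint64_t)half * 0xAAAAAAABu) >> 33);
}
```

Comparing these against the plain / operator over a range of inputs is a cheap way to confirm a constant before using it.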
On x64:
for
(the first one also implicitly clears the upper half of rax, but has a smaller opcode)
Using LEA for e.g. multiplication, like:
for EAX = 5 * ECX
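LEA computes base + index * scale without touching memory, which is what makes it usable for small multiplications. A rough C model of that computation follows; lea_model and times5 are hypothetical names, and the choice of ECX as both base and index with a scale of 4 is an assumption about how 5 * ECX would be formed.

```c
#include <stdint.h>

/* Models the effective-address arithmetic base + index * scale.
   In a real x86 addressing mode the scale can only be 1, 2, 4 or 8. */
uint32_t lea_model(uint32_t base, uint32_t index, uint32_t scale) {
    return base + index * scale;
}

/* 5 * ecx expressed as ecx + ecx * 4, a single address computation. */
uint32_t times5(uint32_t ecx) {
    return lea_model(ecx, ecx, 4);
}
```

The same pattern gives multiplication by 3 (scale 2) and by 9 (scale 8); like the register, the result wraps modulo 2^32.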