x86 assembly language

Posted 2024-08-29 01:10:54

I've been trying to get a good hold on the x86 assembly language, and was wondering if there was a quick-and-short equivalent of movl $1, %eax. That's when I thought that a list of idioms used frequently in the language would perhaps be a good idea.

This could include the preferred use of xorl %eax, %eax as opposed to movl $0, %eax, or testl %eax, %eax against cmpl $0, %eax.

Oh, and kindly post one example per post!

Comments (10)

夜巴黎 2024-09-05 01:10:56

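In loops that count ecx down through zero,

  dec     ecx
  cmp     ecx, -1
  jnz     Loop

can be written as

  dec     ecx
  jns     Loop

which is faster and shorter: dec already sets the sign flag, so the separate cmp is unnecessary. The two forms behave the same only while the counter starts non-negative when read as a signed value.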
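
To check that the two loop endings run the body the same number of times, here is a minimal C sketch that models each one. The function names trips_cmp and trips_jns are made up for this illustration, and the model assumes the counter starts at a non-negative value.

```c
#include <stdint.h>

/* Models: body; dec ecx; cmp ecx, -1; jnz Loop
   Assumes ecx >= 0 on entry (hypothetical helper, not from the thread). */
static uint32_t trips_cmp(int32_t ecx) {
    uint32_t trips = 0;
    do {
        trips++;             /* the loop body runs once per iteration */
        ecx--;               /* dec ecx */
    } while (ecx != -1);     /* cmp ecx, -1 / jnz Loop */
    return trips;
}

/* Models: body; dec ecx; jns Loop
   Assumes ecx >= 0 on entry (hypothetical helper, not from the thread). */
static uint32_t trips_jns(int32_t ecx) {
    uint32_t trips = 0;
    do {
        trips++;
        ecx--;               /* dec ecx sets the sign flag from its result */
    } while (ecx >= 0);      /* jns Loop: continue while the result is not negative */
    return trips;
}
```

Both functions return n + 1 for a starting count n, because the body also runs for ecx = 0.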

月棠 2024-09-05 01:10:56

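You might as well ask how to optimize in assembly. Then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg that needs no scratch register (note that xchg between two registers is itself a short instruction, while xchg with a memory operand is implicitly locked and slow):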

xor eax, ebx
xor ebx, eax
xor eax, ebx
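
The three steps can be traced in a small C sketch. The helper name xor_swap is an assumption for illustration, with two pointers standing in for the two registers.

```c
#include <stdint.h>

/* Swap *a and *b with three XORs, mirroring the three xor instructions.
   Caveat: if a and b refer to the same object, the first XOR zeroes it
   (just as using the same register for both operands would). */
static void xor_swap(uint32_t *a, uint32_t *b) {
    *a ^= *b;   /* a = a0 ^ b0           (xor eax, ebx) */
    *b ^= *a;   /* b = b0 ^ a0 ^ b0 = a0 (xor ebx, eax) */
    *a ^= *b;   /* a = a0 ^ b0 ^ a0 = b0 (xor eax, ebx) */
}
```

The aliasing caveat in the comment is the usual reason to prefer a plain temporary in higher-level code.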

蝶…霜飞 2024-09-05 01:10:56

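Using SHL and SHR for multiplication/division by a power of 2 (SHR is for unsigned values; for signed values the arithmetic shift SAR rounds toward negative infinity, whereas idiv rounds toward zero)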
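
A short C sketch of the same arithmetic, to show where shifts and division agree and where they do not. The function names are invented for this example, and the signed cases assume the compiler implements >> on a negative int as an arithmetic shift (as GCC and Clang do), since the C standard leaves that implementation-defined.

```c
#include <stdint.h>

/* Unsigned: shl/shr by 3 match multiply/divide by 8
   (the multiply wraps modulo 2^32, as the register would). */
static uint32_t mul8_u(uint32_t x) { return x << 3; }
static uint32_t div8_u(uint32_t x) { return x >> 3; }

/* Signed shift by 1, modelling sar.
   ASSUMPTION: >> on a negative int is an arithmetic shift. */
static int32_t sar1(int32_t x) { return x >> 1; }

/* Signed divide by 2 that rounds toward zero like idiv:
   add the sign bit as a bias before shifting (same assumption as above). */
static int32_t div2_trunc(int32_t x) {
    int32_t bias = (int32_t)((uint32_t)x >> 31);  /* 1 if x is negative, else 0 */
    return (x + bias) >> 1;
}
```

For -7, the plain arithmetic shift gives -4 while the biased version gives -3, which matches what -7 / 2 yields in C.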

很酷不放纵 2024-09-05 01:10:56

Another alternative (besides xor) to

mov eax, 0   ; B8 00 00 00 00

is

sub eax, eax ; 29 C0

Rationale: shorter encoding (2 bytes instead of 5)

暖伴 2024-09-05 01:10:56

Don't know whether this counts as an idiom, but on most processors prior to i7

movq xmm0, [eax]
movhps xmm0, [eax+8]

or, if SSE3 is available,

lddqu xmm0, [eax]

are faster for reading from an unaligned memory location than

movdqu xmm0, [eax]

左耳近心 2024-09-05 01:10:56

The earliest reference to division by invariant integers that is more than just an inverse multiply is here:
Torbjörn Granlund of the Royal Institute of Technology in Stockholm. Check out his publications.

我偏爱纯白色 2024-09-05 01:10:55

Expanding on my comment:

To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors actually have additional logic to recognize that instruction as not having any dependencies.

The instructions incl and decl set some of the flags but leave others unchanged (the carry flag is preserved). That's the worst situation if the flags are modeled as a single register for the purpose of instruction reordering: any instruction that reads a flag after an incl or decl must be considered as depending on the incl or decl (in case it's reading one of the flags that this instruction sets) and also on the previous instruction that set the flags (in case it's reading one of the flags that this instruction does not set). A solution would be to divide the flags register into two and to consider dependencies with this finer grain. AMD's 64-bit extension did drop the one-byte encodings of these instructions (opcodes 40h to 4Fh became the REX prefixes), although inc and dec themselves remain available in 64-bit mode through their longer ModRM encodings.

Regarding the links, I found this either in the Intel manuals for which it's useless to provide a link because they are on a corporate website that's reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals

肩上的翅膀 2024-09-05 01:10:54

Here's another interesting "idiom". Hopefully everyone knows that division is a big time sink even compared to a multiplication. Using a little math, it's possible to multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:

mov eax, some_number
mov ebx, 3435973837    ; multiplicative inverse of 5 modulo 2^32
mul ebx

Now eax holds some_number / 5 without using the slow div opcode, provided some_number is an exact multiple of 5: the low 32 bits of the product are the exact quotient only in that case. (For 5 specifically, the same constant also gives the floor quotient of an arbitrary unsigned dividend if you take edx, the high half of the product, and shift it right by 2.) Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx

divisor  inverse (mod 2^32)
3   2863311531
5   3435973837
7   3067833783
9   954437177
11  3123612579
13  3303820997
15  4008636143
17  4042322161

For even divisors, which have no inverse modulo 2^32, do a shift beforehand (to divide by 6, shr 1, then multiply by the inverse of 3).
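
To make the distinction between exact division and general division concrete, here is a minimal C sketch using the constant listed for 5. The names INV5, div5_exact and div5_floor are assumptions made for this example. The second function is the standard high-half-and-shift variant rather than the code from the answer, and the shift count of 2 is specific to the divisor 5.

```c
#include <stdint.h>

#define INV5 3435973837u   /* 0xCCCCCCCD, the table entry for 5 */

/* Low 32 bits of n * INV5, i.e. what eax holds after "mul ebx".
   Equals n / 5 only when n is an exact multiple of 5. */
static uint32_t div5_exact(uint32_t n) {
    return n * INV5;       /* unsigned multiply wraps modulo 2^32 */
}

/* Floor division for any 32-bit unsigned n: take the high half of the
   64-bit product (edx after "mul ebx") and shift it right by 2,
   which is a total shift of 34 bits. */
static uint32_t div5_floor(uint32_t n) {
    return (uint32_t)(((uint64_t)n * INV5) >> 34);
}
```

For example, div5_exact(35) is 7, while div5_exact(7) is a large unrelated value; div5_floor(7) is 1.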

无所谓啦 2024-09-05 01:10:54

On x64:

xor eax, eax

instead of

xor rax, rax

(writing eax implicitly zeroes the upper half of rax, so both clear the whole register, but the first needs no REX prefix and is one byte shorter)

寄意 2024-09-05 01:10:54

Using LEA for arithmetic such as multiplication by a small constant, e.g.:

lea eax, [ecx+ecx*4]

which computes EAX = 5 * ECX
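
As a rough model of what the address calculation computes, here is a small C sketch. The function lea and the wrappers times3, times5 and times9 are hypothetical names for illustration. The real instruction only accepts a scale of 1, 2, 4 or 8, and it does not modify the flags.

```c
#include <stdint.h>

/* Model of: lea dst, [base + index*scale + disp]
   ASSUMPTION: the caller passes scale as 1, 2, 4 or 8, the only values
   the instruction can encode. The sum wraps modulo 2^32. */
static uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp) {
    return base + index * scale + disp;
}

static uint32_t times3(uint32_t x) { return lea(x, x, 2, 0); }  /* lea eax, [ecx+ecx*2] */
static uint32_t times5(uint32_t x) { return lea(x, x, 4, 0); }  /* lea eax, [ecx+ecx*4] */
static uint32_t times9(uint32_t x) { return lea(x, x, 8, 0); }  /* lea eax, [ecx+ecx*8] */
```

With base equal to index, a single lea therefore reaches the multipliers 2, 3, 5 and 9, and the displacement allows a constant to be added in the same instruction.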
