Assembly - swap function - why doesn't it work?

Posted 2024-12-21 01:16:44

I need to create a function that swaps the value at &x with the value at &y (meaning swap *(&y) and *(&x)).

Swap:

    push EBP
    mov EBP,ESP
    mov EBX, [EBP+12] ; ebx = *x
    mov EAX, DWORD [EBX] ;eax = ebx = *x
    mov DWORD [EBP-4], EAX ; [ebp-4] = eax =*x
    mov EDX, [EBP+8] ; edx = *y
    mov EAX, DWORD [EDX] ; eax = *edx = *y
    mov DWORD [EBX], EAX ; ebx = eax = *y
    mov EAX, DWORD [EBP-4] ; eax = *x
    mov DWORD [EDX], EAX ; edx = *x
    pop EBP ; ebx = *y and edx = *x
    ret

I call it like this:

    // call Swap
    push x
    push y
    call swap

I don't understand why it's not working. I added comments that explain my understanding of it. What's wrong with my implementation? How can I fix it?

柏拉图鍀咏恒 2024-12-28 01:16:44

You never reserve the stack memory that you use when you access the dword at [EBP-4]. At that point EBP equals ESP, so the slot lies below the stack pointer and can get overwritten by things like interrupt routines, signal handlers, asynchronously called procedures, or whatever applies in your OS.

The code should be like this instead:

swap:
    push  EBP
    mov   EBP,ESP           ; make a traditional stack frame

    sub   ESP, 4         ; reserve memory for a local variable at [EBP-4]

    mov   EBX, [EBP+12]        ; ebx = &x
    mov   EAX, DWORD [EBX]     ; eax = x
    mov   DWORD [EBP-4], EAX   ; [ebp-4] = eax = x
    mov   EDX, [EBP+8]         ; edx = &y
    mov   EAX, DWORD [EDX]     ; eax = y
    mov   DWORD [EBX], EAX     ; *&x = y
    mov   EAX, DWORD [EBP-4]   ; eax = x reloaded from the local
    mov   DWORD [EDX], EAX     ; *&y = x

    leave          ; remove locals (by restoring ESP), restore EBP

    ret

Also, make sure that you pass the addresses of the variables x and y as parameters, not their values. In NASM, push x followed by push y passes the addresses of x and y, but in TASM and MASM the same two instructions pass the values of x and y.
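The address-versus-value point can be sketched in C. This is only an illustration: swap has the signature used in the other answer, while swap_by_value and demo_address_vs_value are hypothetical names introduced here for contrast.

```c
#include <assert.h>

/* Hypothetical contrast case: receives copies of the values,
   so exchanging them has no effect on the caller's variables. */
static void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
}

/* Receives addresses: the stores go through the pointers
   and change the caller's variables. */
static void swap(int *px, int *py) {
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

/* Hypothetical helper that exercises both variants. */
static void demo_address_vs_value(void) {
    int x = 1, y = 2;

    swap_by_value(x, y);          /* like pushing the values of x and y */
    assert(x == 1 && y == 2);     /* nothing changed in the caller */

    swap(&x, &y);                 /* like pushing the addresses of x and y */
    assert(x == 2 && y == 1);     /* the caller's variables were exchanged */
}
```

If the assembly routine receives values instead of addresses, it dereferences those values as if they were pointers, which typically crashes or writes to unrelated memory.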

清旖 2024-12-28 01:16:44

Aside from Alexey's bugfix, you could make this significantly more efficient. (Of course inlining the swap and optimizing at the call site is even better.)

There's no need for a local temporary on the stack: you could instead reload one of the addresses twice, or save/restore ESI and use it as a temporary.

You're actually destroying EBX, which is call-preserved in all the normal C calling conventions. In most 32-bit x86 calling conventions, EAX, ECX, and EDX are the three call-clobbered registers you can use without saving/restoring, while the others are call-preserved. (That is, your caller expects you not to destroy their values, so you can only use them if you put back the original value afterwards. This is why EBP has to be restored after you use it as a frame pointer.)

What gcc -O3 -m32 does when compiling a stand-alone (not inlined) definition for a swap function is save/restore EBX so it has 4 registers to play with. clang chooses ESI.

void swap(int *px, int *py) {
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

On the Godbolt compiler explorer:

# gcc8.2 -O3 -m32 -fverbose-asm
# gcc itself emitted the comments on the following instructions
swap:
        push    ebx     #
        mov     edx, DWORD PTR [esp+8]    # px, px
        mov     eax, DWORD PTR [esp+12]   # py, py
        mov     ecx, DWORD PTR [edx]      # tmp, *px_3(D)
        mov     ebx, DWORD PTR [eax]      # tmp91, *py_5(D)
        mov     DWORD PTR [edx], ebx      # *px_3(D), tmp91
        mov     DWORD PTR [eax], ecx      # *py_5(D), tmp
        pop     ebx       #
        ret  

# DWORD PTR is the gas .intel_syntax equivalent of NASM's DWORD
# you can just remove them all because the register implies an operand size

It also avoids making a legacy stack-frame. You can add -fno-omit-frame-pointer to the compiler options to see code-gen with a frame pointer, if you want. (Godbolt will recompile and show you the asm. Very handy site for exploring compiler options and code changes.)

64-bit calling conventions already pass args in registers and have enough scratch regs, so we get just 4 mov instructions plus the ret, which is much more efficient.

As I mentioned, another option is to reload one of the pointer args twice:

swap:
       # without a push, offsets relative to ESP are smaller by 4
        mov     edx, [esp+4]    # edx = px   reused later
        mov     eax, [esp+8]    # eax = py   also reused later
        mov     ecx, [edx]      # ecx = tmp = *px   lives for the whole function

        mov     eax, [eax]      # eax = *py   destroying our register copy of py
        mov    [edx], eax       # *px = *py;  done with px, can now destroy it

        mov     edx, [esp+8]   # edx = py
        mov    [edx], ecx       # *py = tmp;
        ret

Only 7 instructions instead of 8 (not counting the ret in either version). Loading the same value twice is very cheap, and out-of-order execution means the store address can still be ready early, even though in program order the load of that address comes immediately before the store.
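A small C harness is one way to sanity-check any of these versions. The sketch below assumes the routine follows the plain 32-bit C calling convention with the signature void swap(int *, int *); the swap_fn typedef and check_swap helper are hypothetical names, and the C swap here is only a stand-in for an extern declaration of the assembled routine.

```c
#include <assert.h>

/* Hypothetical function-pointer type for any swap implementation. */
typedef void (*swap_fn)(int *, int *);

/* Stand-in implementation. To test the assembly version, replace this
   with `extern void swap(int *px, int *py);` and link the object file
   (assumes a 32-bit build and the default C calling convention). */
static void swap(int *px, int *py) {
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

/* Hypothetical helper: checks the basic contract of a swap function. */
static void check_swap(swap_fn fn) {
    int x = 10, y = 20;
    fn(&x, &y);
    assert(x == 20 && y == 10);      /* values exchanged */

    int same = 7;
    fn(&same, &same);                /* both pointers alias one object */
    assert(same == 7);               /* value must survive unchanged */

    int arr[2] = {1, 2};
    fn(&arr[0], &arr[1]);            /* adjacent array elements */
    assert(arr[0] == 2 && arr[1] == 1);
}
```

Calling check_swap(swap) exercises all three cases. A harness like this will not necessarily catch a clobbered call-preserved register such as EBX, because whether the caller notices depends on how the compiler allocated registers around the call.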
