代码优化技巧:

发布于 2024-12-08 17:09:20 字数 2129 浏览 0 评论 0原文

我正在使用以下 ASM 例程对数组进行冒泡排序。我想知道我的代码效率低下:

.386
.model flat, c
option casemap:none


.code
            public sample
            sample PROC
            ;[ebp+0Ch]Length
            ;[ebp+08h]Array
                            push ebp
                            mov ebp, esp
                            push ecx
                            push edx
                            push esi
                            push eax
                            mov ecx,[ebp+0Ch]
                            mov esi,[ebp+08h]
                _bubbleSort:
                            push ecx
                            push esi
                            cmp ecx,1
                            je _exitLoop
                            sub ecx,01h
                            _miniLoop:
                                        push ecx
                                        mov edx,DWORD PTR [esi+4]
                                        cmp DWORD PTR [esi],edx
                                        ja _swap
                                        jmp _continueLoop
                            _swap:      
                                        lodsd
                                        mov DWORD PTR [esi-4],edx
                                        xchg DWORD PTR [esi],eax    
                                        jmp _skipIncrementESI
                            _continueLoop:
                                        add esi,4
                            _skipIncrementESI:
                                        pop ecx
                                        loop _miniLoop 
                            _exitLoop:
                            pop esi
                            pop ecx 
                            loop _bubbleSort
                            pop eax
                            pop esi
                            pop edx
                            pop ecx
                            pop ebp
                            ret 
            sample ENDP
            END 

基本上我有两个循环,就像通常的冒泡排序算法一样。外循环的 ecx 值为 10,内循环的 ecx 值为 [ecx-1]。我已经尝试过该例程,它可以成功编译并运行,但我不确定它是否有效。

I am using the following ASM routine to bubble sort an array. I want to know of the inefficiencies of my code:

.386
.model flat, c
option casemap:none


.code
            public sample
            sample PROC
            ;[ebp+0Ch]Length
            ;[ebp+08h]Array
                            push ebp
                            mov ebp, esp
                            push ecx
                            push edx
                            push esi
                            push eax
                            mov ecx,[ebp+0Ch]
                            mov esi,[ebp+08h]
                _bubbleSort:
                            push ecx
                            push esi
                            cmp ecx,1
                            je _exitLoop
                            sub ecx,01h
                            _miniLoop:
                                        push ecx
                                        mov edx,DWORD PTR [esi+4]
                                        cmp DWORD PTR [esi],edx
                                        ja _swap
                                        jmp _continueLoop
                            _swap:      
                                        lodsd
                                        mov DWORD PTR [esi-4],edx
                                        xchg DWORD PTR [esi],eax    
                                        jmp _skipIncrementESI
                            _continueLoop:
                                        add esi,4
                            _skipIncrementESI:
                                        pop ecx
                                        loop _miniLoop 
                            _exitLoop:
                            pop esi
                            pop ecx 
                            loop _bubbleSort
                            pop eax
                            pop esi
                            pop edx
                            pop ecx
                            pop ebp
                            ret 
            sample ENDP
            END 

Basically I have two loops, as usual for the bubble sort algorithm. The value of ecx for the outer loop is 10, and for the inner loop it is [ecx-1]. I have tried the routine and it compiles and runs successfully, but I am not sure if it is efficient.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

预谋 2024-12-15 17:09:20

您可以采取多种措施来加快汇编代码的速度:

  • 不要执行诸如 ja label_1 之类的操作; jmp label_2。只需执行 jbe label_2 即可。

  • loop 是一条非常慢的指令。 dec ebx; jnz Loopstart 速度更快

  • 使用所有寄存器而不是重复推送/弹出 ecx 和 esi。也使用 ebx 和 edi。

  • jmp-targets 应该很好地对齐。在两个循环开始之前和 jbe

从 Intel 获取一份有关您的 cpu 的手册(您可以将其下载为 pdf),其中包含以下时间操作码,也许它还有其他提示。

There are several things you can do to speed up your assembly code:

  • don't do things like ja label_1 ; jmp label_2. Just do jbe label_2 instead.

  • loop is a very slow instruction. dec ebx; jnz loopstart is much faster

  • use all registers instead of repeatedly push/pop ecx and esi. Use ebx and edi too.

  • jmp-targets should be well aligned. Use align 4 before the two loop-starts and after the jbe

Get yourself a manual for your cpu from Intel (you can download it as pdf), it has the timings for the opcodes, maybe it has other hints too.

笔芯 2024-12-15 17:09:20

几个简单的技巧:

1)尽量减少条件跳转的次数,因为它们非常昂贵。如果可能的话展开。
2) 重新排序指令以最大程度地减少因数据依赖性而导致的停顿:

cmp DWORD PTR [esi],edx ;// takes some time to compute,
mov edx,DWORD PTR [esi+4] ; 
ja _swap ;// waits for results of cmp

3) 避免旧的复合指令(decjnz 对比 loop 更快,并且不绑定到 ecx 寄存器)

编写比优化 C 编译器生成的代码更快的汇编代码是相当困难的,因为您应该考虑很多因素:数据和指令缓存的大小、对齐方式、流水线、指令时序。您可以在此处找到一些有关此内容的优秀文档。我特别推荐第一本书:用 C++ 优化软件

Several simple tips:

1) Try to minimize the number of conditional jumps, because they are very expensive. Unroll if possible.
2) Reorder instructions to minimize stalls because of data depencency:

cmp DWORD PTR [esi],edx ;// takes some time to compute,
mov edx,DWORD PTR [esi+4] ; 
ja _swap ;// waits for results of cmp

3) Avoid old composite instructions (dec, jnz pair is faster than loop and is not bound to ecx register)

It would be quite difficult to write assembly code that is faster than the code generated by optimizing C compiler, because you should consider lots of factors: size of data and instruction caches, alignments, pipeline, instruction timings. You can find some good documentation about this here. I especially recommend the first book: Optimizing software in C++

各空 2024-12-15 17:09:20

如果我们不需要该指令的标志,则替换为“add esi,4”:

_continueLoop:
            lea esi,[esi+4]

Substitute for "add esi,4" if we do not need a flag for this instruction:

_continueLoop:
            lea esi,[esi+4]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文