带辅助变量和不带辅助变量的变量交换 - 哪个更快?

发布于 2024-12-22 00:03:53 字数 1426 浏览 1 评论 0原文

我想你们都听说过“交换问题”; SO对此充满疑问。 不使用第三个变量的交换版本通常被认为更快,因为少了一个变量。我想知道幕后发生了什么,并编写了以下两个程序:

int main () {
    int a = 9;
    int b = 5;
    int swap;

    swap = a;
    a = b;
    b = swap;

    return 0;
}

以及没有第三个变量的版本:

int main () {
    int a = 9;
    int b = 5;

    a ^= b;
    b ^= a;
    a ^= b;

    return 0;
}

clang 生成了汇编代码,并得到了第一个版本(使用第三个变量):

...
Ltmp0:
    movq   %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -8(%rbp), %ecx
    movl   %ecx, -16(%rbp)
    movl   -12(%rbp), %ecx
    movl   %ecx, -8(%rbp)
    movl   -16(%rbp), %ecx
    movl   %ecx, -12(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

我使用 第二个版本(不使用第三个变量):

...
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    movl   -8(%rbp), %ecx
    movl   -12(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

第二个版本更长,但我对汇编代码不太了解,所以我不知道这是否意味着它更慢,所以我想听听以下的意见对此更了解的人。

以上版本的变量交换速度更快且占用的内存更少?

I guess you all heard of the 'swap problem'; SO is full of questions about it.
The version of the swap without use of a third variable is often considered to be faster since, well, you have one variable less. I wanted to know what was going on behind the curtains and wrote the following two programs:

int main () {
    int a = 9;
    int b = 5;
    int swap;

    swap = a;
    a = b;
    b = swap;

    return 0;
}

and the version without third variable:

int main () {
    int a = 9;
    int b = 5;

    a ^= b;
    b ^= a;
    a ^= b;

    return 0;
}

I generated the assembly code using clang and got this for the first version (that uses a third variable):

...
Ltmp0:
    movq   %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -8(%rbp), %ecx
    movl   %ecx, -16(%rbp)
    movl   -12(%rbp), %ecx
    movl   %ecx, -8(%rbp)
    movl   -16(%rbp), %ecx
    movl   %ecx, -12(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

and this for the second version (that does not use a third variable):

...
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl   $0, %eax
    movl   $0, -4(%rbp)
    movl   $9, -8(%rbp)
    movl   $5, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    movl   -8(%rbp), %ecx
    movl   -12(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -12(%rbp)
    movl   -12(%rbp), %ecx
    movl   -8(%rbp), %edx
    xorl   %ecx, %edx
    movl   %edx, -8(%rbp)
    popq   %rbp
    ret
Leh_func_end0:
...

The second one is longer but I don't know much about assembly code so I have no idea if that means that it is slower so I'd like to hear the opinion of someone more knowledgable about it.

Which of the above versions of a variable swap is faster and takes less memory?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

小姐丶请自重 2024-12-29 00:03:53

看看一些优化的装配。从

void swap_temp(int *restrict a, int *restrict b){
    int temp = *a;
    *a = *b;
    *b = temp;
}

void swap_xor(int *restrict a, int *restrict b){
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

gcc -O3 -std=c99 -S -o swapping.s swapping.c 生成的

    .file   "swapping.c"
.text
.p2align 4,,15
.globl swap_temp
.type   swap_temp, @function
swap_temp:
.LFB0:
.cfi_startproc
movl    (%rdi), %eax
movl    (%rsi), %edx
movl    %edx, (%rdi)
movl    %eax, (%rsi)
ret
.cfi_endproc
.LFE0:
.size   swap_temp, .-swap_temp
.p2align 4,,15
.globl swap_xor
.type   swap_xor, @function
swap_xor:
.LFB1:
.cfi_startproc
movl    (%rsi), %edx
movl    (%rdi), %eax
xorl    %edx, %eax
xorl    %eax, %edx
xorl    %edx, %eax
movl    %edx, (%rsi)
movl    %eax, (%rdi)
ret
.cfi_endproc
.LFE1:
.size   swap_xor, .-swap_xor
.ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section    .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section    .note.GNU-stack,"",@progbits

对我来说,swap_temp 看起来尽可能高效。

Look at some optimised assembly. From

void swap_temp(int *restrict a, int *restrict b){
    int temp = *a;
    *a = *b;
    *b = temp;
}

void swap_xor(int *restrict a, int *restrict b){
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

gcc -O3 -std=c99 -S -o swapping.s swapping.c produced

    .file   "swapping.c"
.text
.p2align 4,,15
.globl swap_temp
.type   swap_temp, @function
swap_temp:
.LFB0:
.cfi_startproc
movl    (%rdi), %eax
movl    (%rsi), %edx
movl    %edx, (%rdi)
movl    %eax, (%rsi)
ret
.cfi_endproc
.LFE0:
.size   swap_temp, .-swap_temp
.p2align 4,,15
.globl swap_xor
.type   swap_xor, @function
swap_xor:
.LFB1:
.cfi_startproc
movl    (%rsi), %edx
movl    (%rdi), %eax
xorl    %edx, %eax
xorl    %eax, %edx
xorl    %edx, %eax
movl    %edx, (%rsi)
movl    %eax, (%rdi)
ret
.cfi_endproc
.LFE1:
.size   swap_xor, .-swap_xor
.ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section    .comment.SUSE.OPTs,"MS",@progbits,1
.string "Ospwg"
.section    .note.GNU-stack,"",@progbits

To me, swap_temp looks as efficient as can be.

〗斷ホ乔殘χμё〖 2024-12-29 00:03:53

XOR 交换技巧的问题在于它是严格顺序的。它可能看起来很快,但实际上并非如此。有一条名为 XCHG 的指令可以交换两个寄存器,但由于其原子性质,这也可能比简单使用 3 个 MOV 慢。使用 temp 的通用技术是一个很好的选择;)

The problem with XOR swap trick is that it's strictly sequential. It may seem deceptively fast, but in reality, it is not. There's an instruction called XCHG that swaps two registers, but this can also be slower than simply using 3 MOVs, due to its atomic nature. The common technique with temp is an excellent choice ;)

玩物 2024-12-29 00:03:53

要了解成本,请想象每个命令都有要执行的成本,并且间接寻址也有其自己的成本。

movl   -12(%rbp), %ecx

该行需要类似时间单位的东西来访问 ecx 寄存器中的值,
一个时间单位用于访问 rbp,另一个时间单位用于应用偏移量 (-12) 和更多时间
单位(假设任意 3)用于将值从存储在 ecx 中的地址移动到
地址从 -12(%rbp) 开始指示。

如果计算每行和所有行中的所有操作,第二种方法肯定比第一种方法成本更高。

To get an idea of the cost imagine that every command has a cost to be performed and also the indirect addressing has its own cost.

movl   -12(%rbp), %ecx

This line will need something like a time unit for accessing the value in ecx register,
one time unit for accessing rbp, another one for applying the offset (-12) and more time
units (let's say arbitrarily 3) for moving the value from the address stored in ecx to the
address indicated from -12(%rbp).

If you count all the operations in every line and all line, the second method is for sure costlier than the first one.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文