带辅助变量和不带辅助变量的变量交换 - 哪个更快?
我想你们都听说过“交换问题”; SO对此充满疑问。 不使用第三个变量的交换版本通常被认为更快,因为少了一个变量。我想知道幕后发生了什么,并编写了以下两个程序:
int main () {
int a = 9;
int b = 5;
int swap;
swap = a;
a = b;
b = swap;
return 0;
}
以及没有第三个变量的版本:
int main () {
int a = 9;
int b = 5;
a ^= b;
b ^= a;
a ^= b;
return 0;
}
clang 生成了汇编代码,并得到了第一个版本(使用第三个变量):
...
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movl $0, %eax
movl $0, -4(%rbp)
movl $9, -8(%rbp)
movl $5, -12(%rbp)
movl -8(%rbp), %ecx
movl %ecx, -16(%rbp)
movl -12(%rbp), %ecx
movl %ecx, -8(%rbp)
movl -16(%rbp), %ecx
movl %ecx, -12(%rbp)
popq %rbp
ret
Leh_func_end0:
...
我使用 第二个版本(不使用第三个变量):
...
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movl $0, %eax
movl $0, -4(%rbp)
movl $9, -8(%rbp)
movl $5, -12(%rbp)
movl -12(%rbp), %ecx
movl -8(%rbp), %edx
xorl %ecx, %edx
movl %edx, -8(%rbp)
movl -8(%rbp), %ecx
movl -12(%rbp), %edx
xorl %ecx, %edx
movl %edx, -12(%rbp)
movl -12(%rbp), %ecx
movl -8(%rbp), %edx
xorl %ecx, %edx
movl %edx, -8(%rbp)
popq %rbp
ret
Leh_func_end0:
...
第二个版本更长,但我对汇编代码不太了解,所以我不知道这是否意味着它更慢,所以我想听听以下的意见对此更了解的人。
以上版本的变量交换速度更快且占用的内存更少?
I guess you all heard of the 'swap problem'; SO is full of questions about it.
The version of the swap without use of a third variable is often considered to be faster since, well, you have one variable less. I wanted to know what was going on behind the curtains and wrote the following two programs:
int main () {
int a = 9;
int b = 5;
int swap;
swap = a;
a = b;
b = swap;
return 0;
}
and the version without third variable:
int main () {
int a = 9;
int b = 5;
a ^= b;
b ^= a;
a ^= b;
return 0;
}
I generated the assembly code using clang and got this for the first version (that uses a third variable):
...
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movl $0, %eax
movl $0, -4(%rbp)
movl $9, -8(%rbp)
movl $5, -12(%rbp)
movl -8(%rbp), %ecx
movl %ecx, -16(%rbp)
movl -12(%rbp), %ecx
movl %ecx, -8(%rbp)
movl -16(%rbp), %ecx
movl %ecx, -12(%rbp)
popq %rbp
ret
Leh_func_end0:
...
and this for the second version (that does not use a third variable):
...
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movl $0, %eax
movl $0, -4(%rbp)
movl $9, -8(%rbp)
movl $5, -12(%rbp)
movl -12(%rbp), %ecx
movl -8(%rbp), %edx
xorl %ecx, %edx
movl %edx, -8(%rbp)
movl -8(%rbp), %ecx
movl -12(%rbp), %edx
xorl %ecx, %edx
movl %edx, -12(%rbp)
movl -12(%rbp), %ecx
movl -8(%rbp), %edx
xorl %ecx, %edx
movl %edx, -8(%rbp)
popq %rbp
ret
Leh_func_end0:
...
The second one is longer but I don't know much about assembly code so I have no idea if that means that it is slower so I'd like to hear the opinion of someone more knowledgable about it.
Which of the above versions of a variable swap is faster and takes less memory?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
看看一些优化的装配。从
gcc -O3 -std=c99 -S -o swapping.s swapping.c
生成的对我来说,
swap_temp
看起来尽可能高效。Look at some optimised assembly. From
gcc -O3 -std=c99 -S -o swapping.s swapping.c
producedTo me,
swap_temp
looks as efficient as can be.XOR 交换技巧的问题在于它是严格顺序的。它可能看起来很快,但实际上并非如此。有一条名为
XCHG
的指令可以交换两个寄存器,但由于其原子性质,这也可能比简单使用 3 个MOV
慢。使用 temp 的通用技术是一个很好的选择;)The problem with XOR swap trick is that it's strictly sequential. It may seem deceptively fast, but in reality, it is not. There's an instruction called
XCHG
that swaps two registers, but this can also be slower than simply using 3MOVs
, due to its atomic nature. The common technique with temp is an excellent choice ;)要了解成本,请想象每个命令都有要执行的成本,并且间接寻址也有其自己的成本。
该行需要类似时间单位的东西来访问 ecx 寄存器中的值,
一个时间单位用于访问 rbp,另一个时间单位用于应用偏移量 (-12) 和更多时间
单位(假设任意 3)用于将值从存储在 ecx 中的地址移动到
地址从 -12(%rbp) 开始指示。
如果计算每行和所有行中的所有操作,第二种方法肯定比第一种方法成本更高。
To get an idea of the cost imagine that every command has a cost to be performed and also the indirect addressing has its own cost.
This line will need something like a time unit for accessing the value in ecx register,
one time unit for accessing rbp, another one for applying the offset (-12) and more time
units (let's say arbitrarily 3) for moving the value from the address stored in ecx to the
address indicated from -12(%rbp).
If you count all the operations in every line and all line, the second method is for sure costlier than the first one.