Why are std::sin() and std::cos() slower than sin() and cos()?
Test code:
#include <cmath>
#include <cstdio>
const int N = 4096;
const float PI = 3.1415926535897932384626;
float cosine[N][N];
float sine[N][N];
int main() {
printf("a\n");
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
cosine[i][j] = cos(i*j*2*PI/N);
sine[i][j] = sin(-i*j*2*PI/N);
}
}
printf("b\n");
}
Here is the time:
$ g++ main.cc -o main
$ time ./main
a
b
real 0m1.406s
user 0m1.370s
sys 0m0.030s
After adding using namespace std; the time is:
$ g++ main.cc -o main
$ time ./main
a
b
real 0m8.743s
user 0m8.680s
sys 0m0.030s
Compiler:
$ g++ --version
g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
Assembly:
Dump of assembler code for function sin@plt:
0x0000000000400500 <+0>: jmpq *0x200b12(%rip) # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>: pushq $0x3
0x000000000040050b <+11>: jmpq 0x4004c0
End of assembler dump.
Dump of assembler code for function std::sin(float):
0x0000000000400702 <+0>: push %rbp
0x0000000000400703 <+1>: mov %rsp,%rbp
0x0000000000400706 <+4>: sub $0x10,%rsp
0x000000000040070a <+8>: movss %xmm0,-0x4(%rbp)
0x000000000040070f <+13>: movss -0x4(%rbp),%xmm0
0x0000000000400714 <+18>: callq 0x400500 <sinf@plt>
0x0000000000400719 <+23>: leaveq
0x000000000040071a <+24>: retq
End of assembler dump.
Dump of assembler code for function sinf@plt:
0x0000000000400500 <+0>: jmpq *0x200b12(%rip) # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>: pushq $0x3
0x000000000040050b <+11>: jmpq 0x4004c0
End of assembler dump.
Comments (4)
You're using a different overload. Try making both builds resolve to the same overload; it should then perform the same with or without using namespace std;
I guess the difference is that std::sin() has overloads for both float and double, while sin() only takes double. Inside std::sin() for floats, there may be a conversion to double, then a call to std::sin() for doubles, and then a conversion of the result back to float, making it slower.
I did some measurements using clang with -O3 optimization, running on an Intel Core i7. I found that:
std::sin on float has the same cost as sinf
std::sin on double has the same cost as sin
The sin functions on double are 2.5x slower than on float (again, running on an Intel Core i7).
Here is the full code to reproduce it:
I'd be interested if people could report, in the comments, the results on their architectures, especially regarding float vs. double time.
Use the -S flag on the compiler command line and compare the assembler output of the two builds. Maybe using namespace std; pulls a lot of unused stuff into the executable.