Why are std::sin() and std::cos() slower than sin() and cos()?

Posted on 2024-11-28 21:47:17

Test code:

#include <cmath>
#include <cstdio>

const int N = 4096;
const float PI = 3.1415926535897932384626;

float cosine[N][N];
float sine[N][N];

int main() {
    printf("a\n");
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            cosine[i][j] = cos(i*j*2*PI/N);
            sine[i][j] = sin(-i*j*2*PI/N);
        }
    }
    printf("b\n");
}

Here are the timings:

$ g++ main.cc -o main
$ time ./main
a
b

real    0m1.406s
user    0m1.370s
sys     0m0.030s

After adding using namespace std; the timings become:

$ g++ main.cc -o main
$ time ./main
a
b

real    0m8.743s
user    0m8.680s
sys     0m0.030s

Compiler:

$ g++ --version
g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2

Assembly:

Dump of assembler code for function sin@plt:                                    
0x0000000000400500 <+0>:     jmpq   *0x200b12(%rip)        # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>:     pushq  $0x3                                     
0x000000000040050b <+11>:    jmpq   0x4004c0                                 
End of assembler dump.

Dump of assembler code for function std::sin(float):                            
0x0000000000400702 <+0>:     push   %rbp                                     
0x0000000000400703 <+1>:     mov    %rsp,%rbp                                
0x0000000000400706 <+4>:     sub    $0x10,%rsp                               
0x000000000040070a <+8>:     movss  %xmm0,-0x4(%rbp)                         
0x000000000040070f <+13>:    movss  -0x4(%rbp),%xmm0                         
0x0000000000400714 <+18>:    callq  0x400500 <sinf@plt>                      
0x0000000000400719 <+23>:    leaveq                                          
0x000000000040071a <+24>:    retq                                            
End of assembler dump.

Dump of assembler code for function sinf@plt:                                   
0x0000000000400500 <+0>:     jmpq   *0x200b12(%rip)        # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>:     pushq  $0x3                                     
0x000000000040050b <+11>:    jmpq   0x4004c0                                 
End of assembler dump.

π浅易 2024-12-05 21:47:17

You're using a different overload:

Try

        double angle = i*j*2*PI/N;
        cosine[i][j] = cos(angle);
        sine[i][j] = sin(-angle);

With angle declared as a double, both calls resolve to the double overloads, so the timing should be the same whether or not using namespace std; is present.

小红帽 2024-12-05 21:47:17

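My guess is that the difference comes from std::sin() having overloads for both float and double, while sin() only accepts double. Inside the float version of std::sin() there may be a conversion to double, then a call to the double version of std::sin(), and then a conversion of the result back to float, which would make it slower.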

多情出卖 2024-12-05 21:47:17

I did some measurements using clang with -O3 optimization, running on an Intel Core i7. I found that:

std::sin on float has the same cost as sinf
std::sin on double has the same cost as sin
The sin functions on double are 2.5x slower than on float (again, running on an Intel Core i7).

Here is the full code to reproduce it:

#include <chrono>
#include <cmath>
#include <iostream>

template<typename Clock>
struct Timer
{
    using rep = typename Clock::rep;
    using time_point = typename Clock::time_point;
    using resolution = typename Clock::duration;

    Timer(rep& duration) :
    duration(&duration) {
        startTime = Clock::now();
    }
    ~Timer() {
        using namespace std::chrono;
        *duration = duration_cast<resolution>(Clock::now() - startTime).count();
    }
private:

    time_point startTime;
    rep* duration;
};

template<typename T, typename F>
void testSin(F sin_func) {
  using namespace std;
  using namespace std::chrono;
  high_resolution_clock::rep duration = 0;
  T sum {};
  {
    Timer<high_resolution_clock> t(duration);
    for(int i=0; i<100000000; ++i) {
      sum += sin_func(static_cast<T>(i));
    }
  }
  cout << duration << endl;
  cout << "  " << sum << endl;
}

int main() {
  testSin<float> ([] (float  v) { return std::sin(v); });
  testSin<float> ([] (float  v) { return sinf(v); });
  testSin<double>([] (double v) { return std::sin(v); });
  testSin<double>([] (double v) { return sin(v); });
  return 0;
}

I'd be interested if people could report their results on other architectures in the comments, especially the float vs. double timings.
