Why are std::sin() and std::cos() slower than sin() and cos()?

Posted on 2024-11-28 21:47:17

Test code:

#include <cmath>
#include <cstdio>

const int N = 4096;
const float PI = 3.1415926535897932384626;

float cosine[N][N];
float sine[N][N];

int main() {
    printf("a\n");
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            cosine[i][j] = cos(i*j*2*PI/N);
            sine[i][j] = sin(-i*j*2*PI/N);
        }
    }
    printf("b\n");
}

Here are the timings:

$ g++ main.cc -o main
$ time ./main
a
b

real    0m1.406s
user    0m1.370s
sys     0m0.030s

After adding using namespace std; the timings are:

$ g++ main.cc -o main
$ time ./main
a
b

real    0m8.743s
user    0m8.680s
sys     0m0.030s

Compiler:

$ g++ --version
g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2

Assembly:

Dump of assembler code for function sin@plt:                                    
0x0000000000400500 <+0>:     jmpq   *0x200b12(%rip)        # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>:     pushq  $0x3                                     
0x000000000040050b <+11>:    jmpq   0x4004c0                                 
End of assembler dump.

Dump of assembler code for function std::sin(float):                            
0x0000000000400702 <+0>:     push   %rbp                                     
0x0000000000400703 <+1>:     mov    %rsp,%rbp                                
0x0000000000400706 <+4>:     sub    $0x10,%rsp                               
0x000000000040070a <+8>:     movss  %xmm0,-0x4(%rbp)                         
0x000000000040070f <+13>:    movss  -0x4(%rbp),%xmm0                         
0x0000000000400714 <+18>:    callq  0x400500 <sinf@plt>                      
0x0000000000400719 <+23>:    leaveq                                          
0x000000000040071a <+24>:    retq                                            
End of assembler dump.

Dump of assembler code for function sinf@plt:                                   
0x0000000000400500 <+0>:     jmpq   *0x200b12(%rip)        # 0x601018 <_GLOBAL_OFFSET_TABLE_+48>
0x0000000000400506 <+6>:     pushq  $0x3                                     
0x000000000040050b <+11>:    jmpq   0x4004c0                                 
End of assembler dump.

Comments (4)

π浅易 2024-12-05 21:47:17

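You're using a different overload in each case. The argument i*j*2*PI/N has type float, so with using namespace std; overload resolution picks std::sin(float), which calls sinf (as the disassembly in the question shows), while without it the unqualified call resolves to the C function sin(double).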

Try

        double angle = i*j*2*PI/N;
        cosine[i][j] = cos(angle);
        sine[i][j] = sin(angle);

This should perform the same with or without using namespace std; because the argument is already a double, so both spellings select the double version.

小红帽 2024-12-05 21:47:17

I guess the difference is that there are overloads for std::sin() for float and for double, while sin() only takes double. Inside std::sin() for floats, there may be a conversion to double, then a call to std::sin() for doubles, and then a conversion of the result back to float, making it slower.

多情出卖 2024-12-05 21:47:17

I did some measurements using clang with -O3 optimization, running on an Intel Core i7. I found that:

  • std::sin on float has the same cost as sinf
  • std::sin on double has the same cost as sin
  • The sin functions on double are 2.5x slower than on float (again, running on an Intel Core i7).

Here is the full code to reproduce it:

#include <chrono>
#include <cmath>
#include <iostream>

template<typename Clock>
struct Timer
{
    using rep = typename Clock::rep;
    using time_point = typename Clock::time_point;
    using resolution = typename Clock::duration;

    Timer(rep& duration) :
    duration(&duration) {
        startTime = Clock::now();
    }
    ~Timer() {
        using namespace std::chrono;
        *duration = duration_cast<resolution>(Clock::now() - startTime).count();
    }
private:

    time_point startTime;
    rep* duration;
};

template<typename T, typename F>
void testSin(F sin_func) {
  using namespace std;
  using namespace std::chrono;
  high_resolution_clock::rep duration = 0;
  T sum {};
  {
    Timer<high_resolution_clock> t(duration);
    for(int i=0; i<100000000; ++i) {
      sum += sin_func(static_cast<T>(i));
    }
  }
  cout << duration << endl;
  cout << "  " << sum << endl;
}

int main() {
  testSin<float> ([] (float  v) { return std::sin(v); });
  testSin<float> ([] (float  v) { return sinf(v); });
  testSin<double>([] (double v) { return std::sin(v); });
  testSin<double>([] (double v) { return sin(v); });
  return 0;
}

I'd be interested if people could report the results on their own architectures in the comments, especially the float vs. double timings.

谜泪 2024-12-05 21:47:17

Use the -S flag on the compiler command line and compare the assembler output of the two versions. Maybe using namespace std; pulls a lot of unused stuff into the executable.
