Using static vs. automatic variables makes no difference to runtime performance

Posted on 2025-01-23 07:36:44

GCC keeps baffling me with its optimizations. The two functions below (calculate_with_static_vars and calculate_with_stack_vars) show no meaningful difference in execution speed.

Here is a minimal reproducible example (MRE):

#include <iostream>
#include <cstddef>
#include <cmath>
#include <chrono>


// just a simple timer, DON'T PAY ATTENTION TO THIS
struct ScopedTimer
{
    const std::chrono::time_point< std::chrono::steady_clock > start { std::chrono::steady_clock::now( ) };
          std::chrono::time_point< std::chrono::steady_clock > end;

    ScopedTimer( ) = default;
    ~ScopedTimer( )
    {
        end = std::chrono::steady_clock::now( );
        std::clog << "\nTimer took "
                  << std::chrono::duration< double, std::milli>( end - start ).count( )
                  << " ms\n";
    }
    ScopedTimer( const ScopedTimer& ) = delete;
    ScopedTimer& operator=( const ScopedTimer& ) = delete;
};

// this is the custom struct
struct Point3D
{
    float x, y, z;
};

// the candidate 1
float calculate_with_static_vars( const Point3D point5 )
{
    static constexpr Point3D point1 { 1.5f, 4.83f, 2.01f }; // static vars
    static constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    static constexpr Point3D point3 { 3.5f, 6.83f, 4.01f };
    static constexpr Point3D point4 { 4.5f, 7.83f, 5.01f };

    const auto dist1 { std::hypot( point1.x - point2.x,
                                   point1.y - point2.y,
                                   point1.z - point2.z ) };

    const auto dist2 { std::hypot( point2.x - point3.x,
                                   point2.y - point3.y,
                                   point2.z - point3.z ) };

    const auto dist3 { std::hypot( point3.x - point4.x,
                                   point3.y - point4.y,
                                   point3.z - point4.z ) };

    const auto dist4 { std::hypot( point4.x - point5.x,
                                   point4.y - point5.y,
                                   point4.z - point5.z ) };

    return dist1 + dist2 + dist3 + dist4;
}

// the candidate 2
float calculate_with_stack_vars( const Point3D point5 )
{
    constexpr Point3D point1 { 1.5f, 4.83f, 2.01f }; // stack vars
    constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    constexpr Point3D point3 { 3.5f, 6.83f, 4.01f };
    constexpr Point3D point4 { 4.5f, 7.83f, 5.01f };

    const auto dist1 { std::hypot( point1.x - point2.x,
                                   point1.y - point2.y,
                                   point1.z - point2.z ) };

    const auto dist2 { std::hypot( point2.x - point3.x,
                                   point2.y - point3.y,
                                   point2.z - point3.z ) };

    const auto dist3 { std::hypot( point3.x - point4.x,
                                   point3.y - point4.y,
                                   point3.z - point4.z ) };

    const auto dist4 { std::hypot( point4.x - point5.x,
                                   point4.y - point5.y,
                                   point4.z - point5.z ) };

    return dist1 + dist2 + dist3 + dist4;
}

// a function that decides which of the above functions to call based on the branch_flag
inline float testFunc( const bool branch_flag, const bool arg_flag )
{
    bool isStatic { branch_flag };
    Point3D point2;
    if ( arg_flag ) { point2 = { 3.5f, 7.33f, 9.04f }; }
    else            { point2 = { 2.5f, 6.33f, 8.04f }; }

    float dist;
    constexpr size_t numOfIterations { 1'000'000'000 };

    if ( isStatic )
    {
        for ( size_t counter { }; counter < numOfIterations; ++counter )
        {
            dist = calculate_with_static_vars( point2 );
        }
    }
    else
    {
        for ( size_t counter { }; counter < numOfIterations; ++counter )
        {
            dist = calculate_with_stack_vars( point2 );
        }
    }

    return dist;
}


int main( )
{
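    // std::boolalpha lets the user type "true"/"false" instead of 1/0;
    // the flag is sticky, so it also applies to the second read
    bool branch_flag;
    std::cin >> std::boolalpha >> branch_flag;
    bool arg_flag;
    std::cin >> arg_flag;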

    float dist;
    {
    ScopedTimer timer;
    dist = testFunc( branch_flag, arg_flag );
    }

    std::cout << "Sum of the distances of the four points: " << dist << '\n';
}

The two functions do the same work (calculating the distances between the points and returning their sum). The only difference is that one uses static variables while the other uses stack (a.k.a. automatic) variables.

The user has to enter two boolean values on the console: the first decides which function to run, and the second (which is not important) decides which argument to pass to the function being called. Like this:

true    // runs the function with static vars
true    // passes the first point to it

or

false   // runs the function with automatic vars
true    // passes the first point to it

And then the loop inside testFunc calls the chosen function 1 billion times.

Now one might wonder why there is this much bloat in the code. The reason is that I wanted to prevent GCC from doing aggressive compile-time optimizations. Otherwise, it would effectively treat the two functions as consteval and evaluate the calls at compile time, which would defeat the purpose of my test.

So the question is: why do these functions take the same amount of time to run (~22 s on my old machine)? Shouldn't the static version be considerably faster, since it allocates storage and initializes its variables only once?

Answer by 星, 2025-01-30 07:36:44

So the question is how are these functions taking the same amount of time to run (~22 sec on my old machine)?

Because they can be compiled to identical assembly.

Shouldn't the static version be considerably faster since it allocates storage and then initializes its variables only once?

No. The variables are compile-time constants. In practice, the compiler can avoid giving them any storage whatsoever.

With constant-folding optimisation, both functions are effectively equivalent to:

return 5.19615269  // dist1 + dist2 + dist3
    + std::hypot(
          4.5f  - point5.x,
          7.83f - point5.y,
          5.01f - point5.z);

About the author: 北斗星光
