Using static vs. automatic variables does not affect run-time performance
GCC keeps baffling me with its strange optimizations. There is no meaningful difference in execution speed between the two functions below (calculate_with_static_vars and calculate_with_stack_vars).
Here is the MRE (minimal reproducible example) code:
#include <iostream>
#include <cstddef>
#include <cmath>
#include <chrono>

// just a simple timer, DON'T PAY ATTENTION TO THIS
struct ScopedTimer
{
    const std::chrono::time_point< std::chrono::steady_clock > start { std::chrono::steady_clock::now( ) };
    std::chrono::time_point< std::chrono::steady_clock > end;

    ScopedTimer( ) = default;
    ~ScopedTimer( )
    {
        end = std::chrono::steady_clock::now( );
        std::clog << "\nTimer took "
                  << std::chrono::duration< double, std::milli>( end - start ).count( )
                  << " ms\n";
    }

    ScopedTimer( const ScopedTimer& ) = delete;
    ScopedTimer& operator=( const ScopedTimer& ) = delete;
};
// this is the custom struct
struct Point3D
{
    float x, y, z;
};

// the candidate 1
float calculate_with_static_vars( const Point3D point5 )
{
    static constexpr Point3D point1 { 1.5f, 4.83f, 2.01f }; // static vars
    static constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    static constexpr Point3D point3 { 3.5f, 6.83f, 4.01f };
    static constexpr Point3D point4 { 4.5f, 7.83f, 5.01f };

    const auto dist1 { std::hypot( point1.x - point2.x,
                                   point1.y - point2.y,
                                   point1.z - point2.z ) };
    const auto dist2 { std::hypot( point2.x - point3.x,
                                   point2.y - point3.y,
                                   point2.z - point3.z ) };
    const auto dist3 { std::hypot( point3.x - point4.x,
                                   point3.y - point4.y,
                                   point3.z - point4.z ) };
    const auto dist4 { std::hypot( point4.x - point5.x,
                                   point4.y - point5.y,
                                   point4.z - point5.z ) };

    return dist1 + dist2 + dist3 + dist4;
}
// the candidate 2
float calculate_with_stack_vars( const Point3D point5 )
{
    constexpr Point3D point1 { 1.5f, 4.83f, 2.01f }; // stack vars
    constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    constexpr Point3D point3 { 3.5f, 6.83f, 4.01f };
    constexpr Point3D point4 { 4.5f, 7.83f, 5.01f };

    const auto dist1 { std::hypot( point1.x - point2.x,
                                   point1.y - point2.y,
                                   point1.z - point2.z ) };
    const auto dist2 { std::hypot( point2.x - point3.x,
                                   point2.y - point3.y,
                                   point2.z - point3.z ) };
    const auto dist3 { std::hypot( point3.x - point4.x,
                                   point3.y - point4.y,
                                   point3.z - point4.z ) };
    const auto dist4 { std::hypot( point4.x - point5.x,
                                   point4.y - point5.y,
                                   point4.z - point5.z ) };

    return dist1 + dist2 + dist3 + dist4;
}
// a function that decides which of the above functions to call based on the branch_flag
inline float testFunc( const bool branch_flag, const bool arg_flag )
{
    bool isStatic { branch_flag };
    Point3D point2;
    if ( arg_flag ) { point2 = { 3.5f, 7.33f, 9.04f }; }
    else { point2 = { 2.5f, 6.33f, 8.04f }; }

    float dist { };
    // <cstddef> guarantees std::size_t; the unqualified name is not guaranteed
    constexpr std::size_t numOfIterations { 1'000'000'000 };
    if ( isStatic )
    {
        for ( std::size_t counter { }; counter < numOfIterations; ++counter )
        {
            dist = calculate_with_static_vars( point2 );
        }
    }
    else
    {
        for ( std::size_t counter { }; counter < numOfIterations; ++counter )
        {
            dist = calculate_with_stack_vars( point2 );
        }
    }

    return dist;
}
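int main( )
{
    bool branch_flag;
    // std::boolalpha lets the user type "true"/"false"; without it only 0/1 parse
    std::cin >> std::boolalpha >> branch_flag;
    bool arg_flag;
    std::cin >> arg_flag; // boolalpha stays set on std::cin

    float dist;
    {
        ScopedTimer timer;
        dist = testFunc( branch_flag, arg_flag );
    }

    std::cout << "Sum of the distances of the four points: " << dist << '\n';
}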
The two functions do the same work (computing the four distances from point1 to point2, point2 to point3, point3 to point4 and point4 to the argument point5, and returning their sum); the only difference is that one uses static variables while the other uses stack (a.k.a. automatic) variables.
The user has to enter two boolean values on the console: the first decides which function to run, and the second (not important) decides which argument to pass to the function being called. Like this:
true // runs the function with static vars
true // passes the first point to it
or
false // runs the function with automatic vars
true // passes the first point to it
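One detail about this input format: with the default stream flags, operator>> for bool accepts only 0 or 1, so typing true sets failbit and stores false. The minimal sketch below uses std::istringstream in place of std::cin to show the difference std::boolalpha makes; read_flag and default_parse_fails are hypothetical helper names, not part of the original code.

```cpp
#include <cassert> // used by the usage checks described after this block
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one bool, accepting the words "true"/"false".
// Returns false if the extraction failed.
bool read_flag( std::istream& input, bool& flag )
{
    input >> std::boolalpha >> flag;
    return static_cast<bool>( input );
}

// Hypothetical helper: with default flags, extracting "true" fails,
// sets failbit, and stores false into the bool.
bool default_parse_fails( const std::string& text )
{
    std::istringstream input { text };
    bool flag { true };
    input >> flag; // std::boolalpha not set: only 0 or 1 are accepted
    return input.fail( ) && !flag;
}
```

For example, calling read_flag twice on a stream holding "true\nfalse\n" yields true and then false, while default_parse_fails("true") returns true because the plain extraction fails.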
The loop inside testFunc then calls the chosen function one billion times.
Now, one might wonder why there is so much bloat in this code. The reason is that I wanted to prevent GCC from doing aggressive compile-time optimizations. Otherwise, it would evaluate the calls to the two functions entirely at compile time (as if they were consteval), and that would defeat the purpose of my test.
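Reading the inputs from std::cin is one way to keep them opaque to the optimizer. A common alternative, sketched below as an assumption rather than the method used in this question, is to read the argument through volatile objects on every iteration so that the call cannot be hoisted out of the loop or folded away. The names work and run_opaque are hypothetical; work merely stands in for the function being measured.

```cpp
#include <cassert> // used by the usage checks described after this block
#include <cmath>
#include <cstddef>

struct Point3D
{
    float x, y, z;
};

// Hypothetical stand-in for the function being measured.
float work( const Point3D p )
{
    return std::hypot( 4.5f - p.x, 7.83f - p.y, 5.01f - p.z );
}

// Calls work() `iterations` times. Every volatile read must actually happen,
// so the compiler cannot treat the argument as loop-invariant and hoist or
// fold the call; storing into a volatile sink keeps every result observable.
float run_opaque( const Point3D seed, const std::size_t iterations )
{
    volatile float vx { seed.x };
    volatile float vy { seed.y };
    volatile float vz { seed.z };
    volatile float sink { };
    for ( std::size_t i { }; i < iterations; ++i )
    {
        sink = work( Point3D { vx, vy, vz } );
    }
    return sink;
}
```

The volatile accesses add a few loads and a store per iteration, so the measured time includes that small overhead alongside the work itself.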
So the question is: how can these functions take the same amount of time to run (~22 s on my old machine)? Shouldn't the static version be considerably faster, since it allocates storage for its variables and initializes them only once?
Why do the two functions take the same amount of time? Because they can be compiled to identical assembly.
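Shouldn't the static version be faster? No. The variables are compile-time constants. Because they are constexpr, the static ones are constant-initialized (there is no run-time initialization or first-call guard to skip), and the automatic ones can be folded directly into the arithmetic. In practice, the compiler can avoid providing either set any storage whatsoever.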
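To illustrate that storage duration does not change what the compiler knows, the sketch below reuses the question's Point3D and its first two points in two hypothetical functions, x_gap_static and x_gap_automatic. Both use the constexpr member arithmetic inside a static_assert; since both compile, the values are available at compile time in either case.

```cpp
#include <cassert> // used by the usage checks described after this block

struct Point3D
{
    float x, y, z;
};

// Static storage duration: a constexpr static local is constant-initialized,
// so no run-time initialization code (and no first-call guard) exists.
float x_gap_static( )
{
    static constexpr Point3D point1 { 1.5f, 4.83f, 2.01f };
    static constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    static_assert( point1.x - point2.x == -1.0f ); // evaluated by the compiler
    return point1.x - point2.x;
}

// Automatic storage duration: the values are just as known at compile time,
// so nothing forces the compiler to build these objects on the stack.
float x_gap_automatic( )
{
    constexpr Point3D point1 { 1.5f, 4.83f, 2.01f };
    constexpr Point3D point2 { 2.5f, 5.83f, 3.01f };
    static_assert( point1.x - point2.x == -1.0f ); // evaluated by the compiler
    return point1.x - point2.x;
}
```

Both functions return -1.0f, and with optimizations enabled a compiler is free to reduce each one to returning that constant.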
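With constant-folding optimisation, both functions are effectively equivalent to a function that adds one precomputed constant (the sum dist1 + dist2 + dist3, which depends only on the constexpr points) to std::hypot( point4.x - point5.x, point4.y - point5.y, point4.z - point5.z ), the only term that depends on the argument.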
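A hand-written sketch of that folded form follows, assuming the question's Point3D and constants; calculate_folded is a hypothetical name. Because std::hypot is not constexpr in C++17, the constant part is spelled out as an expression here, whereas an optimizing compiler may replace it with a single precomputed literal.

```cpp
#include <cassert> // used by the usage checks described after this block
#include <cmath>

struct Point3D
{
    float x, y, z;
};

// Hypothetical hand-folded equivalent of both calculate_with_* functions.
float calculate_folded( const Point3D point5 )
{
    // dist1 + dist2 + dist3 depend only on the four constexpr points, and
    // dist1 + dist2 + dist3 + dist4 groups as ((dist1 + dist2) + dist3) + dist4,
    // so this prefix can be folded without reordering any float addition.
    const float constant_part { std::hypot( 1.5f - 2.5f, 4.83f - 5.83f, 2.01f - 3.01f )
                              + std::hypot( 2.5f - 3.5f, 5.83f - 6.83f, 3.01f - 4.01f )
                              + std::hypot( 3.5f - 4.5f, 6.83f - 7.83f, 4.01f - 5.01f ) };

    // Only this term depends on the run-time argument.
    return constant_part + std::hypot( 4.5f - point5.x, 7.83f - point5.y, 5.01f - point5.z );
}
```

For the first test point { 3.5f, 7.33f, 9.04f } this comes out at roughly 9.378 (about 3 * sqrt(3) plus 4.182).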