C++: Why is processing with global variables faster than with local variables?

Posted on 2025-02-13 10:55:49

I wrote some C++ code with dummy processing, just to test a few cache optimizations while studying code improvements, and ran into something very weird...

With the arrays declared static, or declared outside the main function as globals, the code runs in 0.5 seconds (on average). If I just move the arrays inside the main function, the same processing runs in 15 seconds (on average). I can't figure out why, and all I can find are articles saying that local variables are faster than globals.

Does someone have any idea what's happening?
I'm compiling C++ on Windows with a compiler installed through MinGW, and running the code on a desktop with an i3-7100.

EDIT:

The goal is not to improve speed; these are just some tests to study cache usage by moving, removing, or merging arrays. The optimization flags do in fact turn this into perfect code with perfect speed. But what are they changing? What was wrong with the array location that the flags fix?
I'm just compiling with g++ -o

EDIT 2:

I initialized the arrays as some comments suggested. The uninitialized version is in fact the slower one, and the initialized version runs at the same speed as the global code.
Why? What's happening under the hood?
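
Since initializing the local arrays removes the slowdown, one thing worth checking is what values the arrays actually hold. Arrays with static storage duration are zero-initialized before main runs, while an uninitialized local array has indeterminate contents. Whether those leftover values (for example subnormals, infinities, or NaNs) explain the timing difference on this machine is an assumption, not something confirmed here. A minimal sketch of a diagnostic, using the hypothetical names ValueClasses and classify_values, that tallies what an array contains:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of the original test code): counts how many
// elements of an array fall into each floating-point value class.
struct ValueClasses {
    std::size_t zero = 0;
    std::size_t normal = 0;
    std::size_t subnormal = 0;
    std::size_t non_finite = 0;  // infinities, NaNs, or any other class
};

// Accepts volatile data so it can be called on the arrays from the question.
ValueClasses classify_values(const volatile double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    ValueClasses counts;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = data[i];  // exactly one volatile read per element
        switch (std::fpclassify(v)) {
            case FP_ZERO:      ++counts.zero;       break;
            case FP_NORMAL:    ++counts.normal;     break;
            case FP_SUBNORMAL: ++counts.subnormal;  break;
            default:           ++counts.non_finite; break;
        }
    }
    return counts;
}
```

Calling classify_values(values, N) and classify_values(error, N) just before the timed loop, in both the global and the local build, would show whether the two variants start from the same data. Reading an uninitialized local array is formally undefined behavior, so this is only suitable for an experiment like this one.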

EDIT 3:

After some suggestions and explanations in the comments, I ran some tests in a fair test environment created by @paddy and shared in the comments: https://godbolt.org/z/8ev85qjP5

Code:

#include <chrono>
#include <iostream>

#define TAM 10
#define N 10000

#ifdef USE_GLOBAL
volatile double output[N] = {}, values[N] = {}, error[N] = {};
#endif

int main()
{   
#ifndef USE_GLOBAL
    volatile double output[N] = {}, values[N] = {}, error[N] = {};
#endif

    std::cout << "Starting" << std::endl;
    auto t1 = std::chrono::high_resolution_clock::now();
    {
        for (int total = 0; total < TAM; total++) {
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) {
                    output[i] += (values[j] + error[j]) / i + 1;
                }
            }
        }
    }

    auto t2 = std::chrono::high_resolution_clock::now();

    auto duration =
        (std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1)
             .count());

    float time = (float)duration / 1000000;

    std::cout << "Processing time = " << time << " seconds."
              << std::endl;
}
