Performance impact of global variables (C, C++)

Posted on 2024-10-20 06:52:43

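I am currently developing a very fast algorithm, one part of which is an extremely fast scanner and statistics function. In this quest I am after any performance benefit. I am therefore also interested in keeping the code "multi-thread" friendly.

Now for the question: I have noticed that putting some very frequently accessed variables and arrays into "global" or "static local" storage (which has the same effect) gives a measurable performance benefit (in the range of +10%). I am trying to understand why, and to find a solution, since I would prefer to avoid these kinds of allocation. Note that I do not think the difference comes from "allocation", since allocating a few variables and small arrays on the stack is practically instantaneous. I believe the difference comes from "accessing" and "modifying" the data.

In this search I found this old post on Stack Overflow: C++ performance of global variables

But I was very disappointed by the answers there. There was very little explanation, mostly ranting along the lines of "you should not do that" (hey, that is not the question!) and very rough statements such as "it doesn't affect performance", which is obviously incorrect, since I am measuring it with precise benchmarking tools.

As said above, I am looking for an explanation and, if one exists, a solution to this issue. So far, my feeling is that calculating the memory address of a local (dynamic) variable costs a bit more than calculating that of a global (or local static) variable. Maybe something like the difference of one ADD operation. But that doesn't help to find a solution...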

在你怀里撒娇 2024-10-27 06:52:43

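This really depends on your compiler, platform, and other details. However, I can describe one scenario in which global variables are faster.

In many cases, a global variable is at a fixed offset. This allows the generated instructions to use that address directly. (Something like MOV AX,[MyVar].)

However, if you have a variable that is relative to the current stack pointer, or is a member of a class or an array, some arithmetic is needed to take the base address and determine the address of the actual variable.

Obviously, if you need to place some sort of mutex on your global variable to keep it thread-safe, you will almost certainly lose any performance gain.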
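The three addressing situations described above can be sketched as follows. This is only an illustration under stated assumptions: the names NBUCKETS, g_stats, count_static, count_into and count_local are hypothetical, the histogram is assumed to have one bucket per byte value, and whether the generated instructions actually differ depends on the compiler, the target, and options such as position-independent code. Inspecting the generated assembly is the only way to know for a given build.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 256  /* assumption: one bucket per possible byte value */

/* Case 1: static storage. The array's location is typically known at link
   time (or reached through a fixed offset), so the compiler may be able to
   fold its address into the instruction. Shared state: not thread-safe. */
static unsigned g_stats[NBUCKETS];

void count_static(const unsigned char *p, const unsigned char *end)
{
    while (p < end)
        g_stats[*p++]++;
}

/* Case 2: caller-owned buffer. The base address arrives as a pointer, so
   each access is computed as base + index. Each thread can pass its own
   buffer, which keeps the function thread-friendly without a mutex. */
void count_into(unsigned *stats, const unsigned char *p, const unsigned char *end)
{
    while (p < end)
        stats[*p++]++;
}

/* Case 3: automatic storage. The array is addressed relative to the stack
   (or frame) pointer and must be zeroed on every call; the result is
   copied out before the function returns. */
void count_local(unsigned *out, const unsigned char *p, const unsigned char *end)
{
    unsigned stats[NBUCKETS] = { 0 };
    while (p < end)
        stats[*p++]++;
    memcpy(out, stats, sizeof stats);
}
```

All three functions compute the same counts; only where the array lives differs. Compiling each with the same optimization level and comparing the inner loops in the assembly output shows whether the extra address arithmetic really appears on a given platform.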

盗梦空间 2024-10-27 06:52:43

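If the local variables are POD types, creating them is effectively free. You might be overflowing a cache line with too many stack variables, or hitting some similar alignment-related cause that is very specific to your piece of code. In my experience, non-local variables usually decrease performance significantly.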

﹎☆浅夏丿初晴 2024-10-27 06:52:43

It's hard to beat static allocation for speed, and while the 10% is a pretty small difference, it could be due to address calculation.

But if you're looking for speed, your example in a comment, while(p<end)stats[*p++]++;, is an obvious candidate for unrolling, such as:

static int stats[M];
static int index_array[N];
int *p = index_array, *pend = p+N;
// ... initialize the arrays ...
// Unroll by 8. Comparing the remaining count avoids forming pend-8 when N < 8.
while (pend - p >= 8){
  stats[p[0]]++;
  stats[p[1]]++;
  stats[p[2]]++;
  stats[p[3]]++;
  stats[p[4]]++;
  stats[p[5]]++;
  stats[p[6]]++;
  stats[p[7]]++;
  p += 8;
}
while(p<pend) stats[*p++]++;

Don't count on the compiler to do it for you. It might or might not be able to figure it out.

Other possible optimizations come to mind, but they depend on what you're actually trying to do.

给我一枪 2024-10-27 06:52:43

If you have something like

int stats[256]; while (p<end) stats[*p++]++;

static int stats[256]; while (p<end) stats[*p++]++;

you are not really comparing the same thing because for the first instance you are not doing an initialization of your array. Written explicitly the second line is equivalent to

static int stats[256] = { 0 }; while (p<end) stats[*p++]++;

So for a fair comparison, the first should read

 int stats[256] = { 0 }; while (p<end) stats[*p++]++;

Your compiler might be able to deduce much more if it has the variables in a known state.

Now then, there could be runtime advantage of the static case, since the initialization is done at compile time (or program startup).

To test if this makes up for your difference you should run the same function with the static declaration and the loop several times, to see if the difference vanishes if your number of invocations grows.

But as others have said already, the best approach is to inspect the assembly that your compiler produces to see what actual differences there are in the generated code.
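The repeated-invocation test suggested above could be sketched roughly like this. It is a minimal harness under stated assumptions, not a rigorous benchmark: hist_local, hist_static, time_calls and NBUCKETS are hypothetical names, timing uses the standard clock() function, and the static variant resets its array on every call so that both functions do the same work (drop that memset to measure the case where the one-time static initialization is all there is).

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NBUCKETS 256  /* assumption: input values are bytes */

/* Automatic array: explicitly zeroed on every call. */
void hist_local(const unsigned char *p, const unsigned char *end, int *out)
{
    int stats[NBUCKETS] = { 0 };
    while (p < end)
        stats[*p++]++;
    memcpy(out, stats, sizeof stats);
}

/* Static array: zero-initialized once before the program starts.
   The memset is an assumption added here so that repeated calls
   produce the same result as hist_local. */
void hist_static(const unsigned char *p, const unsigned char *end, int *out)
{
    static int stats[NBUCKETS];
    memset(stats, 0, sizeof stats);
    while (p < end)
        stats[*p++]++;
    memcpy(out, stats, sizeof stats);
}

typedef void (*hist_fn)(const unsigned char *, const unsigned char *, int *);

/* Returns the CPU time in seconds spent on `calls` invocations of fn
   over the same input buffer. */
double time_calls(hist_fn fn, const unsigned char *buf, size_t len,
                  int calls, int *out)
{
    clock_t t0 = clock();
    for (int i = 0; i < calls; i++)
        fn(buf, buf + len, out);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_calls with both functions and a growing number of calls (and a buffer large enough for clock() to resolve) shows whether the gap stays proportional or shrinks. Because clock() has coarse resolution on some systems, short runs may report zero.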
