Is a state check always the efficient thing to do?

Posted on 2024-12-11 23:49:55

  • Suppose that a value only needs to be bound to a certain
    data member of a certain object when bState is true. When bState
    is false, the bind is not necessary, but it does no harm either.

Which of the following pieces of code would be more efficient, and why?

(EDIT: updated, state is now a member of the object)

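const int x = 100;   // array size; the value is arbitrary, for illustration only
int iToBind = 0;     // value to bind; arbitrary, for illustration only
Classname pObject[x];

for (int i = 0; i < x; ++i) {
    if (pObject[i].bState) {
        pObject[i].somedatamember = iToBind;
    }
}

Versus:

for (int i = 0; i < x; ++i) {
    pObject[i].somedatamember = iToBind;
}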


Comments (5)

煮茶煮酒煮时光 2024-12-18 23:49:55


I would say the latter is definitely quicker. The first version both reads and writes memory, while the latter only writes.

In this version:

for (int i = 0; i < x; ++i) {
  if (pObject[i].bState) {
    pObject[i].somedatamember = iToBind;
  }
}

there is a stall during the if statement as the CPU must wait for the data to be read from memory. How fast memory is read depends on where the data resides. The further from the CPU, the longer it takes: L1 (fastest), L2, L3, RAM, disk (slowest).

In this version:

for (int i = 0; i < x; ++i) {
  pObject[i].somedatamember = iToBind;
}

there are only writes to memory. Writes are normally buffered, so they generally do not stall the CPU the way a dependent read does.

As well as the memory access times, the latter loop has no conditional jump inside the loop. Conditional branches are a significant overhead, especially if the taken/not taken decision is effectively random.
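To make the two variants concrete, here is a minimal C++ sketch of both loops as functions. The struct `Item` is a hypothetical stand-in for the question's `Classname`, and the function names `bind_checked` and `bind_unchecked` are made up for illustration. It only shows that the two loops leave different data behind for elements whose bState is false; it says nothing about which is faster on a given machine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Classname (an assumption).
struct Item {
    bool bState;
    int somedatamember;
};

// Variant 1: load bState first, write only when it is true.
// Returns the number of writes performed.
std::size_t bind_checked(std::vector<Item>& items, int iToBind) {
    std::size_t writes = 0;
    for (Item& item : items) {
        if (item.bState) {
            item.somedatamember = iToBind;
            ++writes;
        }
    }
    return writes;
}

// Variant 2: write unconditionally, with no load of bState and no branch
// in the loop body. Returns the number of writes performed.
std::size_t bind_unchecked(std::vector<Item>& items, int iToBind) {
    for (Item& item : items) {
        item.somedatamember = iToBind;
    }
    return items.size();
}
```

The unchecked variant is only a valid replacement when overwriting somedatamember on elements with a false bState really is harmless, as the question states.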

甜心小果奶 2024-12-18 23:49:55


It all depends on what you have simplified for the post. If you are adding a branch just to skip setting a variable, then you are probably not gaining anything and might be losing if the branch prediction fails. I would remove the test.

Now, if the object to update is not a simple int then ... as always, measure, profile and then make a decision based on actual facts rather than hunches. If this is not part of a tight loop chances are that you will not even notice the difference either way.
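As a rough illustration of "measure, then decide", the sketch below times both loop variants with std::chrono::steady_clock. Everything named here (`Item`, `best_time_ns`, `make_items`, `compare`, `Timings`) is hypothetical and not from the original post, and the pseudo-random bState pattern is just one possible workload. A real measurement would use your actual data, an optimized build, and a profiler.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Classname (an assumption).
struct Item {
    bool bState;
    int somedatamember;
};

// Runs fn `repeats` times and returns the fastest run in nanoseconds.
template <typename Fn>
long long best_time_ns(Fn&& fn, int repeats) {
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           stop - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}

// Builds n items with a pseudo-random bState pattern (simple LCG, fixed seed).
std::vector<Item> make_items(std::size_t n) {
    std::vector<Item> items(n);
    unsigned state = 12345u;
    for (Item& item : items) {
        state = state * 1664525u + 1013904223u;
        item.bState = ((state >> 16) & 1u) != 0;
        item.somedatamember = 0;
    }
    return items;
}

struct Timings {
    long long checked_ns;
    long long unchecked_ns;
};

// Times the checked and the unchecked loop on equally sized data.
Timings compare(std::size_t n, int iToBind, int repeats) {
    std::vector<Item> a = make_items(n);
    std::vector<Item> b = a;
    Timings t;
    t.checked_ns = best_time_ns([&] {
        for (Item& item : a) {
            if (item.bState) item.somedatamember = iToBind;
        }
    }, repeats);
    t.unchecked_ns = best_time_ns([&] {
        for (Item& item : b) item.somedatamember = iToBind;
    }, repeats);
    return t;
}
```

Calling something like `compare(1000000, 42, 10)` and printing both fields gives a first impression. Since an optimizing compiler may transform or remove work whose results are never used, it is worth reading the vectors afterwards and treating the numbers as indicative only.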

汹涌人海 2024-12-18 23:49:55


Have you ever heard of Loop Invariant Code Motion?

It is a compiler optimization pass that moves code out of the body of loops whenever possible.

For example, given the following code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  for (int i = 0; i < argc; ++i) {
    if (argc < 100) {
      printf("%d\n", atoi(argv[1]));
    }
  }
}

Clang generates the following IR:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
  %1 = icmp sgt i32 %argc, 0
  br i1 %1, label %.lr.ph, label %._crit_edge

.lr.ph:                                           ; preds = %0
  %2 = icmp slt i32 %argc, 100
  %3 = getelementptr inbounds i8** %argv, i64 1
  br i1 %2, label %4, label %._crit_edge

; <label>:4                                       ; preds = %4, %.lr.ph
  %i.01.us = phi i32 [ %9, %4 ], [ 0, %.lr.ph ]
  %5 = load i8** %3, align 8, !tbaa !0
  %6 = tail call i64 @strtol(i8* nocapture %5, i8** null, i32 10) nounwind
  %7 = trunc i64 %6 to i32
  %8 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %7) nounwind
  %9 = add nsw i32 %i.01.us, 1
  %exitcond = icmp eq i32 %9, %argc
  br i1 %exitcond, label %._crit_edge, label %4

._crit_edge:                                      ; preds = %4, %.lr.ph, %0
  ret i32 0
}

Which can be translated back to C:

int main(int argc, char** argv) {
  if (argc <= 0) { return 0; }

  if (argc >= 100) { return 0; }

  for (int i = 0; i < argc; ++i) {
    printf("%d\n", atoi(argv[1]));
  }

  return 0;
}

Conclusion: don't bother with micro-optimizations unless a profiler reveals they are not as micro as you thought.
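The transformation shown above can also be written by hand. The following C++ sketch uses hypothetical names (`sum_in_loop`, `sum_hoisted`) and a made-up workload. It shows a condition that does not depend on the loop variable being tested once outside the loop instead of on every iteration, with the same result.

```cpp
#include <vector>

// The condition `enabled` is loop-invariant but is tested on every iteration.
int sum_in_loop(const std::vector<int>& values, bool enabled) {
    int total = 0;
    for (int v : values) {
        if (enabled) {  // does not depend on the loop variable
            total += v;
        }
    }
    return total;
}

// Same behavior with the invariant test hoisted out of the loop.
int sum_hoisted(const std::vector<int>& values, bool enabled) {
    if (!enabled) {
        return 0;
    }
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

An optimizing compiler can often do this rewrite on its own, which is the point of the example above: the hand-hoisted version is mainly useful for seeing what the optimizer produces.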

EDIT:

The edit radically changed the question (god I hate that :p). LICM no longer applies, because bState is now read per object, and the two loops now do different things.

The conclusion however remains identical. A single if check within a for loop does not change the fundamental complexity of your code (remember that the loop condition is tested at each iteration too...).

生生不灭 2024-12-18 23:49:55


From what I can tell, bState isn't changed inside the loop in the first fragment, so you can put the if outside the loop, which is obviously more efficient.

复古式 2024-12-18 23:49:55


I'd say it really depends on the context. If it is critical for bState to be true during the bind, then the extra 1 or 2 assembly instructions per loop iteration to check the state will have to be paid. If not, leave out the if when x is particularly large.
