Read-then-conditional-write vs. write

Posted on 2024-08-23 12:34:45

Which is faster on average: checking the value and assigning only if needed, or simply assigning? In C++ terms:

bool b;
if(b)
    b = false;

or

b = false;

Assume that the if() condition is true with 50% probability. The answer is most likely highly architecture dependent, so please share your low-level considerations. Writing always dirties the cache line, right? So by avoiding the write we avoid a cache flush in 50% of cases. A smart enough cache might detect a trivial write (one that stores the value already present) and not mark the line dirty. On the other hand, the unconditional write is always exactly one memory operation, while read-then-write is 1.5 operations on average.

Disclaimer: this is a curiosity question, not a problem I actually face.

长安忆 2024-08-30 12:34:45

Mispredicted branches are expensive on modern CPUs, and memory access is expensive on embedded/older CPUs. So the plain assignment will almost always be faster, unless you have some unusual memory that takes longer to write than to read (hint: you usually don't).

The check-then-assign version is worse for these specific reasons:

  • A branch instruction. The processor may predict it correctly, but it can still incur overhead.
  • Two memory accesses instead of one. Reads and writes run at the same speed on most forms of memory, so why do two when one will do?
  • More code. This is a minor one, but the if statement needs extra instructions, which means a few more instruction fetches and a little more space consumed in the instruction cache.
  • For the pessimistic: the C++ compiler might decide to keep this variable in a register in place of other variables that need one more.
  • Also, if you assume b is kept in a register: register reads and writes are very cheap, but they aren't free.
醉态萌生 2024-08-30 12:34:45

It would definitely be worth profiling this on different architectures to get actual results.
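
As a starting point for such profiling, here is a minimal timing sketch. It assumes a plain byte array standing in for many copies of b, with about half of the flags set to match the 50% case in the question. All names (Flags, make_flags, clear_conditionally, clear_unconditionally, time_clear_ns) and the seed are made up for illustration, and a single timed pass like this is only a rough indicator, not a rigorous benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Flags = std::vector<std::uint8_t>;

// Build n flags with roughly half of them set (the 50% case from the question).
// The fixed seed is an arbitrary choice that keeps runs repeatable.
Flags make_flags(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    Flags flags(n);
    for (auto& f : flags) {
        f = coin(rng) ? 1 : 0;
    }
    return flags;
}

// Variant 1: read, then write only if the flag is set.
void clear_conditionally(Flags& flags) {
    for (auto& f : flags) {
        if (f) {
            f = 0;
        }
    }
}

// Variant 2: write unconditionally.
void clear_unconditionally(Flags& flags) {
    for (auto& f : flags) {
        f = 0;
    }
}

// Time one pass of `clear` over `flags` and return the elapsed nanoseconds.
template <typename Clear>
long long time_clear_ns(Clear clear, Flags& flags) {
    const auto start = std::chrono::steady_clock::now();
    clear(flags);
    const auto stop = std::chrono::steady_clock::now();
    // Outside the timed region: check that every flag really was cleared.
    for (auto f : flags) {
        assert(f == 0);
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To compare the variants, build one array with make_flags, copy it, and pass each copy to time_clear_ns with a different clearing function. Expect the numbers to shift with optimization level, array size, and hardware; an optimizing compiler may also turn the unconditional loop into a bulk fill, which is itself part of the answer on that platform.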

落在眉间の轻吻 2024-08-30 12:34:45

It depends on various things:

  • how predictable the branch is (in the first scenario)
  • whether b is already in a register
  • what architecture you are using
孤千羽 2024-08-30 12:34:45

In addition to the suggestions to profile, it also really depends on what memory is backing that write: if it's a memory-mapped flash device, for example, the write might be extremely costly.

私野 2024-08-30 12:34:45

Recently I have been reading papers on very fast compression techniques, and the authors stressed the need to avoid if branches to achieve the best performance. The reason is CPU pipelining: branches defeat many of the optimizations a CPU can make to execute parts of the code in parallel. So if you had a lot of these operations, it might be faster to use b = false.

云淡月浅 2024-08-30 12:34:45

On a modern pipelined processor you need to take this into account:

  • a mispredicted branch costs a lot
  • stores and loads take a long time
  • caches may speed up both reads and writes, but if it's a multi-cache architecture and b is being modified in more than one cache, multiple writes may mean multiple cache evictions and may offset the benefit of the cache.
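
The multi-cache point is the one case where checking first can plausibly pay off. On MESI-style coherence protocols a core generally needs exclusive ownership of a cache line before storing to it, which invalidates copies held by other cores, whereas a load can be served from a shared copy. A minimal sketch, assuming b is a flag shared between threads and modelled as std::atomic<bool>; the function names are hypothetical, and relaxed ordering is an assumption that only suits a flag that does not publish other data:

```cpp
#include <atomic>

// Check first, store only when needed. When the flag is already false,
// the cache line can stay shared and no other core's copy is invalidated.
inline void clear_if_set(std::atomic<bool>& flag) {
    if (flag.load(std::memory_order_relaxed)) {
        flag.store(false, std::memory_order_relaxed);
    }
}

// Always store. Every call asks for ownership of the cache line,
// even when the stored value equals the one already there.
inline void clear_always(std::atomic<bool>& flag) {
    flag.store(false, std::memory_order_relaxed);
}
```

Both functions leave the flag false; they differ only in the coherence traffic they may generate when several cores call them on the same flag. Whether that difference is measurable depends on the hardware and on how contended the line is.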

Read with conditional write has at least one memory access and a branch that may mispredict. Assuming the branch is taken 50% of the time, you have 1.5 memory accesses on average, plus the chance of mispredicting.

Unconditional write has exactly one memory access and no branch whatsoever.

Now you need to balance the cost of mispredicting with the cost of a store, which changes depending on how many cache agents you have.
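
That balance can be written down as a back-of-the-envelope model. The sketch below is only an illustration: the Costs struct, its fields, and the simple additive formula are assumptions made here, and the cycle counts have to come from measurement on the target machine.

```cpp
#include <cassert>

// Hypothetical per-operation costs, e.g. in cycles. Real values must be measured.
struct Costs {
    double load;
    double store;
    double mispredict_penalty;
};

// Expected memory accesses for "read, then write if set":
// one load always, plus one store with probability p_set.
double expected_accesses_conditional(double p_set) {
    assert(p_set >= 0.0 && p_set <= 1.0);
    return 1.0 + p_set;
}

// Expected cost of the conditional variant under a simple additive model:
// the load, the store when it happens, and the misprediction penalty
// weighted by how often the branch is mispredicted.
double expected_cost_conditional(const Costs& c, double p_set, double p_mispredict) {
    assert(p_set >= 0.0 && p_set <= 1.0);
    assert(p_mispredict >= 0.0 && p_mispredict <= 1.0);
    return c.load + p_set * c.store + p_mispredict * c.mispredict_penalty;
}

// The unconditional variant is a single store and has no branch.
double expected_cost_unconditional(const Costs& c) {
    return c.store;
}
```

With p_set = 0.5, expected_accesses_conditional returns 1.5, the figure quoted above. In this model the conditional variant only wins when a store is much more expensive than a load plus the expected misprediction cost, which is the situation the multi-cache bullet describes.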

纵性 2024-08-30 12:34:45

If you are assigning a pointer, reference, or basic value type, I personally think the direct assignment will be faster (I'd be keen to see the outcome in a profiler). In the 50% probability scenario, the conditional version will potentially execute many more instructions than simply putting the value into a register. Assigning a struct or class object, which triggers an assignment operator, will be the most expensive. Conditional logic also introduces more instructions and adds to code complexity metrics.
