Read then conditional write vs. write
Which is, on average, faster: checking the value and then assigning only if needed, or simply assigning? Or, in C++ terms:
    bool b;
    if (b)
        b = false;
or
    b = false;
Assume that the if() condition is true with 50% probability. The answer will most likely be highly architecture dependent, so please voice your low-level considerations. Writing always dirties the cache line, right? So by avoiding a write we avoid a cache flush in half of the cases. But a smart enough cache might detect a trivial write and not dirty itself. On the other hand, the unconditional write is always exactly one memory operation, while read-then-write is, on average, 1.5 operations.
Disclaimer: this is a curiosity question, not a problem I actually face.
Comments (7)
Branches are expensive on modern CPUs, and memory access is expensive on embedded/older CPUs. So the flat just-assign will always be faster unless you have some kind of weird memory that takes longer to write than to read (hint: you don't).
It is worse for these reasons specifically:
[...] if statement. That means a couple of extra memory reads and more space unnecessarily consumed in the cache.
[...] b is put into a register. Register reads/writes are very cheap, but they aren't free.
It would definitely be worth profiling this on different architectures to get actual results.
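As a starting point for such profiling, here is a minimal sketch of a timing harness. The names make_flags, clear_conditional, clear_unconditional and time_ms are hypothetical, as is the choice of an array of std::uint8_t flags (used instead of bool so that std::vector does not pack the values into bits). The flags are filled pseudo-randomly so that roughly half are set and the branch is hard to predict. An optimizing compiler may turn the unconditional loop into a memset or vectorize either loop, so it is worth inspecting the generated code before trusting any numbers.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills n flags with pseudo-random 0/1 values
// using a simple linear congruential generator (about 50% ones).
std::vector<std::uint8_t> make_flags(std::size_t n, std::uint32_t seed = 12345u) {
    std::vector<std::uint8_t> flags(n);
    std::uint32_t state = seed;
    for (auto& f : flags) {
        state = state * 1664525u + 1013904223u;  // LCG step
        f = static_cast<std::uint8_t>((state >> 31) & 1u);  // keep the top bit
    }
    return flags;
}

// Variant 1: read, then write only when the flag is set.
void clear_conditional(std::vector<std::uint8_t>& flags) {
    for (auto& f : flags) {
        if (f) f = 0;
    }
}

// Variant 2: always write.
void clear_unconditional(std::vector<std::uint8_t>& flags) {
    for (auto& f : flags) f = 0;
}

// Times one call of fn on flags and returns elapsed milliseconds.
template <typename Fn>
double time_ms(Fn fn, std::vector<std::uint8_t>& flags) {
    auto start = std::chrono::steady_clock::now();
    fn(flags);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A caller would build one set of flags, copy it, and pass one copy to each variant through time_ms, repeating the measurement several times and with different array sizes so that cache effects show up.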
It depends on various things:
In addition to the suggestions to profile, it also really depends on what memory is backing that write: if it's a memory-mapped flash device, for example, the write might be extremely costly.
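To illustrate why the check can pay off when stores are expensive, here is a small sketch. CountingCell is a hypothetical stand-in for a costly-to-write location (it is not a real flash driver API); it only counts how many stores reach it, so the two strategies can be compared by write count rather than by time.

```cpp
#include <cstddef>

// Hypothetical stand-in for a location where every store is costly.
// It simply counts the stores that reach it.
struct CountingCell {
    bool value = false;
    std::size_t writes = 0;

    void store(bool v) { value = v; ++writes; }
    bool load() const { return value; }
};

// Read first, and store only when the value would actually change.
inline void clear_if_set(CountingCell& c) {
    if (c.load()) c.store(false);
}

// Store unconditionally.
inline void clear_always(CountingCell& c) {
    c.store(false);
}
```

Clearing an already-clear cell costs a store with clear_always but only a load with clear_if_set, which is the trade that matters when a store is far more expensive than a load.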
Recently I have been reading papers on very fast compression techniques, and the authors stressed the need to avoid if branching to achieve the best performance. The reason is CPU pipelining. Using ifs breaks many of the optimizations a CPU can make to execute parts of the code in parallel. So, if you had a lot of these operations, it might be faster to use b = false.
On a modern pipelined processor you need to take this into account:
If b is being modified in more than one cache, multiple writes may mean multiple cache evictions and may offset the performance of the cache.
Read with conditional write has at least one memory access and a branch that may mispredict. Assuming the branch is taken 50% of the time, you have 1.5 memory accesses on average, plus the chance of mispredicting.
Unconditional write has exactly one memory access and no branch whatsoever.
Now you need to balance the cost of mispredicting against the cost of a store, which changes depending on how many cache agents you have.
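The balance described above can be written down as a rough expected-cost model. This is only a sketch: the Costs struct, its fields and both function names are hypothetical, and the cycle counts you plug in are assumptions that vary by architecture and by how contended the cache line is.

```cpp
// All costs are hypothetical, caller-supplied cycle counts.
struct Costs {
    double load;        // cost of reading b
    double store;       // cost of writing b
    double mispredict;  // penalty for one mispredicted branch
};

// Expected cost of `if (b) b = false;`
// p_true: probability that b is true (so the store happens)
// p_miss: probability that the branch is mispredicted
constexpr double conditional_cost(const Costs& c, double p_true, double p_miss) {
    return c.load + p_true * c.store + p_miss * c.mispredict;
}

// Expected cost of `b = false;`: one store, no branch.
constexpr double unconditional_cost(const Costs& c) {
    return c.store;
}
```

With load and store both set to 1 and no mispredictions, conditional_cost gives 1.5 for p_true = 0.5 against 1 for unconditional_cost, matching the counts in the answer. The conditional form only wins in this model when a store costs much more than a load plus the expected misprediction penalty.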
If you are assigning a pointer, reference or basic value type, I personally think the direct assignment will be faster (I'm keen to see the outcome on a profiler). With a 50% probability, you will potentially execute a lot more instructions than just putting a value into a register. Assigning a struct or class object that triggers an assignment operator will be the most expensive. Conditional logic also introduces more instructions and adds to the code complexity metrics.