How fast is access to atomic variables in C++?
My question is: how fast is access to atomic variables in C++ using the C++0x atomic<> class? What happens at the cache level? Say one thread is just reading it: would it need to go all the way down to RAM, or can it read straight from the cache of the core on which it is executing? Assume the architecture is x86.
I am especially interested in knowing whether, if a thread is just reading from it while no other thread is writing at that time, the penalty is the same as for reading a normal variable, and in how atomic variables are accessed in general. Does each read implicitly involve a write as well, as in compare-and-swap? Are atomic variables implemented using compare-and-swap?
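For what it's worth, here is a minimal sketch (variable names are just illustrative) of what such a read looks like with std::atomic<>, showing that a plain load and a compare-and-swap are two distinct operations in the interface:

```cpp
// Minimal sketch: a plain atomic read versus an explicit compare-and-swap.
#include <atomic>
#include <cstdio>

std::atomic<int> value{42};

int main()
{
    // Pure read: on x86 this is an ordinary MOV from cache; it does not
    // perform any implicit write and is not a compare-and-swap.
    int snapshot = value.load(std::memory_order_relaxed);

    // Explicit compare-and-swap: only this kind of read-modify-write
    // operation uses a locked instruction (LOCK CMPXCHG) on x86.
    int expected = 42;
    bool swapped = value.compare_exchange_strong(expected, 43);

    std::printf("snapshot=%d swapped=%d now=%d\n",
                snapshot, swapped ? 1 : 0, value.load());
}
```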
3 Answers
If you want raw numbers, Agner Fog's data listings from his optimization manuals should be of use. The Intel manuals (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html/) also have a few sections detailing the latencies of memory reads/writes on multi-core systems, which should include details on the slow-downs caused by the bus locking needed for atomic writes.
The answer is not as simple as you perhaps expect. It depends on the exact CPU model, and it depends on the circumstances as well. The worst case is when you need to perform a read-modify-write operation on a variable and there is a conflict (what exactly counts as a conflict is again CPU-model dependent, but most often it is when another CPU is accessing the same cache line).
See also .NET or Windows Synchronization Primitives Performance Specifications.
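To make the conflicting read-modify-write case concrete, here is a rough sketch (thread and iteration counts are arbitrary): several threads doing fetch_add on the same atomic keep its cache line bouncing between cores, which is exactly the contended case described above.

```cpp
// Sketch of the contended read-modify-write case: multiple threads doing
// fetch_add on one std::atomic force its cache line to move between cores.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<long> shared_counter{0};

void hammer(int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        // Atomic RMW: on x86 this is a LOCK XADD. Uncontended it is cheap;
        // with other cores writing the same cache line it gets much slower.
        shared_counter.fetch_add(1, std::memory_order_relaxed);
    }
}

int main()
{
    const int threads = 4;
    const int iterations = 1000000;

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(hammer, iterations);
    for (auto& th : pool)
        th.join();

    std::printf("%ld\n", shared_counter.load());
}
```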
Atomics use special architecture support to get atomicity without forcing all reads/writes to go all the way to main memory. Basically, each core is allowed to probe the caches of other cores, so they find out about the results of other threads' operations that way.
The exact performance depends on the architecture. On x86, many operations were already atomic to start with, so they are free. I've seen numbers anywhere from 10 to 100 cycles, depending on the architecture and operation. For perspective, any read from main memory is 3000-4000 cycles, so atomics are much faster than going straight to memory on nearly all platforms.
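To illustrate the point, here is a small sketch (names are illustrative) of the three kinds of atomic operations and what x86 compilers typically generate for each; only stores with full ordering and read-modify-write operations pay the extra cost:

```cpp
// Rough illustration: on x86, plain atomic loads are ordinary reads served
// from the core's cache (kept coherent by the hardware); sequentially
// consistent stores and read-modify-write operations are where the cost is.
#include <atomic>

std::atomic<int> flag{0};

int observe()
{
    // Typically a single MOV: reads from this core's cache if the line is
    // present; cache coherence keeps the value consistent with other cores.
    // No write, no compare-and-swap involved.
    return flag.load(std::memory_order_acquire);
}

void publish(int v)
{
    // A seq_cst store is where x86 pays extra: compilers emit either
    // MOV + MFENCE or an XCHG to get the full ordering guarantee.
    flag.store(v, std::memory_order_seq_cst);
}

void bump()
{
    // Read-modify-write: a LOCK-prefixed instruction (LOCK XADD here),
    // which is the case where bus/cache-line locking costs show up.
    flag.fetch_add(1);
}
```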