Overhead of a spin loop in terms of cache coherency
Say a thread on one core is spinning on a variable that will be updated by a thread running on another core. My question is: what is the overhead at the cache level? Will the waiting thread cache the variable, and therefore cause no traffic on the bus until the writing thread writes to that variable?
How can this overhead be reduced? Does the x86 PAUSE instruction help?
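For concreteness, here is a minimal sketch of the scenario in the question, assuming C11 atomics and POSIX threads; the names writer, spin_wait_demo, ready and payload are hypothetical. The reader spins with plain acquire loads and, on x86, executes PAUSE (via the _mm_pause intrinsic) on each iteration; on other architectures the hint is assumed to be a no-op here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* PAUSE tells the CPU this is a spin-wait loop. */
#define CPU_RELAX() _mm_pause()
#else
/* Assumption: no spin hint on non-x86 targets in this sketch. */
#define CPU_RELAX() ((void)0)
#endif

static atomic_bool ready = false;  /* the variable the reader spins on */
static int payload;                /* written before ready is published */

static void *writer(void *arg) {
    (void)arg;
    payload = 42;                                              /* plain write */
    atomic_store_explicit(&ready, true, memory_order_release); /* publish it */
    return NULL;
}

/* Spins until the writer thread publishes, then returns payload. */
int spin_wait_demo(void) {
    pthread_t t;
    atomic_store(&ready, false);
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    /* Read-only spin: the cache line stays in this core's cache
       until the writer's store invalidates it. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        CPU_RELAX();
    int value = payload;  /* safe: acquire load synchronized with release store */
    pthread_join(t, NULL);
    return value;
}
```

Calling spin_wait_demo() should return 42. Intel describes PAUSE as a hint that improves the performance of spin-wait loops and reduces the power they consume; whether it matters for a particular workload is something to measure.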
I believe all modern x86 CPUs use the MESI protocol or a variant of it (Intel uses MESIF, AMD uses MOESI). So the spinning "reader" thread will likely hold a cached copy of the data in either the "Exclusive" or the "Shared" state, generating no memory-bus traffic while it spins.
It is only when the other core writes to the location that it will have to perform cross-core communication.
[update]
A "spinlock" like this is only a good idea if you will not be spinning for very long. If it may be a while before the variable gets updated, use a mutex + condition variable instead, which will put your thread to sleep so that it adds no overhead while it waits.
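A minimal sketch of the mutex + condition variable alternative, assuming POSIX threads; the names updater, wait_with_condvar, updated and shared_value are hypothetical. The waiter sleeps inside pthread_cond_wait instead of spinning, and rechecks the predicate in a loop because condition variables can wake spuriously.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state, all guarded by one mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool updated = false;
static int shared_value;

static void *updater(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 7;
    updated = true;
    pthread_cond_signal(&cond);  /* wake the sleeping waiter */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Sleeps (no spinning) until updater sets the flag, then returns the value. */
int wait_with_condvar(void) {
    pthread_t t;
    pthread_mutex_lock(&lock);
    updated = false;
    pthread_mutex_unlock(&lock);
    if (pthread_create(&t, NULL, updater, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!updated)                      /* guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);  /* atomically unlocks and sleeps */
    int value = shared_value;
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return value;
}
```

wait_with_condvar() should return 7. The price of sleeping is wake-up latency (a trip through the scheduler), which is why short waits are often still done by spinning.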
(Incidentally, I suspect a lot of people -- including me -- are wondering "what are you actually trying to do?")
If you spin for short intervals you are usually fine. However, Linux has a periodic timer interrupt (and I assume other OSes have something similar), so if you spin for 10 ms or close to it you will see cache disturbance.
I have heard it's possible to modify the Linux kernel to prevent all interrupts on specific cores so that this disturbance goes away, but I don't know what is involved in doing this.
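One way to keep spins short is to cap the number of spin iterations and then start yielding the CPU. The sketch below assumes C11 atomics and POSIX sched_yield; SPIN_BUDGET, spin_then_yield and the demo helpers are hypothetical, and the budget of 1000 iterations is an arbitrary placeholder to tune by measurement. Yielding still polls, so for long waits the mutex + condition variable approach from the previous answer remains the better fit.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)  /* assumption: no hint on other targets */
#endif

/* Hypothetical tuning knob: PAUSE iterations before giving up the CPU. */
#define SPIN_BUDGET 1000

/* Spins briefly on *flag; if it is still clear, yields between rounds.
   Returns how many times the thread yielded. */
unsigned long spin_then_yield(atomic_bool *flag) {
    unsigned long yields = 0;
    for (;;) {
        for (int i = 0; i < SPIN_BUDGET; i++) {
            if (atomic_load_explicit(flag, memory_order_acquire))
                return yields;
            CPU_RELAX();
        }
        sched_yield();  /* let the scheduler run something else */
        yields++;
    }
}

static void *setter(void *arg) {
    atomic_store_explicit((atomic_bool *)arg, true, memory_order_release);
    return NULL;
}

/* Returns true once the flag set by another thread has been observed. */
bool spin_then_yield_demo(void) {
    atomic_bool flag = false;
    pthread_t t;
    if (pthread_create(&t, NULL, setter, &flag) != 0)
        return false;
    spin_then_yield(&flag);
    pthread_join(t, NULL);
    return atomic_load(&flag);
}
```
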
With only two threads the overhead can probably be ignored; still, it is a good idea to write a simple benchmark. For instance, if you implement spinlocks, measure how much time the thread spends spinning.
This effect on the cache is called cache-line bouncing.
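A rough sketch of such a benchmark, assuming POSIX clock_gettime with CLOCK_MONOTONIC; measure_spin, late_writer, struct spin_stats and the 1 ms delay are hypothetical stand-ins for the real writer. It records both the number of loop iterations and the wall-clock time spent spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)  /* assumption: no hint on other targets */
#endif

struct spin_stats {
    unsigned long iterations;  /* loop iterations spent waiting */
    double elapsed_ns;         /* wall-clock time spent in the spin */
};

static atomic_bool go = false;

/* Hypothetical producer: "works" for about 1 ms, then publishes. */
static void *late_writer(void *arg) {
    (void)arg;
    struct timespec delay = {0, 1000000};  /* 1,000,000 ns = 1 ms */
    nanosleep(&delay, NULL);
    atomic_store_explicit(&go, true, memory_order_release);
    return NULL;
}

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Spins until late_writer publishes and reports how long that took. */
struct spin_stats measure_spin(void) {
    struct spin_stats s = {0, 0.0};
    pthread_t t;
    atomic_store(&go, false);
    if (pthread_create(&t, NULL, late_writer, NULL) != 0)
        return s;
    double start = now_ns();
    while (!atomic_load_explicit(&go, memory_order_acquire)) {
        s.iterations++;
        CPU_RELAX();
    }
    s.elapsed_ns = now_ns() - start;
    pthread_join(t, NULL);
    return s;
}
```

On a real workload you would replace late_writer with the actual producer and repeat the measurement many times, since a single run is noisy.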
I tested this extensively in this post. In general, the overhead is incurred by the bus-locking component of the spinlock, usually the instruction "xchg reg,mem" or some variant of it. Since that particular overhead cannot be avoided, your options are to invoke the spinlock less often and, once the lock is held, to do the absolute minimum amount of work necessary before releasing it.
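The code from the post is not reproduced here, so the following is only an illustrative sketch of those two suggestions, assuming C11 atomics and POSIX threads. The lock's atomic_exchange typically compiles to xchg on x86, and the critical section is kept to a single increment. The test-and-test-and-set structure (spin on a plain load, retry the exchange only once the lock looks free) is a common refinement swapped in here, not necessarily what the post measured; ttas_lock, worker, run_ttas_demo, THREADS and ITERS are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)  /* assumption: no hint on other targets */
#endif

typedef struct { atomic_bool locked; } ttas_lock;

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        /* Locked read-modify-write; on x86 usually an implicitly locked xchg. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* While held, spin on plain loads so the line can stay Shared
           in this core's cache instead of bouncing on every attempt. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            CPU_RELAX();
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

#define THREADS 4
#define ITERS 10000

static ttas_lock counter_lock = { false };
static long counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        ttas_acquire(&counter_lock);
        counter++;  /* keep the critical section minimal */
        ttas_release(&counter_lock);
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
long run_ttas_demo(void) {
    pthread_t t[THREADS];
    int created = 0;
    counter = 0;
    for (int i = 0; i < THREADS; i++)
        if (pthread_create(&t[created], NULL, worker, NULL) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

run_ttas_demo() should return THREADS * ITERS (40000 with the values shown). Invoking the lock less often would mean, for example, accumulating into a thread-local variable and taking the lock once per batch rather than once per increment.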