原子引用计数共享不可变数据是否需要内存屏障?

发布于 2024-08-27 18:19:47 字数 439 浏览 6 评论 0原文

我有一些不可变的数据结构,我想使用引用计数来管理它们,并在 SMP 系统上的线程之间共享它们。

发布代码如下所示:

void avocado_release(struct avocado *p)
{
    if (atomic_dec(p->refcount) == 0) {
        free(p->pit);
        free(p->juicy_innards);
        free(p);
    }
}

atomic_dec 中需要内存屏障吗?如果是这样,什么样的内存屏障?

附加说明:该应用程序必须在 PowerPC 和 x86 上运行,因此欢迎任何特定于处理器的信息。我已经了解 GCC 原子内置函数。至于不变性,引用计数是唯一在对象的持续时间内发生变化的字段。

I have some immutable data structures that I would like to manage using reference counts, sharing them across threads on an SMP system.

Here's what the release code looks like:

void avocado_release(struct avocado *p)
{
    if (atomic_dec(p->refcount) == 0) {
        free(p->pit);
        free(p->juicy_innards);
        free(p);
    }
}

Does atomic_dec need a memory barrier in it? If so, what kind of memory barrier?

Additional notes: The application must run on PowerPC and x86, so any processor-specific information is welcomed. I already know about the GCC atomic builtins. As for immutability, the refcount is the only field that changes over the duration of the object.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

提赋 2024-09-03 18:19:47

在 x86 上,它将变成 lock 前缀汇编指令,如 LOCK XADD
作为一条指令,它是不可中断的。作为附加的“功能”,lock 前缀会导致完整的内存屏障:

“...锁定操作序列化所有未完成的加载和存储操作(即等待它们完成)。” ...“锁定操作相对于所有其他内存操作和所有外部可见事件而言是原子的。只有取指和页表访问才能传递锁定指令。锁定指令可用于同步一个处理器写入的数据和另一处理器读取的数据”。 - 英特尔® 64 和 IA-32 架构软件开发人员手册,第 8.1 章.2.

实际上,内存屏障在 .NETx86/x64 上的 JAVA JIT,因为 mfence 在许多 CPU 上速度较慢,即使它保证可用,就像在 64 位模式下一样。 (lock xchg 是否与 mfence 具有相同的行为?)
因此,无论您是否喜欢,您都可以在 x86 上拥有完整的围栏作为额外的好处。 :-)

在 PPC 上,情况有所不同。 LL/SC 对 - lwarx & stwcx - 内部有减法,可用于将内存操作数加载到寄存器中,减一,然后如果没有其他存储到目标位置,则将其写回,或者重试整个循环(如果有)。 LL/SC 可以被中断(意味着它将失败并重试)。
它也不意味着自动完整围栏。
然而,这不会以任何方式损害计数器的原子性。
这只是意味着在 x86 情况下,您碰巧也“免费”获得了栅栏。
在 PPC 上,可以通过发出 来插入(部分或)完整栅栏(lw)sync 指令

总而言之,显式内存屏障对于原子计数器的正常工作来说并不是必需的。

On x86, it will turn into a lock prefixed assembly instruction, like LOCK XADD.
Being a single instruction, it is non-interruptible. As an added "feature", the lock prefix results in a full memory barrier:

"...locked operations serialize all outstanding load and store operations (that is, wait for them to complete)." ..."Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor." - Intel® 64 and IA-32 Architectures Software Developer’s Manual, Chapter 8.1.2.

A memory barrier is in fact implemented as a dummy LOCK OR or LOCK AND in both the .NET and the JAVA JIT on x86/x64, because mfence is slower on many CPUs even when it's guaranteed to be available, like in 64-bit mode. (Does lock xchg have the same behavior as mfence?)
So you have a full fence on x86 as an added bonus, whether you like it or not. :-)

On PPC, it is different. An LL/SC pair - lwarx & stwcx - with a subtraction inside can be used to load the memory operand into a register, subtract one, then either write it back if there was no other store to the target location, or retry the whole loop if there was. An LL/SC can be interrupted (meaning it will fail and retry).
It also does not mean an automatic full fence.
This does not however compromise the atomicity of the counter in any way.
It just means that in the x86 case, you happen to get a fence as well, "for free".
On PPC, one can insert a (partial or) full fence by emitting a (lw)sync instruction.

All in all, explicit memory barriers are not necessary for the atomic counter to work properly.

紙鸢 2024-09-03 18:19:47

区分原子访问(保证值的读取/修改/写入作为一个原子单元执行)与内存重新排序非常重要。

内存屏障防止读取和写入重新排序。重新排序与原子性完全正交。例如,在 PowerPC 上,如果您实现尽可能最有效的原子增量,那么它不会阻止重新排序。如果您想防止重新排序,则需要 lwsyncsync 指令,或一些等效的高级(C++ 11?)内存屏障。

声称“编译器不可能以有问题的方式重新排序”作为一般说法似乎很天真,因为编译器优化可能非常令人惊讶,而且因为 CPU(特别是 PowerPC/ARM/Alpha/MIPS)会积极地重新排序内存操作。

连贯的缓存也救不了你。请参阅 https://preshing.com/archives/ 了解内存重新排序的实际工作原理。

然而,在这种情况下,我相信答案是不需要任何障碍。这是因为对于这种特定情况(引用计数),不需要引用计数与对象中的其他值之间的关系。唯一的例外是引用计数为零时。此时,确保来自其他线程的所有更新对于当前线程可见非常重要,因此可能需要读取获取屏障。

It is important to distinguish between atomic accesses (which guarantee that the read/modify/write of the value executes as one atomic unit) vs. memory reordering.

Memory barriers prevent reordering of reads and writes. Reordering is completely orthogonal to atomicity. For instance, on PowerPC if you implement the most efficient atomic increment possible then it will not prevent reordering. If you want to prevent reordering then you need an lwsync or sync instruction, or some equivalent high-level (C++ 11?) memory barrier.

Claims that there is "no possibility of the compiler reordering things in a problematic way" seem naive as general statements because compiler optimizations can be quite surprising and because CPUs (PowerPC/ARM/Alpha/MIPS in particular) aggressively reorder memory operations.

A coherent cache doesn't save you either. See https://preshing.com/archives/ to see how memory reordering really works.

In this case, however, I believe the answer is that no barriers are required. That is because for this specific case (reference counting) there is no need for a relationship between the reference count and the other values in the object. The one exception is when the reference count hits zero. At that point it is important to ensure that all updates from other threads are visible to the current thread so a read-acquire barrier may be necessary.

記憶穿過時間隧道 2024-09-03 18:19:47

您是否打算实现自己的atomic_dec,或者您只是想知道系统提供的函数是否会按照您想要的方式运行?

作为一般规则,系统提供的原子递增/递减设施将应用做正确的事情所需的任何内存屏障。通常,您不必担心内存障碍,除非您正在做一些古怪的事情,例如实现自己的无锁数据结构或 STM 库。

Are you intending to implement your own atomic_dec or are you just wondering whether a system-supplied function will behave as you want?

As a general rule, system-supplied atomic increment/decrement facilities will apply whatever memory barriers are required to just do the right thing. You generally don't have to worry about memory barriers unless you are doing something wacky like implementing your own lock-free data structures or an STM library.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文