Memory barriers and interlocked operations

Posted 2024-09-11 01:30:37


I am trying to improve my understanding of memory barriers. Suppose we have a weak memory model and we adapt Dekker's algorithm. Is it possible to make it work correctly under the weak memory model by adding memory barriers?

I think the answer is a surprising no. The reason (if I am correct) is that although a memory barrier can be used to ensure that a read is not moved past another, it cannot ensure that a read does not see stale data (such as that in a cache). Thus it could see some time in the past when the critical section was unlocked (per the CPU's cache) but at the current time other processors might see it as locked. If my understanding is correct, one must use interlocked operations such as those commonly called test-and-set or compare-and-swap to ensure synchronized agreement of a value at a memory location among multiple processors.

Thus, can we rightly expect that no weak memory model system would provide only memory barriers? The system must supply operations like test-and-set or compare-and-swap to be useful.

I realize that popular processors, including x86, provide memory models much stronger than a weak memory model. Please focus the discussion on weak memory models.

(If Dekker's algorithm is a poor choice, choose another mutual exclusion algorithm where memory barriers can successfully achieve correct synchronization, if possible.)
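
For concreteness, the kind of interlocked primitive being asked about can be sketched in C++11 as a spinlock built on atomic test-and-set. This is an illustrative sketch only; the class and its names are invented here and are not part of the original question:

#include <atomic>

// Illustrative only: a spinlock built on the interlocked test-and-set
// primitive discussed above. std::atomic_flag::test_and_set is an atomic
// read-modify-write, so all processors agree on the flag's current value.
class spinlock
{
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        // acquire: operations inside the critical section cannot be
        // reordered before the successful test-and-set
        while (locked.test_and_set(std::memory_order_acquire))
        {
            // spin until the current holder clears the flag
        }
    }
    void unlock()
    {
        // release: writes made inside the critical section become visible
        // to the next locker before the flag is cleared
        locked.clear(std::memory_order_release);
    }
};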


Comments (3)

深海不蓝 2024-09-18 01:31:18


Some barriers (such as the powerpc isync, and a .acq load on ia64) also have an effect on subsequent loads, i.e., if a load was available before the isync due to prefetching, it must be discarded. When used appropriately, perhaps that's enough to make Dekker's algorithm work on a weak memory model.

You've also got cache invalidation logic working for you. If you know that your load is current due to something like an isync and that the cached version of the data is invalidated if another cpu touches it, is that enough?

Interesting questions aside, Dekker's algorithm is for all practical purposes dumb. You are going to want to use atomic hardware interfaces and memory barriers for any real application, so focusing on how to fix up Dekker's with atomics just doesn't seem worthwhile to me;)
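
For reference, the closest C++11 analogue of those barriers is an acquire load; the following is an illustrative sketch of the effect described above (the names are invented and this is not the answerer's code):

#include <atomic>

std::atomic<int> flag(0);
int data = 0;

void producer()
{
    data = 42;                                 // write the payload
    flag.store(1,std::memory_order_release);   // then publish it
}

int consumer()
{
    while (flag.load(std::memory_order_acquire) == 0)
    {
        // spin; any value of 'data' speculated before the acquire load
        // observes flag == 1 must be discarded, as with isync / .acq
    }
    return data;                               // guaranteed to return 42
}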

金橙橙 2024-09-18 01:31:05


Say you put in a load and store barrier after every statement, and in addition you ensured that the compiler didn't reorder your stores. Wouldn't this, on any reasonable architecture, provide strict consistency? Dekker's works on sequentially consistent architectures. Sequential consistency is a weaker condition than strict consistency.

http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html

Even on a CPU that has a weak consistency model, you'd still expect cache coherence. I think that where things get derailed is the behavior of store buffers and speculated reads, and what operations are available to flush stored writes and invalidate speculated reads. If there isn't a load fence that can invalidate speculated reads, or there isn't a write fence that flushes the store buffer, then in addition to not being able to implement Dekker's, you won't be able to implement a mutex!

So here's my claim. If you have a write barrier available, and a read barrier, and the cache is coherent between CPUs, then you can trivially make all code sequentially consistent by flushing writes (store fence) after every instruction, and flushing speculations (read fence) before every instruction. So I claim that you don't need atomics for what you're talking about, and that you can do what you need with Dekker's only. Sure you wouldn't want to.
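
A rough C++11 sketch of that claim (illustrative only, not code from this answer; relaxed atomics stand in for the plain, cache-coherent loads and stores assumed above), bracketing every shared access with a full fence:

#include <atomic>

std::atomic<int> shared_x(0);   // stands in for an ordinary, cache-coherent location

void write_x(int v)
{
    shared_x.store(v,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // "store fence" after the write
}

int read_x()
{
    std::atomic_thread_fence(std::memory_order_seq_cst);  // "read fence" before the read
    return shared_x.load(std::memory_order_relaxed);
}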

BTW, Corensic, the company I work for, writes cool tools for debugging concurrency issues. Check out http://www.corensic.com.

这个俗人 2024-09-18 01:30:59


You are right that a memory barrier cannot ensure that a read sees up-to-date values. What it does do is enforce an ordering between operations, both on a single thread, and between threads.

For example, if thread A does a series of stores and then executes a release barrier before a final store to a flag location, and thread B reads from the flag location and then executes an acquire barrier before reading the other values, then the other variables will have the values stored by thread A:

// initially x=y=z=flag=0

// thread A
x=1;
y=2;
z=3;
release_barrier();
flag=1;

// thread B
while(flag==0) ; // loop until flag is 1
acquire_barrier();
assert(x==1);  // asserts will not fire
assert(y==2);
assert(z==3);

Of course, you need to ensure that the load and store to flag are atomic (which simple load and store instructions are on most common CPUs, provided the variables are suitably aligned). Without the while loop on thread B, thread B may well read a stale value (0) for flag, and thus you cannot guarantee any of the values read for the other variables.
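
For reference, the pseudocode above can be transcribed into runnable C++11 as follows (an illustrative transcription, not code from the original answer): the variables are made atomic with relaxed ordering so the example contains no formal data races, and the standalone fences play the roles of release_barrier() and acquire_barrier().

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x(0),y(0),z(0),flag(0);

void thread_a()
{
    x.store(1,std::memory_order_relaxed);
    y.store(2,std::memory_order_relaxed);
    z.store(3,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // release_barrier()
    flag.store(1,std::memory_order_relaxed);
}

void thread_b()
{
    while (flag.load(std::memory_order_relaxed) == 0)
        ;                                                  // loop until flag is 1
    std::atomic_thread_fence(std::memory_order_acquire);   // acquire_barrier()
    assert(x.load(std::memory_order_relaxed) == 1);        // asserts will not fire
    assert(y.load(std::memory_order_relaxed) == 2);
    assert(z.load(std::memory_order_relaxed) == 3);
}

int main()
{
    std::thread a(thread_a),b(thread_b);
    a.join();
    b.join();
}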

Fences can thus be used to enforce synchronization in Dekker's algorithm.

Here's an example implementation in C++ (using C++0x atomic variables):

#include <atomic>

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

For a full analysis see my blog entry at http://www.justsoftwaresolutions.co.uk/threading/implementing_dekkers_algorithm_with_fences.html
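
A hypothetical driver for the code above (illustrative only, not from the answer or the linked post): whatever is placed at the // critical section comments in p0() and p1() runs under mutual exclusion when the two functions execute concurrently on separate threads.

#include <thread>

int main()
{
    std::thread t0([]{ for (int i = 0; i < 100000; ++i) p0(); });
    std::thread t1([]{ for (int i = 0; i < 100000; ++i) p1(); });
    t0.join();
    t1.join();
}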
