Memory model ordering and visibility?

Posted on 2024-12-05 07:47:09


I tried looking for details on this, and I even read the standard on mutexes and atomics... but I still couldn't understand the C++11 memory model's visibility guarantees.
From what I understand, the very important feature of a mutex BESIDES mutual exclusion is ensuring visibility. That is, it is not enough that only one thread at a time increments the counter; it matters that each thread increments the counter value that was stored by the thread that last held the mutex (I really don't know why people don't mention this more when discussing mutexes; maybe I had bad teachers :)).
So from what I can tell, atomics do not enforce immediate visibility
(from the person who maintains boost::thread and has implemented the C++11 thread and mutex library):

A fence with memory_order_seq_cst does not enforce immediate
visibility to other threads (and neither does an MFENCE instruction).
The C++0x memory ordering constraints are just that --- ordering
constraints. memory_order_seq_cst operations form a total order, but
there are no restrictions on what that order is, except that it must
be agreed on by all threads, and it must not violate other ordering
constraints. In particular, threads may continue to see "stale" values
for some time, provided they see values in an order consistent with
the constraints.

And I'm OK with that. But the problem is that I have trouble understanding which C++11 constructs regarding atomics are "global" and which only ensure consistency on atomic variables.
In particular, I would like to understand which (if any) of the following memory orderings guarantee that there will be a memory fence before and after loads and stores:
http://www.stdthread.co.uk/doc/headers/atomic/memory_order.html

From what I can tell, std::memory_order_seq_cst inserts a memory barrier, while the others only enforce ordering of the operations on a certain memory location.

So can somebody clear this up? I presume a lot of people are going to make horrible bugs using std::atomic, especially if they don't use the default (std::memory_order_seq_cst memory ordering).
2. If I'm right, does that mean that the second line is redundant in this code:

atomicVar.store(42);
std::atomic_thread_fence(std::memory_order_seq_cst);  

3. Do std::atomic_thread_fences have the same requirements as mutexes, in the sense that to ensure sequential consistency on non-atomic variables one must issue std::atomic_thread_fence(std::memory_order_seq_cst); before loads and std::atomic_thread_fence(std::memory_order_seq_cst); after stores?
4. Is

{
    regularSum += atomicVar.load();
    regularVar1++;
    regularVar2++;
}
//...
{
    regularVar1++;
    regularVar2++;
    atomicVar.store(74656);
}

equivalent to

std::mutex mtx;
{
    std::unique_lock<std::mutex> ul(mtx);
    regularSum += nowRegularVar;
    regularVar1++;
    regularVar2++;
}
//...
{
    std::unique_lock<std::mutex> ul(mtx);
    regularVar1++;
    regularVar2++;
    nowRegularVar = 74656;
}

I think not, but I would like to be sure.

EDIT:
5. Can the assert fire?
Only two threads exist.

std::atomic<int*> p(nullptr);

first thread writes

{
    nonatomic_p = (int*)malloc(16 * 1024 * sizeof(int));
    for (int i = 0; i < 16 * 1024; ++i)
        nonatomic_p[i] = 42;
    p = nonatomic_p;
}

second thread reads

{
    while (p == nullptr)
    {
    }
    assert(p[1234] == 42); // 1234 - random idx in array
}


生死何惧 2024-12-12 07:47:09


If you like to deal with fences, then a.load(memory_order_acquire) is equivalent to a.load(memory_order_relaxed) followed by atomic_thread_fence(memory_order_acquire). Similarly, a.store(x,memory_order_release) is equivalent to a call to atomic_thread_fence(memory_order_release) before a call to a.store(x,memory_order_relaxed). memory_order_consume is a special case of memory_order_acquire, for dependent data only. memory_order_seq_cst is special, and forms a total order across all memory_order_seq_cst operations. Mixed with the others it is the same as an acquire for a load, and a release for a store. memory_order_acq_rel is for read-modify-write operations, and is equivalent to an acquire on the read part and a release on the write part of the RMW.

The use of ordering constraints on atomic operations may or may not result in actual fence instructions, depending on the hardware architecture. In some cases the compiler will generate better code if you put the ordering constraint on the atomic operation rather than using a separate fence.

On x86, loads are always acquire, and stores are always release. memory_order_seq_cst requires stronger ordering with either an MFENCE instruction or a LOCK prefixed instruction (there is an implementation choice here as to whether to make the store have the stronger ordering or the load). Consequently, standalone acquire and release fences are no-ops, but atomic_thread_fence(memory_order_seq_cst) is not (again requiring an MFENCE or LOCKed instruction).

An important effect of the ordering constraints is that they order other operations.

std::atomic<bool> ready(false);
int i=0;

void thread_1()
{
    i=42;
    ready.store(true,memory_order_release);
}

void thread_2()
{
    while(!ready.load(memory_order_acquire)) std::this_thread::yield();
    assert(i==42);
}

thread_2 spins until it reads true from ready. Since the store to ready in thread_1 is a release, and the load is an acquire then the store synchronizes-with the load, and the store to i happens-before the load from i in the assert, and the assert will not fire.

2) The second line in

atomicVar.store(42);
std::atomic_thread_fence(std::memory_order_seq_cst);  

is indeed potentially redundant, because the store to atomicVar uses memory_order_seq_cst by default. However, if there are other non-memory_order_seq_cst atomic operations on this thread then the fence may have consequences. For example, it would act as a release fence for a subsequent a.store(x,memory_order_relaxed).

3) Fences and atomic operations do not work like mutexes. You can use them to build mutexes, but they do not work like them. You do not have to ever use atomic_thread_fence(memory_order_seq_cst). There is no requirement that any atomic operations are memory_order_seq_cst, and ordering on non-atomic variables can be achieved without, as in the example above.

4) No, these are not equivalent. Your snippet without the mutex lock is thus a data race and undefined behaviour.

5) No your assert cannot fire. With the default memory ordering of memory_order_seq_cst, the store and load from the atomic pointer p work like the store and load in my example above, and the stores to the array elements are guaranteed to happen-before the reads.

冷夜 2024-12-12 07:47:09


From what I can tell std::memory_order_seq_cst inserts mem barrier while other only enforce ordering of the operations on certain memory location.

It really depends on what you're doing and on what platform you're working with. The strong memory ordering model on a platform like x86 creates a different set of requirements for memory fence operations compared to the weaker ordering models on platforms like IA64, PowerPC, ARM, etc. What the default std::memory_order_seq_cst argument ensures is that, depending on the platform, the proper memory fence instructions will be used. On a platform like x86, there is no need for a full memory barrier unless you are doing a read-modify-write operation. Per the x86 memory model, all loads have load-acquire semantics and all stores have store-release semantics. Thus, in these cases the std::memory_order_seq_cst enum basically creates a no-op, since the memory model of x86 already ensures that those types of operations are consistent across threads, and therefore there are no assembly instructions that implement these types of partial memory barriers. The same no-op condition holds if you explicitly set std::memory_order_release or std::memory_order_acquire on x86. Furthermore, requiring a full memory barrier in these situations would be an unnecessary performance impediment. As noted, it is only required for read-modify-write operations.

On other platforms with weaker memory consistency models though, that would not be the case, and therefore using std::memory_order_seq_cst would employ the proper memory fence operations without the user having to explicitly specify whether they would like a load-acquire, store-release, or full memory fence operation. These platforms have specific machine instructions for enforcing such memory consistency contracts, and the std::memory_order_seq_cst setting would work out the proper case. If the user would like to specifically call for one of these operations they can through the explicit std::memory_order enum types, but it would not be necessary ... the compiler would work out the correct settings.

I presume a lot of people are gonna be making horrible bugs using std::atomic , esp if they dont use default (std::memory_order_seq_cst memory ordering)

Yes, if they don't know what they're doing and don't understand which types of memory barrier semantics are called for in certain operations, then a lot of mistakes will be made if they attempt to explicitly state the type of memory barrier and pick the incorrect one, especially on platforms that will not mask their misunderstanding of memory ordering because they are weaker in nature.

Finally, keep in mind with your situation #4 concerning a mutex that there are two different things that need to happen here:

  1. The compiler must not be allowed to reorder operations across the mutex and critical section (especially in the case of an optimizing compiler)
  2. There must be the requisite memory fences created (depending on the platform) that maintain a state where all stores are completed before the critical section and reading of the mutex variable, and all stores are completed before exiting the critical section.

Since by default atomic stores and loads are implemented with std::memory_order_seq_cst, using atomics also implements the proper mechanisms to satisfy conditions #1 and #2. That being said, in your first example with atomics, the load would enforce acquire semantics for the block, while the store would enforce release semantics. It would not, though, enforce any particular ordering inside the "critical section" between these two operations. In your second example, you have two different sections with locks, each lock having acquire semantics. Since at some point you would have to release the locks, which would have release semantics, then no, the two code blocks would not be equivalent. In the first example, you've created one big "critical section" between the load and the store (assuming this is all happening on the same thread). In the second example you have two different critical sections.

P.S. I've found the following PDF particularly instructive, and you may find it too:
http://www.nwcpp.org/Downloads/2008/Memory_Fences.pdf
