I have a non-atomic variable my_var and an std::mutex my_mut. I assume that up to this point in the code, the programmer has followed this rule: each time the programmer modifies or writes to my_var, he locks and unlocks my_mut.
Assuming this, Thread1 performs the following:
my_mut.lock();
my_var.modify();
my_mut.unlock();
Here is the sequence of events I imagine in my mind:
- Prior to my_mut.lock(), there were possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule.
- By the instruction my_mut.lock(), all writes from the previously executed my_mut critical section are visible in memory to this thread.
- my_var.modify() executes.
- After my_mut.unlock(), there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule. The value of my_var at the end of this thread will be visible to the next thread that locks my_mut, by the time it locks my_mut (see the sketch after this list).
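To make this concrete, here is a minimal sketch of the scenario I have in mind (assuming, purely for illustration, that my_var is a plain int and that an increment stands in for my_var.modify(); the second thread and its read are my own additions):

```cpp
#include <iostream>
#include <mutex>
#include <thread>

std::mutex my_mut;
int my_var = 0;   // stand-in for the non-atomic variable (assumed int here)

void thread1() {
    my_mut.lock();
    my_var += 1;      // stand-in for my_var.modify()
    my_mut.unlock();  // release: the write is made available to the next locker
}

void thread2() {
    my_mut.lock();    // acquire: if thread1 has already unlocked, its write is visible here
    std::cout << my_var << '\n';  // prints 0 or 1 depending on which thread locked first
    my_mut.unlock();
}

int main() {
    std::thread a(thread1), b(thread2);
    a.join();
    b.join();
}
```

Whichever thread happens to lock my_mut first, the reader can only observe 0 or 1: if it locks after thread1's unlock, it is guaranteed to see the modification, never a stale copy.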
I have been having trouble finding a source that verifies that this is exactly how std::mutex should work. I consulted the C++ standard. From ISO 2013, I found this section:
[ Note: For example, a call that acquires a mutex will perform an
acquire operation on the locations comprising the mutex.
Correspondingly, a call that releases the same mutex will perform a
release operation on those same locations. Informally, performing a
release operation on A forces prior side effects on other memory
locations to become visible to other threads that later perform a
consume or an acquire operation on A.
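A toy lock built from std::atomic_flag illustrates what the note means by acquire and release operations on the locations comprising the mutex (this is only an illustration of that wording; a real std::mutex is not required to be implemented this way):

```cpp
#include <atomic>
#include <thread>

// Toy lock: lock() performs an acquire operation on the flag (the location
// comprising the lock), unlock() performs a release operation on that same
// location, so writes made while holding the lock become visible to the
// next thread that acquires it.
class toy_lock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;  // clear == unlocked
public:
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag.clear(std::memory_order_release); }
};

toy_lock lk;
int counter = 0;   // non-atomic data protected by the toy lock

int main() {
    std::thread t([] { lk.lock(); ++counter; lk.unlock(); });
    lk.lock();
    ++counter;     // prior side effects become visible to the next acquirer
    lk.unlock();
    t.join();
}
```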
Is my understanding of std::mutex correct?
C++ is defined in terms of relations between operations, not particular hardware details (like cache coherence). So the C++ Standard has a happens-before relationship, which roughly means that whatever happens before has completed all of its side effects, and those side effects are therefore visible to whatever happens after.
And having entered an exclusive critical section means that whatever happens within it happens before the next time that critical section is entered. So any subsequent entry into it will see everything that happened before. That's what the Standard mandates. Everything else (including cache coherence) is the implementation's duty: it has to make sure that the described behavior matches what actually happens.
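As a hedged sketch of that chain (my own illustration, assuming an int payload and RAII locking via std::lock_guard):

```cpp
#include <mutex>
#include <thread>

int my_var = 0;        // non-atomic shared data (assumed int)
std::mutex my_mut;

void writer() {
    std::lock_guard<std::mutex> lk(my_mut);   // lock (acquire)
    my_var = 42;       // (A) sequenced before the unlock in lk's destructor
}                      // (B) unlock (release) of my_mut

void reader() {
    std::lock_guard<std::mutex> lk(my_mut);   // (C) lock (acquire) of my_mut
    // If (C) comes after (B), then (B) synchronizes-with (C), and the chain
    // (A) sequenced-before (B) synchronizes-with (C) sequenced-before (D)
    // gives (A) happens-before (D): the write is visible here.
    int x = my_var;    // (D)
    (void)x;
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```

The happens-before edge exists only when the reader's lock actually comes after the writer's unlock; which thread gets the lock first is not determined by the code above.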
Hardware already maintains cache coherence, so conflicting copies in different caches are impossible on real-world systems. AFAIK, there are no C++ implementations that run std::thread across cores without coherent caches, and that's unlikely to become a thing in the future. There are heterogeneous systems like an ARM DSP + MCU, but you don't run threads of one program across such cores. (And you don't boot a single OS across such cores.)
There will be a value in DRAM for the address, but all CPU cores access memory through cache, so that value doesn't matter: a Modified copy in another core's cache will take priority, thanks to hardware cache coherence.
(See also: memory_order_relaxed ops with volatile on GCC and Clang, with inline asm for more ordering when needed. But cache-coherent hardware is why just volatile does work a lot like atomic with relaxed.)
Without coherent caches, a release store would have to know what parts of cache to flush, but the compiler normally doesn't know which variables are shared or not. And worse, dirty write-back caches would need to get written back before writes from other cores, so our later loads can actually see them.
Programs running on cores with non-coherent shared memory can use it for message-passing, e.g. via MPI, where the program is explicit about which memory regions are flushed when. C++'s multithreaded memory model is not suitable for such systems. That's why mainstream multi-CPU systems are ccNUMA; non-coherent shared memory can be found between nodes of a cluster, but again, that's where you'd use MPI or something, not C++ threads across separate instances of an OS running on separate nodes.
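As an illustration of that last point (my own sketch): a relaxed atomic relies only on hardware coherence for its own visibility and, much like volatile, imposes no ordering on surrounding non-atomic data:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> stop{false};   // visibility of this flag alone needs no ordering

void worker() {
    // Relaxed load: coherent caches make the store in main() visible promptly,
    // but nothing about *other* (non-atomic) variables is ordered by this load.
    while (!stop.load(std::memory_order_relaxed)) {
        // ... do some work ...
    }
}

int main() {
    std::thread t(worker);
    stop.store(true, std::memory_order_relaxed);  // relaxed store; worker will observe it
    t.join();
}
```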