Does std::mutex enforce cache coherence?

Asked 2025-01-29 04:17:50


I have a non-atomic variable my_var and an std::mutex my_mut. I assume up to this point in the code, the programmer has followed this rule:

Each time the programmer modifies or writes to my_var, he locks
and unlocks my_mut.

Assuming this, Thread1 performs the following:

my_mut.lock();
my_var.modify();
my_mut.unlock();

Here is the sequence of events I imagine in my mind:

  1. Prior to my_mut.lock();, there were possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule.
  2. By the instruction my_mut.lock();, all writes from the previously executed my_mut critical section are visible in memory to this thread.
  3. my_var.modify(); executes.
  4. After my_mut.unlock();, there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule. The value of my_var at the end of this thread will be visible to the next thread that locks my_mut, by the time it locks my_mut. (A runnable sketch of this pattern follows the list.)
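To make the sequence concrete, here is a runnable sketch of the pattern (the value 42, the reader thread, and main() are illustrative stand-ins, not part of the original code):

    #include <cassert>
    #include <mutex>
    #include <thread>

    int my_var = 0;     // non-atomic shared variable
    std::mutex my_mut;  // protects my_var

    void writer() {
        my_mut.lock();    // acquire: sees writes from earlier critical sections
        my_var = 42;      // stands in for my_var.modify()
        my_mut.unlock();  // release: publishes the write to the next locker
    }

    void reader() {
        my_mut.lock();      // acquire: synchronizes-with the unlock that preceded it
        int seen = my_var;  // 42 if the writer's critical section ran first, else 0
        my_mut.unlock();
        (void)seen;
    }

    int main() {
        std::thread t1(writer), t2(reader);
        t1.join();
        t2.join();
        assert(my_var == 42);  // join() also synchronizes, so the write is visible here
    }

In real code std::lock_guard or std::scoped_lock would be preferable to bare lock()/unlock() calls; the explicit calls just mirror the snippet above.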

I have been having trouble finding a source that verifies that this is exactly how std::mutex should work. I consulted the C++ standard. From ISO 2013, I found this section:

[ Note: For example, a call that acquires a mutex will perform an
acquire operation on the locations comprising the mutex.
Correspondingly, a call that releases the same mutex will perform a
release operation on those same locations. Informally, performing a
release operation on A forces prior side effects on other memory
locations to become visible to other threads that later perform a
consume or an acquire operation on A.
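The release/acquire pairing the note describes can also be seen directly with std::atomic. Here is a minimal sketch of that pairing (the payload/flag names are my own illustration, not from the standard):

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                // plain, non-atomic data
    std::atomic<bool> flag{false};  // plays the role of location "A" in the note

    void writer() {
        payload = 123;  // prior side effect on another memory location
        flag.store(true, std::memory_order_release);  // release operation on A
    }

    void reader() {
        while (!flag.load(std::memory_order_acquire)) {}  // acquire operation on A
        assert(payload == 123);  // the release on A made the prior write visible
    }

    int main() {
        std::thread w(writer), r(reader);
        w.join();
        r.join();
    }

A mutex bundles the same pairing: unlock() performs the release and the next lock() performs the acquire, on the locations comprising the mutex.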

Is my understanding of std::mutex correct?


Comments (2)

一曲爱恨情仇 2025-02-05 04:17:50


C++ operates on relations between operations, not on particular hardware terms (like cache coherence). So the C++ Standard has a happens-before relationship, which roughly means that whatever happened before has completed all of its side effects, and those effects are therefore visible to whatever happens after.

Given an exclusive critical section that you have entered, whatever happens within it happens before the next time that critical section is entered. So any subsequent entry into it will see everything that happened before. That's what the Standard mandates. Everything else (including cache coherence) is the implementation's duty: it has to make sure that the described behavior is consistent with what actually happens.
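Spelling that out with the question's names (the edge labels are mine, as a sketch, assuming thread2 takes the lock after thread1 has released it):

    #include <mutex>

    std::mutex my_mut;
    int my_var = 0;  // stands in for the question's non-atomic variable

    void thread1() {
        my_mut.lock();
        my_var = 1;       // (A)
        my_mut.unlock();  // (B) release: (A) is sequenced-before (B)
    }

    void thread2() {      // assumed to take the lock after thread1 released it
        my_mut.lock();    // (C) acquire: (B) synchronizes-with (C)
        int x = my_var;   // (D): (C) is sequenced-before (D)
        my_mut.unlock();
        // (A) happens-before (D) via the chain A -> B -> C -> D,
        // so x == 1 is guaranteed here.
        (void)x;
    }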

戏蝶舞 2025-02-05 04:17:50


After my_mut.unlock();, there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, ...

Hardware already maintains cache coherence, so conflicting copies in different caches are impossible on real-world systems. AFAIK, there are no C++ implementations that run std::thread across cores without coherent caches, and that's unlikely to become a thing in the future. There are heterogeneous systems like an ARM DSP + MCU, but you don't run threads of one program across such cores. (And you don't boot a single OS across such cores.)

There will be a value in DRAM for the address, but all CPU cores access memory through cache, so that value doesn't matter: a Modified copy in another core's cache takes priority, thanks to hardware cache coherence.

See also

  • https://en.wikipedia.org/wiki/MESI_protocol, the standard cache-coherency protocol. Modern CPUs don't use a shared bus, though; they use a directory (e.g. L3 tags) to keep track of which core might have a modified copy of any given line, so they know which core to signal to write back a line when a Read For Ownership (write miss) or share request (read miss) happens for that line.
  • When to use volatile with multi threading? (Never, except Linux kernel code which does roll its own memory_order_relaxed ops with volatile on GCC and Clang, with inline asm for more ordering when needed. But cache-coherent hardware is why just volatile does work a lot like atomic with relaxed.)
  • Is cache coherency required for memory consistency? including discussion in comments - implementing C++'s coherency requirements with manual flushing would be very onerous, e.g. every release store would have to know what parts of cache to flush, but the compiler normally doesn't know which variables are shared or not. And worse, dirty write-back caches would need to get written back before writes from other cores so our later loads can actually see them.
  • http://eel.is/c++draft/intro.races#19 - [Note 19: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. — end note] (A small demonstration follows this list.)
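As a quick illustration of that per-object coherence guarantee (a sketch of mine, not from the linked note): even fully relaxed operations on a single atomic object share one modification order, so a relaxed counter never loses increments.

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    int main() {
        std::atomic<int> counter{0};
        std::vector<std::thread> threads;

        // Four threads, each adding 1000 with relaxed ordering. Relaxed imposes
        // no ordering with respect to other memory locations, but all RMW
        // operations on `counter` still act on its single modification order.
        for (int i = 0; i < 4; ++i) {
            threads.emplace_back([&] {
                for (int j = 0; j < 1000; ++j)
                    counter.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& t : threads) t.join();

        assert(counter.load() == 4000);  // no increment is ever lost
    }

Relaxed ordering gives no visibility guarantees for other locations; only operations on counter itself are kept coherent, which is exactly the hardware guarantee the note refers to.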

Programs running on cores with non-coherent shared memory can use it for message-passing, e.g. via MPI, where the program is explicit about which memory regions are flushed when. C++'s multithreaded memory model is not suitable for such systems. That's why mainstream multi-CPU systems are ccNUMA; non-coherent shared memory can be found between nodes of a cluster, but again that's where you'd use MPI or something, not C++ threads across separate instances of an OS running on separate nodes.
