Why do we need memory barriers?

Posted 2024-09-14 05:12:45

C# 4 in a Nutshell (highly recommended btw) uses the following code to demonstrate the concept of MemoryBarrier (assuming A and B were run on different threads):

using System;
using System.Threading;

class Foo {
  int _answer;
  bool _complete;

  void A() {
    _answer = 123;
    Thread.MemoryBarrier(); // Barrier 1
    _complete = true;
    Thread.MemoryBarrier(); // Barrier 2
  }

  void B() {
    Thread.MemoryBarrier(); // Barrier 3
    if (_complete) {
      Thread.MemoryBarrier(); // Barrier 4
      Console.WriteLine(_answer);
    }
  }
}

They mention that Barriers 1 & 4 prevent this example from writing 0, and Barriers 2 & 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.

I'm not really getting it. I think I understand why Barriers 1 & 4 are necessary: we don't want the write to _answer to be optimized and placed after the write to _complete (Barrier 1) and we need to make sure that _answer is not cached (Barrier 4). I also think I understand why Barrier 3 is necessary: if A ran until just after writing _complete = true, B would still need to refresh _complete to read the right value.

I don't understand though why we need Barrier 2! Part of me says that it's because perhaps Thread 2 (running B) already ran until (but not including) if(_complete), and so we need to ensure that _complete is refreshed.

However, I don't see how this helps. Isn't it still possible that _complete will be set to true in A and yet the B method will see a cached (false) version of _complete? I.e., if Thread 2 ran method B until just after the first MemoryBarrier, and then Thread 1 ran method A until _complete = true but no further, and then Thread 2 resumed and tested if(_complete) -- could that if not result in false?

Comments (2)

你不是我要的菜∠ 2024-09-21 05:12:45

Barrier #2 guarantees that the write to _complete gets committed immediately. Otherwise it could remain in a queued state, meaning that the read of _complete in B would not see the change caused by A, even though B effectively used a volatile read.

Of course, this example does not quite do justice to the problem because A does nothing more after writing to _complete, which means that the write will be committed immediately anyway since the thread terminates early.
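
To make that concrete, here is a minimal sketch (my own illustration, not from the book or this answer; DoOtherLongRunningWork is a hypothetical stand-in) of a variant in which A keeps running after the store. That is the situation where Barrier 2 matters: the full fence forces the write to _complete out of the processor's write buffer before A moves on, so a weakly ordered machine cannot leave it queued while A does other work.

using System.Threading;

class FooVariant {
  int _answer;
  bool _complete;

  void A() {
    _answer = 123;
    Thread.MemoryBarrier();   // Barrier 1: the write to _answer is visible before _complete
    _complete = true;
    Thread.MemoryBarrier();   // Barrier 2: the store to _complete is committed (drained from
                              // the write buffer) before A carries on
    DoOtherLongRunningWork(); // hypothetical extra work; without Barrier 2 the store above
                              // could stay queued on a weakly ordered machine while this runs,
                              // and a concurrent B would keep reading false
  }

  void DoOtherLongRunningWork() { Thread.Sleep(1000); } // stand-in for real work
}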

The answer to your question of whether the if could still evaluate to false is yes for exactly the reasons you stated. But, notice what the author says regarding this point.

  Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.

The emphasis on "if B ran after A" is mine. It certainly could be the case that the two threads interleave. But, the author was ignoring this scenario presumably to make his point regarding how Thread.MemoryBarrier works simpler.

By the way, I had a hard time contriving an example on my machine where barriers #1 and #2 would have altered the behavior of the program. This is because the memory model regarding writes was strong in my environment. Perhaps, if I had a multiprocessor machine, was using Mono, or had some other different setup I could have demonstrated it. Of course, it was easy to demonstrate that removing barriers #3 and #4 had an impact.
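
For what it's worth, the read-side (Barriers 3 and 4) effect mentioned above is easy to reproduce. Below is a minimal sketch (my own, not the answerer's original test) of the classic demonstration: compiled in Release mode and run without a debugger on a typical .NET JIT, the reader thread may spin forever because the read of _complete is hoisted out of the loop; adding Thread.MemoryBarrier() inside the loop (or making the field volatile) makes it terminate.

using System;
using System.Threading;

class FreshnessDemo {
  static bool _complete; // deliberately neither volatile nor read through a barrier

  static void Main() {
    var reader = new Thread(() => {
      while (!_complete) {
        // With no Thread.MemoryBarrier() (or volatile read) here, the JIT may cache
        // _complete in a register and never re-read it, so this loop can spin forever
        // even after the main thread sets the flag.
      }
      Console.WriteLine("saw _complete = true");
    });
    reader.Start();

    Thread.Sleep(1000);
    _complete = true; // the write happens, but the reader may never observe it
    reader.Join();
  }
}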

客…行舟 2024-09-21 05:12:45

The example is unclear for two reasons:

  1. It is too simple to fully show what's happening with the fences.
  2. Albahari is including requirements for non-x86 architectures. See MSDN: "MemoryBarrier is required only on multiprocessor systems with weak memory ordering (for example, a system employing multiple Intel Itanium processors [which Microsoft no longer supports]).".

If you consider the following, it becomes clearer:

  1. A memory barrier (full barriers here - .Net doesn't provide a half barrier) prevents read / write instructions from jumping the fence (due to various optimisations). This guarantees us the code after the fence will execute after the code before the fence.
  2. "This serializing operation guarantees that every load and store instruction that precedes in program order the MFENCE instruction is globally visible before any load or store instruction that follows the MFENCE instruction is globally visible." See here.
  3. x86 CPUs have a strong memory model and guarantee writes appear consistent to all threads / cores (therefore barriers #2 & #3 are unneeded on x86). But, we are not guaranteed that reads and writes will remain in coded sequence, hence the need for barriers #1 and #4.
  4. Memory barriers are inefficient and needn't be used (see the same MSDN article). I personally use Interlocked and volatile (make sure you know how to use it correctly!!), which work efficiently and are easy to understand; see the sketch after this list.
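
As an illustration of point 4, here is a minimal sketch (my own, not part of the original answer) of the book's Foo rewritten with a volatile field: the volatile write to _complete has release semantics and the volatile read has acquire semantics, which covers the ordering that Barriers 1 and 4 provide explicitly in the book's version.

using System;

class FooVolatile {
  int _answer;
  volatile bool _complete; // volatile: writes get release semantics, reads get acquire semantics

  public void A() {
    _answer = 123;    // cannot be reordered past the volatile write below
    _complete = true; // volatile write (release)
  }

  public void B() {
    if (_complete) {              // volatile read (acquire)
      Console.WriteLine(_answer); // if _complete was seen as true, this prints 123, never 0
    }
  }
}

If you would rather not mark the field volatile, Volatile.Read and Volatile.Write from System.Threading give the same per-access semantics.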

PS. This article explains the inner workings of x86 nicely.
