How do memory fences affect the "freshness" of data?

Posted on 2024-08-12 01:26:06

I have a question about the following code sample (taken from: http://www.albahari.com/threading/part4.aspx#_NonBlockingSynch)

class Foo
{
   int _answer;
   bool _complete;

   void A()
   {
       _answer = 123;
       Thread.MemoryBarrier();    // Barrier 1
       _complete = true;
       Thread.MemoryBarrier();    // Barrier 2
   }

    void B()
    {
       Thread.MemoryBarrier();    // Barrier 3
       if (_complete)
       {  
          Thread.MemoryBarrier(); // Barrier 4
          Console.WriteLine (_answer);
       }
    }
 }

This is followed by the following explanation:

"Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true."

I understand how using memory barriers affects instruction reordering, but what is this "freshness guarantee" that is mentioned?

Later in the article, the following example is also used:

static void Main()
{
    bool complete = false; 
    var t = new Thread (() =>
    {
        bool toggle = false;
        while (!complete) 
        {
           toggle = !toggle;
           // adding a call to Thread.MemoryBarrier() here fixes the problem
        }

    });

    t.Start();
    Thread.Sleep (1000);
    complete = true;
    t.Join();  // Blocks indefinitely
}

This example is followed by this explanation:

"This program never terminates because the complete variable is cached in a CPU register. Inserting a call to Thread.MemoryBarrier inside the while-loop (or locking around reading complete) fixes the error."

So again ... what happens here?


满意归宿 2024-08-19 01:26:06

In the first case, Barrier 1 ensures _answer is written BEFORE _complete. Regardless of how the code is written, or how the compiler or CLR instructs the CPU, the memory bus read/write queues can reorder the requests. The Barrier basically says "flush the queue before continuing". Similarly, Barrier 4 makes sure _answer is read AFTER _complete. Otherwise CPU2 could reorder things and see an old _answer with a "new" _complete.

Barriers 2 and 3 are, in some sense, useless. Note that the explanation contains the word "after", i.e. "... if B ran after A, ...". What does it mean for B to run after A? If B and A are on the same CPU, then sure, B can be after. But in that case, same CPU means no memory barrier problems.

So consider B and A running on different CPUs. Now, very much like Einstein's relativity, the concept of comparing times at different locations/CPUs doesn't really make sense.
Another way of thinking about it: can you write code that can tell whether B ran after A? If so, you probably used memory barriers to do that. Otherwise, you can't tell, and it doesn't make sense to ask. It's also similar to Heisenberg's principle: if you can observe it, you've modified the experiment.

But leaving physics aside, let's say you could open the hood of your machine and see that the actual memory location of _complete was true (because A had run). Now run B. Without Barrier 3, CPU2 might STILL NOT see _complete as true, i.e. not "fresh".

But you probably can't open your machine and look at _complete. Nor communicate your findings to B on CPU2. Your only communication is what the CPUs themselves are doing. So if they can't determine BEFORE/AFTER without barriers, asking "what happens to B if it runs after A, without barriers" makes no sense.

By the way, I'm not sure what you have available in C#, but what is typically done, and what is really needed for code sample #1, is a single release barrier on write and a single acquire barrier on read:

void A()
{
   _answer = 123;
   WriteWithReleaseBarrier(_complete, true);  // "publish" values
}

void B()
{
   if (ReadWithAcquire(_complete))  // subscribe
   {  
      Console.WriteLine (_answer);
   }
}

The word "subscribe" isn't often used to describe the situation, but "publish" is. I suggest you read Herb Sutter's articles on threading.

This puts the barriers in exactly the right places.
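As a rough C# sketch of the same publish/acquire idea: .NET's `System.Threading.Volatile` class (available since .NET Framework 4.5) provides `Volatile.Write` with release semantics and `Volatile.Read` with acquire semantics. Mapping the hypothetical `WriteWithReleaseBarrier` and `ReadWithAcquire` above onto these calls is my assumption, not something the original article does. The `int?` return type and the polling `Main` are added only to make the example self-contained.

```csharp
using System;
using System.Threading;

class Foo
{
    int _answer;
    bool _complete;

    // Writer: store the payload, then publish the flag with release semantics.
    public void A()
    {
        _answer = 123;
        Volatile.Write(ref _complete, true);   // earlier writes cannot move after this write
    }

    // Reader: returns the answer if it has been published, otherwise null.
    public int? B()
    {
        if (Volatile.Read(ref _complete))      // later reads cannot move before this read
        {
            return _answer;
        }
        return null;
    }
}

static class Program
{
    static void Main()
    {
        var foo = new Foo();
        var writer = new Thread(foo.A);
        writer.Start();

        int? seen;
        while ((seen = foo.B()) == null)
        {
            Thread.Yield();   // keep polling until A has published
        }
        writer.Join();
        Console.WriteLine(seen);   // prints 123
    }
}
```

Note that nothing here promises B sees the flag "immediately" after A sets it. The only guarantee is relative: if B observes `_complete` as true, it also observes `_answer` as 123.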

For code sample #2, this isn't really a memory barrier problem; it is a compiler optimization issue: the compiler is keeping complete in a register. A memory barrier would force it out, as would volatile, but so probably would calling an external function: if the compiler can't tell whether that external function modified complete or not, it will re-read it from memory. For example, you could pass the address of complete to some function (defined somewhere the compiler can't examine its details):

while (!complete)
{
   some_external_function(&complete);
}

Even if the function doesn't modify complete, if the compiler isn't sure, it will need to reload its registers.

In other words, the difference between code 1 and code 2 is that code 1 only has problems when A and B are running on separate threads. Code 2 could have problems even on a single-threaded machine.

Actually, the other question would be: can the compiler completely remove the while loop? If it thinks complete is unreachable by other code, why not? I.e., if it decided to move complete into a register, it might as well remove the loop completely.
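For code sample #2, one possible C# fix is to make every read of the flag a volatile read, so the load cannot be hoisted out of the loop into a register. This is a minimal sketch, not the article's own fix (the article suggests `Thread.MemoryBarrier` or a lock). The `RunWorker` helper, the timeout parameter, and the shortened sleep are hypothetical additions for illustration.

```csharp
using System;
using System.Threading;

static class Program
{
    // Returns true if the worker observed the flag and exited within the timeout.
    public static bool RunWorker(int timeoutMs)
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            // Volatile.Read performs a fresh load on every iteration, so the
            // read of 'complete' cannot be cached in a register across the loop.
            while (!Volatile.Read(ref complete))
            {
                toggle = !toggle;
            }
        });
        t.IsBackground = true;   // do not keep the process alive if the worker never stops

        t.Start();
        Thread.Sleep(100);
        Volatile.Write(ref complete, true);
        return t.Join(timeoutMs);   // Join(int) reports whether the thread finished in time
    }

    static void Main()
    {
        Console.WriteLine(RunWorker(5000));
    }
}
```

A captured local cannot be declared `volatile` in C#, which is why the sketch uses `Volatile.Read(ref complete)` rather than the keyword. If the flag were a field instead, marking it `volatile` would be another option.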

EDIT: To answer the comment from opc (my answer is too big for comment block):

Barrier 3 forces the CPU to flush any pending read (and write) requests.

So imagine if there were some other reads before reading _complete:

void B()
{
   int x = a * b + c * d; // read a,b,c,d
   Thread.MemoryBarrier();    // Barrier 3
   if (_complete)
   ...

Without the barrier, the CPU might have all of these 5 read requests 'pending':

a,b,c,d,_complete

Without the barrier, the processor could reorder these requests to optimize memory access (e.g. if _complete and 'a' were on the same cache line or something).

With the barrier, the CPU gets a,b,c,d back from memory BEFORE _complete is even put in as a request, ENSURING 'b' (for example) is read BEFORE _complete, i.e. no reordering.

The question is - what difference does it make?

If a,b,c,d are independent from _complete, then it doesn't matter. All the barrier does is SLOW THINGS DOWN. So yeah, _complete is read later. So the data is fresher. Putting a sleep(100) or some busy-wait for-loop in there before the read would make it 'fresher' as well! :-)

So the point is - keep it relative. Does the data need to be read/written BEFORE/AFTER relative to some other data or not? That's the question.

And to not put down the author of the article: he does mention "if B ran after A...". It just isn't exactly clear whether he is imagining that B after A is crucial to the code, observable by the code, or just inconsequential.
