How do memory barriers affect the "freshness" of data?
I have a question about the following code sample (taken from: http://www.albahari.com/threading/part4.aspx#_NonBlockingSynch)
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine (_answer);
        }
    }
}
This is followed by the following explanation:
"Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true."
I understand how using memory barriers affects instruction reordering, but what is this "freshness guarantee" that is mentioned?
Later in the article, the following example is also used:
static void Main()
{
    bool complete = false;
    var t = new Thread (() =>
    {
        bool toggle = false;
        while (!complete)
        {
            toggle = !toggle;
            // adding a call to Thread.MemoryBarrier() here fixes the problem
        }
    });
    t.Start();
    Thread.Sleep (1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}
This example is followed with this explanation:
"This program never terminates because the complete variable is cached in a CPU register. Inserting a call to Thread.MemoryBarrier inside the while-loop (or locking around reading complete) fixes the error."
So again ... what happens here?
Comments (4)
In the first case, Barrier 1 ensures _answer is written BEFORE _complete. Regardless of how the code is written, or how the compiler or CLR instructs the CPU, the memory bus read/write queues can reorder the requests. The barrier basically says "flush the queue before continuing". Similarly, Barrier 4 makes sure _answer is read AFTER _complete. Otherwise CPU2 could reorder things and see an old _answer with a "new" _complete.
Barriers 2 and 3 are, in some sense, useless. Note that the explanation contains the word "after": i.e. "... if B ran after A, ...". What does it mean for B to run after A? If B and A are on the same CPU, then sure, B can be after. But in that case, same CPU means no memory barrier problems.
So consider B and A running on different CPUs. Now, very much like Einstein's relativity, the concept of comparing times at different locations/CPUs doesn't really make sense.
Another way of thinking about it - can you write code that can tell whether B ran after A? If so, well, you probably used memory barriers to do that. Otherwise, you can't tell, and it doesn't make sense to ask. It's also similar to Heisenberg's principle - if you can observe it, you've modified the experiment.
But leaving physics aside, let's say you could open the hood of your machine and see that the actual memory location of _complete was true (because A had run). Now run B. Without Barrier 3, CPU2 might STILL NOT see _complete as true, i.e. not "fresh".
But you probably can't open your machine and look at _complete, nor communicate your findings to B on CPU2. Your only communication is what the CPUs themselves are doing. So if they can't determine BEFORE/AFTER without barriers, asking "what happens to B if it runs after A, without barriers" makes no sense.
By the way, I'm not sure what you have available in C#, but what is typically done, and what is really needed for code sample #1, is a single release barrier on write and a single acquire barrier on read.
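As a rough C# sketch of that release/acquire pairing: on .NET 4.5 and later, Volatile.Write has release semantics and Volatile.Read has acquire semantics, so they can stand in for the two barriers. The class name FooReleaseAcquire and the int? return value (used instead of Console.WriteLine so the result can be checked) are assumptions made for this illustration, not part of the article's sample.

```csharp
using System;
using System.Threading;

// Hypothetical variant of the article's Foo class, for illustration only.
class FooReleaseAcquire
{
    int _answer;
    bool _complete;

    // Writer side: the release write keeps the earlier store to _answer
    // from being moved after the store to _complete ("publish").
    public void A()
    {
        _answer = 123;
        Volatile.Write(ref _complete, true);
    }

    // Reader side: the acquire read keeps the later load of _answer
    // from being moved before the load of _complete ("subscribe").
    // Returns null if the flag has not been observed yet.
    public int? B()
    {
        if (Volatile.Read(ref _complete))
        {
            return _answer;
        }
        return null;
    }
}
```

If B() observes the flag, it is guaranteed to see 123; if it does not observe the flag yet, it simply returns null, which is the same "no freshness promise" situation discussed above.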
The word "subscribe" isn't often used to describe the situation, but "publish" is. I suggest you read Herb Sutter's articles on threading.
This puts the barriers in exactly the right places.
For code sample #2, this isn't really a memory barrier problem; it is a compiler optimization issue - the compiler is keeping complete in a register. A memory barrier would force it out, as would volatile, but probably so would calling an external function - if the compiler can't tell whether that external function modified complete or not, it will re-read it from memory. For example, you could pass the address of complete to some function defined somewhere the compiler can't examine its details. Even if the function doesn't modify complete, the compiler isn't sure, so it will need to reload its registers.
I.e. the difference between code 1 and code 2 is that code 1 only has problems when A and B are running on separate threads. Code 2 could have problems even on a single-threaded machine.
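A minimal sketch of the volatile option, under some assumptions: C# does not allow volatile on a local variable, so the flag is moved to a static field here, and the names VolatileFlagDemo and Run, the timeout parameter, and the shortened sleep are all made up for the illustration.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not taken from the article.
static class VolatileFlagDemo
{
    // volatile: the compiler/JIT may not keep this field cached in a
    // register across loop iterations, so each check is a real read.
    static volatile bool _complete;

    // Returns true if the worker thread noticed the flag and exited
    // within timeoutMs milliseconds.
    public static bool Run(int timeoutMs)
    {
        _complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!_complete)
            {
                toggle = !toggle;
            }
        });
        t.IsBackground = true;   // a stuck worker cannot keep the process alive
        t.Start();
        Thread.Sleep(100);
        _complete = true;
        return t.Join(timeoutMs);
    }
}
```

Calling VolatileFlagDemo.Run(5000) should return true, since the worker re-reads the field on every iteration and leaves the loop once the flag is set.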
Actually, the other question would be - can the compiler completely remove the while loop? If it thinks complete is unreachable by other code, why not? I.e. if it decided to move complete into a register, it might as well remove the loop completely.
EDIT: To answer the comment from opc (my answer is too big for a comment block):
Barrier 3 forces the CPU to flush any pending read (and write) requests.
So imagine there were some other reads (of a, b, c and d) before the read of _complete.
Without the barrier, the CPU might have all 5 of these read requests (a, b, c, d and _complete) 'pending' at once.
Without the barrier, the processor could reorder these requests to optimize memory access (ie if _complete and 'a' were on the same cache line or something).
With the barrier, the CPU gets a,b,c,d back from memory BEFORE _complete is even put in as a request. ENSURING 'b' (for example) is read BEFORE _complete - ie no reordering.
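To make that scenario concrete, here is a hypothetical version of B() with four unrelated reads in front of Barrier 3. The fields a, b, c, d, their values, the class name FooWithExtraReads, and the tuple return type are all assumptions added for illustration; only the barrier placement follows the article's sample.

```csharp
using System;
using System.Threading;

// Hypothetical class for illustration; a, b, c, d are unrelated to _complete.
class FooWithExtraReads
{
    int a = 1, b = 2, c = 3, d = 4;
    int _answer;
    bool _complete;

    public void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    // Returns the value computed from a, b, c, d, plus the answer
    // (or null if _complete was not observed as true).
    public (int X, int? Answer) B()
    {
        int x = a * b + c * d;  // four reads issued before the barrier
        Thread.MemoryBarrier(); // Barrier 3: the reads above finish before _complete is read
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            return (x, _answer);
        }
        return (x, null);
    }
}
```

Since x does not depend on _complete, the result is the same with or without Barrier 3; the barrier only constrains the order in which the reads are performed.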
The question is - what difference does it make?
If a,b,c,d are independent from _complete, then it doesn't matter. All the barrier does is SLOW THINGS DOWN. So yeah, _complete is read later. So the data is fresher. Putting a sleep(100) or some busy-wait for-loop in there before the read would make it 'fresher' as well! :-)
So the point is - keep it relative. Does the data need to be read/written BEFORE/AFTER relative to some other data or not? That's the question.
And to not put down the author of the article - he does mention "if B ran after A...". It just isn't exactly clear whether he is imagining that B after A is crucial to the code, observable by the code, or just inconsequential.
Code sample #1:
Each processor core contains a cache with a copy of a portion of memory. It may take a bit of time for the cache to be updated. The memory barriers guarantee that the caches are synchronized with main memory. For example, if you didn't have barriers 2 and 3 here, consider this situation:
Processor 1 runs A(). It writes the new value of _complete to its cache (but not necessarily to main memory yet).
Processor 2 runs B(). It reads the value of _complete. If this value was previously in its cache, it may not be fresh (i.e., not synchronized with main memory), so it would not get the updated value.
Code sample #2:
Normally, variables are stored in memory. However, suppose a value is read multiple times in a single function: As an optimization, the compiler may decide to read it into a CPU register once, and then access the register each time it is needed. This is much faster, but prevents the function from detecting changes to the variable from another thread.
The memory barrier here forces the function to re-read the variable value from memory.
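For reference, a sketch of the second sample with the barrier added inside the loop, as the article suggests. Wrapping it in a method named Run on a class named BarrierInLoopDemo, returning the result of Join with a timeout, and shortening the sleep are assumptions made here so the behavior can be checked without risking an indefinite block.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper around the article's second sample.
static class BarrierInLoopDemo
{
    // Returns true if the worker thread exited within timeoutMs milliseconds.
    public static bool Run(int timeoutMs)
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!complete)
            {
                toggle = !toggle;
                // Full fence: the value of 'complete' cannot stay cached in a
                // register across this call, so the next check re-reads it.
                Thread.MemoryBarrier();
            }
        });
        t.IsBackground = true;   // a stuck worker cannot keep the process alive
        t.Start();
        Thread.Sleep(100);
        complete = true;
        return t.Join(timeoutMs);
    }
}
```

With the barrier in place, BarrierInLoopDemo.Run(5000) should return true instead of blocking.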
Calling Thread.MemoryBarrier() immediately refreshes the register caches with the actual values for variables.
In the first example, the "freshness" for _complete is provided by calling the method right after setting it and right before using it. In the second example, the initial false value for the variable complete will be cached in the thread's own space and needs to be resynchronized in order to immediately see the actual "outside" value from "inside" the running thread.
The "freshness" guarantee simply means that Barriers 2 and 3 force the value of _complete to be visible as soon as possible, as opposed to whenever it happens to be written to memory.
It's actually unnecessary from a consistency point of view, since Barriers 1 and 4 ensure that _answer will be read after reading _complete.