当前位置：文江博客话题详情

C++0x 内存模型和推测加载/存储

发布于 2024-08-16 23:16:20 字数 2020 浏览 5 评论 0原文

因此，我正在阅读有关内存模型的内容，该模型是即将推出的 C++0x 标准的一部分。但是，我对编译器允许执行的操作的一些限制有点困惑，特别是关于推测加载和存储。

首先，一些相关的内容：

Hans Boehm 关于线程和C++0x 中的内存模型

Boehm，“线程无法作为库实现”

Boehm 和 Adve， “C++ 并发内存模型的基础”

Sutter，“Prism：用于 Microsoft 本机代码平台的基于原理的顺序内存模型”，N2197

Boehm，“并发内存模型编译器后果”，N2338

现在，基本思想本质上是“无数据竞争程序的顺序一致性”，其中似乎是易于编程和允许编译器和硬件优化之间的一个不错的折衷方案。如果不同线程对同一内存位置的两次访问未排序，则定义为发生数据竞争，其中至少一次访问存储到该内存位置，并且其中至少一次不是同步操作。这意味着对共享数据的所有读/写访问都必须通过某种同步机制，例如互斥体或对原子变量的操作（嗯，可以通过宽松的内存排序来对原子变量进行操作仅适用于专家，但默认提供顺序一致性）。

有鉴于此，我对普通共享变量上的虚假或推测加载/存储的限制感到困惑。例如，在 N2338 中，我们有一个

switch (y) {
    case 0: x = 17; w = 1; break;
    case 1: x = 17; w = 3; break;
    case 2: w = 9; break;
    case 3: x = 17; w = 1; break;
    case 4: x = 17; w = 3; break;
    case 5: x = 17; w = 9; break;
    default: x = 17; w = 42; break;
}

不允许编译器转换为的

tmp = x; x = 17;
switch (y) {
    case 0: w = 1; break;
    case 1: w = 3; break;
    case 2: x = tmp; w = 9; break;
    case 3: w = 1; break;
    case 4: w = 3; break;
    case 5: w = 9; break;
    default: w = 42; break;
}

示例，因为如果 y == 2，则存在对 x 的虚假写入，如果另一个线程同时更新 x，这可能会出现问题。但是，为什么这是一个问题呢？这是一场数据竞赛，无论如何都是被禁止的；在这种情况下，编译器通过两次写入 x 只会让情况变得更糟，但即使一次写入也足以引发数据竞争，不是吗？即，正确的 C++0x 程序需要同步对 x 的访问，在这种情况下将不再存在数据竞争，并且虚假存储也不会成为问题？

我同样对 N2197 中的示例 3.1.3 以及其他一些示例感到困惑，但也许对上述问题的解释也能解释这一点。

编辑：答案：

推测存储成为问题的原因是，在上面的 switch 语句示例中，程序员可能选择仅在 y != 2 时有条件地获取保护 x 的锁。因此推测存储可能会引入原始代码中不存在的数据竞争，因此禁止进行转换。同样的论点也适用于 N2197 中的示例 3.1.3。

原文

So I was reading about the memory model that is part of the upcoming C++0x standard. However, I'm a bit confused about some of the restrictions for what the compiler is allowed to do, specifically about speculative loads and stores.

To start with, some of the relevant stuff:

Hans Boehm's pages about threads and the memory model in C++0x

Boehm, "Threads Cannot be Implemented as a Library"

Boehm and Adve, "Foundations of the C++ Concurrency Memory Model"

Sutter, "Prism: A Principle-Based Sequential Memory Model for Microsoft Native Code Platforms", N2197

Boehm, "Concurrency memory model compiler consequences", N2338

Now, the basic idea is essentially "Sequential Consistency for Data-Race-Free Programs", which seems to be a decent compromise between ease of programming and allowing the compiler and hardware opportunities to optimize. A data race is defined to occur if two accesses to the same memory location by different threads are not ordered, at least one of them stores to the memory location, and at least one of them is not a synchronization action. It implies that all read/write access to shared data must be via some synchronization mechanism, such as mutexes or operations on atomic variables (well, it is possible to operate on the atomic variables with relaxed memory ordering for experts only, but the default provides for sequential consistency).

In light of this, I'm confused about the restrictions about spurious or speculative loads/stores on ordinary shared variables. For instance, in N2338 we have the example

switch (y) {
    case 0: x = 17; w = 1; break;
    case 1: x = 17; w = 3; break;
    case 2: w = 9; break;
    case 3: x = 17; w = 1; break;
    case 4: x = 17; w = 3; break;
    case 5: x = 17; w = 9; break;
    default: x = 17; w = 42; break;
}

which the compiler is not allowed to transform into

tmp = x; x = 17;
switch (y) {
    case 0: w = 1; break;
    case 1: w = 3; break;
    case 2: x = tmp; w = 9; break;
    case 3: w = 1; break;
    case 4: w = 3; break;
    case 5: w = 9; break;
    default: w = 42; break;
}

since if y == 2 there is a spurious write to x which could be a problem if another thread were concurrently updating x. But, why is this a problem? This a data race, which is prohibited anyway; in this case, the compiler just makes it worse by writing to x twice, but even a single write would be enough for a data race, no? I.e. a proper C++0x program would need to synchronize access to x, in which case there would no longer be data race, and the spurious store wouldn't be a problem either?

I'm similarly confused about Example 3.1.3 in N2197 and some of the other examples as well, but maybe an explanation for the above issue would explain that too.

EDIT: The Answer:

The reason why speculative stores are a problem is that in the switch statement example above, the programmer might have elected to conditionally acquire the lock protecting x only if y != 2. Hence the speculative store might introduce a data race that was not there in the original code, and the transformation is thus forbidden. The same argument applies to Example 3.1.3 in N2197 as well.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鹿港小镇 2024-08-23 23:16:20

我不熟悉您提到的所有内容，但请注意，在 y==2 的情况下，在代码的第一位， x 根本没有被写入（或读取）。第二位代码中，写了两次。这比只编写一次与编写两次有更大的区别（至少在现有的线程模型（例如 pthreads）中）。此外，存储根本不会存储的值比仅存储一次与存储两次有更大的区别。出于这两个原因，您不希望编译器只是将无操作替换为 tmp = x; x = 17； x = tmp;。

假设线程 A 想要假设没有其他线程修改 x。如果 y 是 2，并且它向 x 写入一个值，然后将其读回，那么它会得到它写入的值，这是合理的。但是，如果线程 B 并发执行第二位代码，则线程 A 可以写入 x 并稍后读取它，并取回原始值，因为线程 B 在写入“之前”保存并在写入“之后”恢复。或者它可以返回 17，因为线程 B 在写入“之后”存储了 17，并在线程 A 读取“之后”再次存储了 tmp。线程 A 可以做任何它喜欢的同步，但这没有帮助，因为线程 B 没有同步。它不同步的原因（在 y==2 情况下）是因为它没有使用 x。因此，特定代码位是否“使用 x”的概念对于线程模型很重要，这意味着编译器不能在“不应该”时更改代码以使用 x。

简而言之，如果允许您提出的转换，引入虚假写入，那么永远不可能分析一点代码并得出它不会修改 x （或任何其他内存位置）的结论。有许多方便的习惯用法因此是不可能的，例如在线程之间共享不可变数据而不同步。

因此，尽管我不熟悉 C++0x 对“数据竞争”的定义，但我认为它包含一些条件，允许程序员假设对象未被写入，并且此转换将违反这些条件。我推测，如果 y==2，那么您的原始代码以及并发代码： x = 42; x = 1; z = x 在另一个线程中，未定义为数据竞争。或者至少如果这是一场数据竞争，则不允许 z 以值 17 或 42 结束。

考虑在该程序中，y 中的值 2 可能用于指示“还有其他线程正在运行” ：不要修改 x，因为我们在这里没有同步，所以这会引入数据竞争”。也许根本没有同步的原因是，在 y 的所有其他情况下，没有其他线程运行来访问 x。对我来说，C++0x 希望支持这样的代码似乎是合理的：

if (single_threaded) {
    x = 17;
} else {
    sendMessageThatSafelySetsXTo(17);
}

显然，您不希望将其转换为：

tmp = x;
x = 17;
if (!single_threaded) {
    x = tmp;
    sendMessageThatSafelySetsXTo(17);
}

这与您的示例中的转换基本上相同，但只有 2 种情况，而不是那里足以使其看起来像是一个很好的代码大小优化。

I'm not familiar with all the stuff you refer to, but notice that in the y==2 case, in the first bit of code, x is not written to at all (or read, for that matter). In the second bit of code, it is written twice. This is more of a difference than just writing once vs. writing twice (at least, it is in existing threading models such as pthreads). Also, storing a value which would not otherwise be stored at all is more of a difference than just storing once vs. storing twice. For both these reasons, you don't want compilers just replacing a no-op with tmp = x; x = 17; x = tmp;.

Suppose thread A wants to assume that no other thread modifies x. It's reasonable to want it to be allowed to expect that if y is 2, and it writes a value to x, then reads it back, it will get back the value it has written. But if thread B is concurrently executing your second bit of code, then thread A could write to x and later read it, and get back the original value, because thread B saved "before" the write and restored "after" it. Or it could get back 17, because thread B stored 17 "after" the write, and stored tmp back again "after" thread A reads. Thread A can do whatever synchronisation it likes, and it won't help, because thread B isn't synchronised. The reason it's not synchronised (in the y==2 case) is that it's not using x. So the concept of whether a particular bit of code "uses x" is important to the threading model, which means compilers can't be allowed to change code to use x when it "shouldn't".

In short, if the transformation you propose were allowed, introducing a spurious write, then it would never be possible to analyse a bit of code and conclude that it does not modify x (or any other memory location). There are a number of convenient idioms which would therefore be impossible, such as sharing immutable data between threads without synchronisation.

So, although I'm not familiar with C++0x's definition of "data race", I assume that it includes some conditions where programmers are allowed to assume that an object is not written to, and that this transformation would violate those conditions. I speculate that if y==2, then your original code, together with concurrent code: x = 42; x = 1; z = x in another thread, is not defined to be a data race. Or at least if it is a data race, it's not one which permits z to end up with value either 17, or 42.

Consider that in this program, the value 2 in y might be used to indicate, "there are other threads running: don't modify x, because we aren't synchronised here, so that would introduce a data race". Perhaps the reason there's no synchronisation at all, is that in all other cases of y, there are no other threads running with access to x. It seems reasonable to me that C++0x would want to support code like this:

if (single_threaded) {
    x = 17;
} else {
    sendMessageThatSafelySetsXTo(17);
}

Clearly then, you don't want that transformed to:

tmp = x;
x = 17;
if (!single_threaded) {
    x = tmp;
    sendMessageThatSafelySetsXTo(17);
}

Which is basically the same transformation as in your example, but with only 2 cases, instead of there being enough to make it look like a good code-size optimisation.

回复收藏 0 原文