C++

Posted on 2025-02-13 17:36:56

I have read about std::memory_order in C++ and understand it partially, but I still have some questions about it.

1. The explanation of std::memory_order_acquire says that no reads or writes in the current thread can be reordered before this load. Does that mean the compiler and CPU are not allowed to move any instruction that appears below the acquire statement to above it?

auto y = x.load(std::memory_order_acquire);
z = a;  // is it legal to execute the load of shared `a` above the acquire? (I feel no)
b = 2;  // is it legal to execute the store to shared `b` above the acquire? (I feel yes)

I can reason out why it is illegal to execute loads before the acquire. But why is it illegal for stores?

2. Is it illegal to skip a useless load or store on an atomic object? They are not volatile, and as far as I know only volatile has this requirement.

auto y = x.load(std::memory_order_acquire);  // `y` is never used
return;

This optimization does not happen even with relaxed memory order.

3. Is the compiler allowed to move instructions that appear above the acquire statement to below it?

z = a;  // is it legal to execute the load of shared `a` below the acquire? (I feel yes)
b = 2;  // is it legal to execute the store to shared `b` below the acquire? (I feel yes)
auto y = x.load(std::memory_order_acquire);

4. Can two loads or stores be reordered with each other without crossing the acquire boundary?

auto y = x.load(std::memory_order_acquire);
a = p;  // can this move below the next line?
b = q;  // `a` and `b` are shared

I have the corresponding four questions about release semantics as well.

Related to the 2nd and 3rd questions: why does no compiler optimize f() as aggressively as g() in the code below?

#include <atomic>

int a, b;

void dummy(int*);

void f(std::atomic<int> &x) {
    int z;
    z = a;  // loading shared `a` before acquire
    b = 2;  // storing shared `b` before acquire
    auto y = x.load(std::memory_order_acquire);
    z = a;  // loading shared `a` after acquire
    b = 2;  // storing shared `b` after acquire
    dummy(&z);
}

void g(int &x) {
    int z;
    z = a;
    b = 2;
    auto y = x;
    z = a;
    b = 2;
    dummy(&z);
}

f(std::atomic<int>&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        mov     DWORD PTR [rsp+12], eax
        mov     eax, DWORD PTR [rdi]
        lea     rdi, [rsp+12]
        mov     DWORD PTR b[rip], 2
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
g(int&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
b:
        .zero   4
a:
        .zero   4

所谓喜欢 2025-02-20 17:36:56

Q1

Generally, yes. Any load or store that follows an acquire load in program order must not become visible before it.

Here is an example where it matters:

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<int> x{0};
std::atomic<bool> finished{false};
int xval;
bool good;

void reader() {
    xval = x.load(std::memory_order_relaxed);
    finished.store(true, std::memory_order_release);
}

void writer() {
    good = finished.load(std::memory_order_acquire);
    x.store(42, std::memory_order_relaxed);
}

int main() {
    std::thread t1(reader);
    std::thread t2(writer);
    t1.join();
    t2.join();
    if (good) {
        std::cout << xval << std::endl;
    } else {
        std::cout << "too soon" << std::endl;
    }
    return 0;
}

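This program is free of UB and must print either 0 or too soon. If good is true, the writer's acquire load of finished has synchronized with the reader's release store, so the reader's load of x happens before the writer's store of 42 and must return 0. If the writer's store of 42 to x could be reordered before its load of finished, it would be possible for the reader's load of x to return 42 and the writer's load of finished to return true, in which case the program would improperly print 42.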
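The same reader/writer pair can be run many times to look for the forbidden outcome. The following is only a sketch: `Outcome`, `run_trial` and `saw_forbidden_outcome` are hypothetical names introduced here, and a stress loop like this can only fail to find a violation; it cannot prove the ordering guarantee, which comes from the memory model itself.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper type: result of one run of the reader/writer pair.
struct Outcome {
    bool good;  // did the writer's acquire load see finished == true?
    int xval;   // value the reader loaded from x
};

Outcome run_trial() {
    std::atomic<int> x{0};
    std::atomic<bool> finished{false};
    Outcome out{false, -1};

    std::thread reader([&] {
        out.xval = x.load(std::memory_order_relaxed);
        finished.store(true, std::memory_order_release);
    });
    std::thread writer([&] {
        out.good = finished.load(std::memory_order_acquire);
        x.store(42, std::memory_order_relaxed);
    });
    reader.join();
    writer.join();
    return out;  // both members are read only after the joins
}

// Returns true if any trial shows the forbidden outcome (good && xval == 42).
bool saw_forbidden_outcome(int trials) {
    for (int i = 0; i < trials; ++i) {
        Outcome o = run_trial();
        if (o.good && o.xval == 42) return true;
    }
    return false;
}
```

Calling `saw_forbidden_outcome(200)` should always return false on a conforming implementation. When good is false, xval may legitimately be either 0 or 42, which is why only the combination is checked.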

Q2

Yes, it would be okay for a compiler to delete an atomic load whose value is never used, since there is no way for a conforming program to detect whether the load was done or not. However, current compilers generally don't do such optimizations. This is partly out of an abundance of caution, because optimizations on atomics are hard to get right and bugs can be very subtle. It may also be partly to support programmers writing implementation-dependent code that is able to detect, via non-standard features, whether the load was done.

Q3

Yes, this reordering is perfectly legal, and real-world architectures will do it. An acquire load is only a one-way barrier: later accesses cannot move above it, but earlier ones may move below it.
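The one-way nature of acquire (and, symmetrically, of release) is what a lock relies on: accesses may move into a critical section from either side, but nothing inside it may leak out. As a rough sketch of that idea, here is a minimal test-and-set spinlock; `SpinLock` and `count_with_lock` are hypothetical names for illustration, and this is not meant as a production-quality lock.

```cpp
#include <atomic>
#include <thread>

// Minimal test-and-set spinlock (illustrative only).
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Acquire: accesses in the critical section cannot move above this.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // wait for the current owner to unlock
        }
    }
    void unlock() {
        // Release: accesses in the critical section cannot move below this.
        locked_.store(false, std::memory_order_release);
    }
};

// Two threads each add `per_thread` increments to a plain (non-atomic) counter.
long count_with_lock(int per_thread) {
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;  // protected by the lock, so there is no data race
            lock.unlock();
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

With this pairing, `count_with_lock(10000)` returns 20000. Code that precedes `lock()` may legally sink into the critical section and code that follows `unlock()` may hoist into it, and neither changes the result.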

Q4

Yes, this is also legal. If a and b are not atomic, and some other thread is reading them concurrently, then the code has a data race and is UB, so it is okay if the other thread observes the writes as having happened in the wrong order (or summons nasal demons). (If they are atomic and you are doing relaxed stores, then you can't get nasal demons, but you can still observe the stores out of order; there is no happens-before relationship mandating the contrary.)
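If another thread does need to see both writes, one common approach is to publish them with a release store and read them only after an acquire load has observed that store. The sketch below assumes a hypothetical flag `ready` and a hypothetical function `publish_and_read`, neither of which is in the question. It also touches on the release half of the question: the two plain stores may still be reordered with each other, but neither may move below the release store.

```cpp
#include <atomic>
#include <thread>

// Returns what the consumer observes in `a` and `b`, encoded as a * 10 + b.
int publish_and_read() {
    int a = 0, b = 0;                // plain, non-atomic data
    std::atomic<bool> ready{false};  // hypothetical publication flag
    int seen = -1;

    std::thread producer([&] {
        a = 1;  // these two plain stores may be reordered with each other,
        b = 2;  // but neither may move below the release store
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Spin until the acquire load observes the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = a * 10 + b;  // both stores happen-before these reads
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Here `publish_and_read()` returns 12. The relative order of the stores to a and b is never observable by the consumer, because it reads them only after the synchronization point.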

Optimization comparison

Your f versus g example is not really a fair comparison: in g, the load of the non-atomic variable x has no side effects and its value is not used, so the compiler omits it altogether. As mentioned above, the compiler doesn't omit the unnecessary atomic load of x in f.

As to why compilers don't sink the first accesses to a and b past the acquire load: I believe it's simply a missed optimization. Again, most compilers deliberately don't try to do all possible legal optimizations with atomics. However, note that on ARM64, for instance, the load of x in f compiles to ldar, which the CPU can definitely reorder with earlier plain loads and stores.
