在C＆＃x2B;＆＃x2b;中使用乐观的锁和内存顺序

发布于 2025-01-27 20:27:52 字数 1737 浏览 2 评论 0 原文

我想知道如何“乐观的锁”（如下所述：）可以用于非原子数据以及应使用哪些内存顺序的C ++。

我从上面的论文中理解的是，以下代码将同步 t1 和 t2 和 t1 将打印两个变量更新或无。

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load();
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load();
        auto local_data2 = data2.load();
        if (s == versioned_lock.load()) {
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load();
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit);
        data1.store(1234);
        data2.store(4321);
        versioned_lock.store(current_lock_version);
    });

    t1.join();
    t2.join();
}

我的问题是：

我这是没有UB的有效C ++代码？
我对同步的假设是否正确？ T1 实际上会打印“ 0 0”或“ 1234 4321”？
应该将哪些内存订单用于阅读/写作 versioned_lock ？它应该是 memory_order_seq_cst 还是不那么限制的东西？如果 data1 和 data2 是非原子的（只有 int ，甚至一些更复杂的数据类型），它是否会影响所需的必需品，它也可以正常工作。内存顺序（或者可能需要Atomic_thread_fence）？

原文

I'm wondering how 'optimistic locks' (as described e.g. here: https://db.in.tum.de/~leis/papers/artsync.pdf) can be used in C++ with regard to non-atomic data and what memory orders should be used.

My understanding from the above paper is that following code will synchronize t1 and t2 and t1 will either print both variables updated or none.

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load();
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load();
        auto local_data2 = data2.load();
        if (s == versioned_lock.load()) {
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load();
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit);
        data1.store(1234);
        data2.store(4321);
        versioned_lock.store(current_lock_version);
    });

    t1.join();
    t2.join();
}

My questions are:

I this a valid C++ code with no UB?
Are my assumption about synchronization correct? Will t1 actually print either "0 0" or "1234 4321"?
What memory orders should be used for reading/writing to versioned_lock? Should it be memory_order_seq_cst or can it be something less restrictive? Will it work also if data1 and data2 are non-atomic (just int or even some more complex data type) and does it affect the required memory order (or maybe there is a need for atomic_thread_fence)?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

最美不过初阳 2025-02-03 20:27:52

这是有效的C ++代码，没有UB？

我认为是这样。在这样的程序中获得UB的通常方法是数据竞赛，
但这只有在编写非原子变量时才会发生
同时，此程序中的每个共享变量都是原子。

我对同步的假设是否正确？ T1实际上会打印“ 0 0”或“ 1234 4321”？

是的。此版本中的所有负载和存储都是 seq_cst ，因此
以某种总顺序发生，我将使用＆lt;表示。令L1，L2表示两个负载
versioned_lock 在t1中，s1，s2 t2中的两个商店。

如果S1，S2均出现在L1之前，则将看到两个新值
我们打印 1234 4321 ，这很好。如果S1，S2都在L2之后发生，那么我们
打印 0 0 ，也很好。如果S1或S2发生在之间
L1和L2，然后L1和L2返回不同的值，我们不打印
任何东西，这再次很好。

剩下的唯一订购是S1＆LT； L1＆lt; L2＆lt; S2。在这种情况下L1
用锁定位集返回值，因此我们重新启动循环
而不是打印。

应使用哪些内存订单来读取/写作
版本ed_lock？应该是memory_order_seq_cst还是可以
限制性较小的东西？

i think 以下是足够的：

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load(std::memory_order_acquire); // L1
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load(std::memory_order_relaxed);
        auto local_data2 = data2.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);    // AF
        if (s == versioned_lock.load(std::memory_order_relaxed)) { // L2
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load(std::memory_order_relaxed);
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit, std::memory_order_relaxed); // S1
        std::atomic_thread_fence(std::memory_order_release); // RF
        data1.store(1234, std::memory_order_relaxed);
        data2.store(4321, std::memory_order_relaxed);
        versioned_lock.store(current_lock_version, std::memory_order_release); // S2
    });

    t1.join();
    t2.join();
}

也就是说，访问 versioned_lock 清除并测试锁定的
位必须释放和获取，数据的存储必须是
在释放围栏之前，数据的负载随后是
获取栅栏。直观地，这应该防止数据访问从保护它们的锁访问之间泄漏。（您也可以制作数据的所有存储
释放然后放下释放栅栏，但这是不必要的
强：我们不在乎两个数据存储是否会重新排序
其他。如果我们有更多的数据元素，那将会更大。。

为了证明这有效，请注意，我们必须避免的两个结果是 1234 0 和 0 4321 因此，假设 data1 的负载返回 1234 （ data2 返回 4321 的情况是等效的）。由于这是由T2存储的值，因此释放围栏（RF）与获取围栏（AF）同步。现在，在RF之前对S1进行了测序，并且在L2之前对AF进行了测序，因此我们得出结论，S1发生在L2之前。因此L2无法返回0;它要么返回 locked_bit 或 1 。

如果L1返回与L2不同的值，那么我们将不打印。如果L1返回 locked_bit 我们也不会打印。其余的情况是L1和L2都返回1。在这种情况下，L1读取了S2编写的值，L1/S2被获取/释放，因此S2与L1同步。数据存储在S2之前进行了测序，L1在数据负载之前进行了测序，因此两个数据存储都发生在两个数据负载之前。因此，我们看到了新值，并打印 1234 4321 。

如果data1和data2是非原子的（只是int或
甚至一些更复杂的数据类型）？

不，那根本行不通。该算法中没有任何内容阻止
data1，data2 的存储和负载来自冲突；就是这样
当他们发生冲突时，我们对版本的锁的测试告诉我们不要
使用数据。但是，如果它们不是原子，那么矛盾的商店和
负载将是一场数据竞赛，会导致不确定的行为
是否数据。

Is this a valid C++ code with no UB?

I think so. The usual way to get UB in such a program is a data race,
but that can only happen when a non-atomic variable is written
concurrently, and every shared variable in this program is atomic.

Are my assumption about synchronization correct? Will t1 actually print either "0 0" or "1234 4321"?

Yes. All the loads and stores in this version are seq_cst, so they
occur in some total order, which I will use < to denote. Let L1, L2 denote the two loads of
versioned_lock in t1, and S1, S2 the two stores in t2.

If S1, S2 both occur before L1, then both the new values will be seen
and we print 1234 4321, which is fine. If S1, S2 both occur after L2 then we
print 0 0, which is also fine. If either S1 or S2 occurs in between
L1 and L2, then L1 and L2 return different values and we do not print
anything, which again is fine.

The only ordering that remains is S1 < L1 < L2 < S2. In this case L1
returns the value with the locked bit set, and so we restart the loop
instead of printing.

What memory orders should be used for reading/writing to
versioned_lock? Should it be memory_order_seq_cst or can it be
something less restrictive?

I think that the following suffices:

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load(std::memory_order_acquire); // L1
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load(std::memory_order_relaxed);
        auto local_data2 = data2.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);    // AF
        if (s == versioned_lock.load(std::memory_order_relaxed)) { // L2
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load(std::memory_order_relaxed);
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit, std::memory_order_relaxed); // S1
        std::atomic_thread_fence(std::memory_order_release); // RF
        data1.store(1234, std::memory_order_relaxed);
        data2.store(4321, std::memory_order_relaxed);
        versioned_lock.store(current_lock_version, std::memory_order_release); // S2
    });

    t1.join();
    t2.join();
}

That is, the accesses to versioned_lock that clear and test the locked
bit must be release and acquire, the stores of the data must be
preceded by a release fence, and the loads of the data followed by an
acquire fence. Intuitively, this should keep the data accesses from leaking out from between the lock accesses that protect them. (You could also make all the stores of the data
be release and then drop the release fence, but that would be unnecessarily
strong: we do not care if the two data stores get reordered with each
other. It would make a bigger difference if we had many more data elements. The same applies to the option of making the data loads acquire.)

To prove that this works, notice that the two results we must avoid are 1234 0 and 0 4321. So suppose that the load of data1 returns 1234 (the case when data2 returns 4321 is equivalent). Since this is the value that was stored by t2, the release fence (RF) synchronizes with the acquire fence (AF). Now S1 is sequenced before RF, and AF is sequenced before L2, so we conclude that S1 happens before L2. Thus L2 cannot return 0; it either returns locked_bit or 1.

If L1 returns a different value from L2 then we will not print. If L1 returns locked_bit we also will not print. The one remaining case is that L1 and L2 both return 1. In this case, L1 has read the value written by S2, and L1/S2 are acquire/release, so S2 synchronizes with L1. The data stores are sequenced before S2, and L1 is sequenced before the data loads, so both data stores happen before both data loads. Therefore we see both the new values, and print 1234 4321.

Will it work also if data1 and data2 are non-atomic (just int or
even some more complex data type)?

No, that will not work at all. Nothing in this algorithm prevents the
stores and loads of data1, data2 from conflicting; it's just that
when they do conflict, our tests of the versioned lock tell us not to
use the data. But if they were not atomic, the conflicting stores and
loads would be a data race, causing undefined behavior whether we use
the data or not.