Fences in C++0x, guarantees just on atomics or memory in general

Posted on 2024-10-30 09:46:58

The C++0x draft has a notion of fences which seems very distinct from the CPU/chip-level notion of fences, or, say, from what the Linux kernel developers expect of fences. The question is whether the draft really implies an extremely restricted model, or whether the wording is just poor and it actually implies true fences.

For example, under 29.8 Fences it states things like:

A release fence A synchronizes with an
acquire fence B if there exist atomic
operations X and Y, both operating on
some atomic object M, such that A is
sequenced before X, X modifies M, Y is
sequenced before B, and Y reads the
value written by X or a value written
by any side effect in the hypothetical
release sequence X would head if it
were a release operation.

It uses these terms atomic operations and atomic object. There are such atomic operations and methods defined in the draft, but does it mean only those? A release fence sounds like a store fence. A store fence that doesn't guarantee the write of all data prior to the fence is nearly useless. Similar for a load (acquire) fence and full fence.

So, are the fences/barriers in C++0x proper fences and the wording just incredibly poor, or are they extremely restricted/useless as described?


In terms of C++, say I have this existing code (assuming fences are available as high level constructs right now -- instead of say using __sync_synchronize in GCC):

Thread A:
b = 9;
store_fence();
a = 5;

Thread B:
if( a == 5 )
{
  load_fence();
  c = b;
}

Assume a, b, and c are of a size that the platform can copy atomically. The above means that c will only ever be assigned 9. Note we don't care when Thread B sees a==5, just that when it does it also sees b==9.

What is the code in C++0x that guarantees the same relationship?


ANSWER: If you read my chosen answer and all the comments you'll get the gist of the situation. C++0x appears to force you to use an atomic with fences whereas a normal hardware fence does not have this requirement. In many cases this can still be used to replace concurrent algorithms so long as sizeof(atomic<T>) == sizeof(T) and atomic<T>.is_lock_free() == true.

It is unfortunate however that is_lock_free is not a constexpr. That would allow it to be used in a static_assert. Having atomic<T> degenerate to using locks is generally a bad idea: atomic algorithms that use mutexes will have horrible contention problems compared to a mutex-designed algorithm.
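The check described above can be sketched roughly as follows. The helper names `atomic_is_drop_in` and `atomic_is_always_drop_in` are made up for illustration and are not from the post. The second helper relies on `std::atomic<T>::is_always_lock_free`, which only exists from C++17 onwards, so it postdates the C++0x draft under discussion and is guarded accordingly.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the original post): reports whether
// std::atomic<T> has the same size as T and is lock-free here.
template <typename T>
bool atomic_is_drop_in()
{
    std::atomic<T> probe{T()};
    // is_lock_free() is a runtime query in C++0x/C++11, so it cannot
    // appear in a static_assert; the sizeof comparison alone could.
    return sizeof(std::atomic<T>) == sizeof(T) && probe.is_lock_free();
}

#if __cplusplus >= 201703L
// C++17 and later only: a compile-time variant of the same check,
// usable inside a static_assert.
template <typename T>
constexpr bool atomic_is_always_drop_in()
{
    return sizeof(std::atomic<T>) == sizeof(T)
        && std::atomic<T>::is_always_lock_free;
}
#endif
```

On mainstream desktop and server targets one would expect `atomic_is_drop_in<int>()` to return true, but that is a property of the implementation, not something the standard promises.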

Comments (2)

羁客 2024-11-06 09:46:58

Fences provide ordering on all data. However, in order to guarantee that the fence operation from one thread is visible to a second, you need to use atomic operations for the flag, otherwise you have a data race.

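#include <atomic>
#include <iostream>

std::atomic<bool> ready(false);  // synchronization flag: must be atomic
int data=0;                      // ordinary, non-atomic payload

void thread_1()
{
    data=42;
    // Orders the write to data before the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true,std::memory_order_relaxed);
}

void thread_2()
{
    if(ready.load(std::memory_order_relaxed))
    {
        // Pairs with the release fence once ready has been read as true.
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout<<"data="<<data<<std::endl;
    }
}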

If thread_2 reads ready as true, then the fences ensure that data can safely be read, and the output will be data=42. If ready is read as false, then you cannot guarantee that thread_1 has issued the appropriate fence, so a fence in thread_2 would still not provide the necessary ordering guarantees. If the if in thread_2 were omitted, the access to data would be a data race and undefined behaviour, even with the fence.
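To see the pairing at work across two real threads, here is a minimal harness, assuming a consumer that spins until it observes the flag. The names `ready_flag`, `payload`, `producer`, `consumer` and `run_once` are illustrative and not part of the answer. Once the relaxed load returns true, the acquire fence synchronizes with the release fence, so the plain read of `payload` is ordered after the write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready_flag(false);  // hypothetical flag, must be atomic
int payload = 0;                      // hypothetical plain data

void producer()
{
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);
    ready_flag.store(true, std::memory_order_relaxed);
}

int consumer()
{
    // Spin until the flag store is observed; only then is the
    // acquire fence known to pair with the producer's release fence.
    while (!ready_flag.load(std::memory_order_relaxed))
        std::this_thread::yield();
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_once()
{
    // Reset state before any thread starts (thread creation orders this).
    ready_flag.store(false, std::memory_order_relaxed);
    payload = 0;
    int seen = -1;
    std::thread t2([&seen] { seen = consumer(); });
    std::thread t1(producer);
    t1.join();
    t2.join();  // join orders the write to seen before the read below
    return seen;
}
```

Calling `run_once()` returns the value the consumer read. Without the atomic flag there would be no way for the consumer to know the producer's fence had already taken effect.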

Clarification: A std::atomic_thread_fence(std::memory_order_release) is generally equivalent to a store fence, and will likely be implemented as such. However, a single fence on one processor does not guarantee any memory ordering: you need a corresponding fence on a second processor, AND you need to know that when the acquire fence was executed the effects of the release fence were visible to that second processor. It is obvious that if CPU A issues an acquire fence, and then 5 seconds later CPU B issues a release fence, then that release fence cannot synchronize with the acquire fence. Unless you have some means of checking whether or not the fence has been issued on the other CPU, the code on CPU A cannot tell whether it issued its fence before or after the fence on CPU B.

The requirement that you use an atomic operation to check whether or not the fence has been seen is a consequence of the data race rules: you cannot access a non-atomic variable from multiple threads without an ordering relationship, so you cannot use a non-atomic variable to check for an ordering relationship.

A stronger mechanism such as a mutex can of course be used, but that would render the separate fence pointless, as the mutex would provide the fence.

Relaxed atomic operations are likely just plain loads and stores on modern CPUs, though possibly with additional alignment requirements to ensure atomicity.

Code written to use processor-specific fences can readily be changed to use C++0x fences, provided the operations used to check synchronization (rather than those used to access the synchronized data) are atomic. Existing code may well rely on the atomicity of plain loads and stores on a given CPU, but conversion to C++0x will require using atomic operations for those checks in order to provide the ordering guarantees.
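Applied to the code in the question, the conversion might look like the sketch below. It assumes `store_fence()` and `load_fence()` are replaced by release and acquire fences, and that only `a`, the variable used to check for synchronization, becomes atomic, while `b` and `c` stay plain ints. The function names `thread_a` and `thread_b` are placeholders for the two thread bodies.

```cpp
#include <atomic>
#include <cassert>

int b = 0;              // plain data, published via the fences
std::atomic<int> a(0);  // the checked variable must be atomic
int c = 0;              // plain data, written only by thread B

// Body of Thread A from the question.
void thread_a()
{
    b = 9;
    std::atomic_thread_fence(std::memory_order_release);  // was store_fence()
    a.store(5, std::memory_order_relaxed);
}

// Body of Thread B from the question.
void thread_b()
{
    if (a.load(std::memory_order_relaxed) == 5)
    {
        std::atomic_thread_fence(std::memory_order_acquire);  // was load_fence()
        c = b;  // reads 9 whenever the branch is taken
    }
}
```

As in the original, nothing is said about when Thread B sees a == 5, only that if it does, the read of b yields 9.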

别想她 2024-11-06 09:46:58

My understanding is that they are proper fences. The circumstantial evidence is that, after all, they are meant to map to features found in actual hardware and to allow efficient implementation of synchronization algorithms. As you say, fences that apply only to some specific values are 1. useless and 2. not found on current hardware.

That being said, AFAICS the section you quote describes the "synchronizes-with" relationship between fences and atomic operations. For a definition of what this means, see section 1.10 Multi-threaded executions and data races. Again, AFAICS, this does not imply that the fences apply only to the atomic objects, but rather I suspect the meaning is that while ordinary loads and stores may pass acquire and release fences in the usual way (one direction only), atomic loads/stores may not.

With respect to atomic objects, my understanding is that on all targets Linux supports, properly aligned plain integer variables whose sizeof() <= sizeof(void*) are atomic, hence Linux uses normal integers as synchronization variables (that is, the Linux kernel atomic operations operate on normal integer variables). C++ does not want to impose such a limitation, hence the separate atomic integer types. Also, in C++ operations on atomic integer types imply barriers, whereas in the Linux kernel all barriers are explicit (which is sort of obvious, since without compiler support for atomic types that is what one must do).
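As a small illustration of ordering carried by the atomic operation itself, here is a sketch with no separate fence calls. The names `message`, `flag`, `publish` and `try_consume` are made up for the example. The ordering comes from the `memory_order` argument; when the argument is left out, C++ atomics default to `memory_order_seq_cst`, whereas `memory_order_relaxed` requests no ordering at all.

```cpp
#include <atomic>
#include <cassert>

int message = 0;           // hypothetical plain data
std::atomic<int> flag(0);  // hypothetical synchronization variable

void publish(int value)
{
    message = value;
    // The release ordering is part of the store; no explicit fence.
    flag.store(1, std::memory_order_release);
}

bool try_consume(int& out)
{
    // The acquire load pairs with the release store above.
    if (flag.load(std::memory_order_acquire) == 1)
    {
        out = message;
        return true;
    }
    return false;
}
```

Before `publish` has run, `try_consume` returns false and leaves `out` untouched. Afterwards it returns true and yields the published value.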
