“伪原子” C++ 中的操作

发布于 2024-08-31 16:29:34 字数 833 浏览 9 评论 0原文

所以我知道 C++ 中没有什么是原子的。但我试图弄清楚是否可以做出任何“伪原子”假设。原因是我想避免在一些简单的情况下使用互斥体，在这些情况下我只需要非常弱的保证。

1）假设我有全局定义的 volatile bool b，其中最初我设置为true。然后我启动一个执行循环的线程

while(b) doSomething();

同时，在另一个线程中，我执行 b=true。

我可以假设第一个线程将继续执行吗？换句话说，如果 b 开始为 true，并且第一个线程在第二个线程分配 b=true 的同时检查 b 的值，我是否可以假设第一个线程会将 b 的值读取为 true？或者是否有可能在赋值 b=true 的某个中间点，b 的值可能被读取为 false？

2) 现在假设 b 最初为假。然后第一个线程执行

bool b1=b;
bool b2=b;
if(b1 && !b2) bad();

，而第二个线程执行 b=true。我可以假设 bad() 永远不会被调用吗？

3) int 或其他内置类型怎么样：假设我有 volatile int i，它最初是（比如说）7，然后我分配 i=7。我是否可以假设，在此操作期间的任何时间，从任何线程， i 的值将等于 7？

4）我有 volatile int i=7，然后我从某个线程执行 i++，而所有其他线程只读取 i 的值。我可以假设除了 7 或 8 之外，在任何线程中 i 都没有任何值吗？

5）我有 volatile int i，从一个线程执行 i=7，从另一个线程执行 i=8。之后，我是否保证为 7 或 8（或者我选择分配的任何两个值）？

原文

So I'm aware that nothing is atomic in C++. But I'm trying to figure out if there are any "pseudo-atomic" assumptions I can make. The reason is that I want to avoid using mutexes in some simple situations where I only need very weak guarantees.

1) Suppose I have globally defined volatile bool b, which
initially I set true. Then I launch a thread which executes a loop

while(b) doSomething();

Meanwhile, in another thread, I execute b=true.

Can I assume that the first thread will continue to execute? In other words, if b starts out as true, and the first thread checks the value of b at the same time as the second thread assigns b=true, can I assume that the first thread will read the value of b as true? Or is it possible that at some intermediate point of the assignment b=true, the value of b might be read as false?

2) Now suppose that b is initially false. Then the first thread executes

bool b1=b;
bool b2=b;
if(b1 && !b2) bad();

while the second thread executes b=true. Can I assume that bad() never gets called?

3) What about an int or other builtin types: suppose I have volatile int i, which is initially (say) 7, and then I assign i=7. Can I assume that, at any time during this operation, from any thread, the value of i will be equal to 7?

4) I have volatile int i=7, and then I execute i++ from some thread, and all other threads only read the value of i. Can I assume that i never has any value, in any thread, except for either 7 or 8?

5) I have volatile int i, from one thread I execute i=7, and from another I execute i=8. Afterwards, is i guaranteed to be either 7 or 8 (or whatever two values I have chosen to assign)?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

街道布景 2024-09-07 16:29:34

标准 C++ 中没有线程，线程不能作为库。

因此，该标准对使用线程的程序的行为没有任何规定。您必须考虑线程实现提供的任何附加保证。

也就是说，在我使用的线程实现中：

（1）是的，您可以假设不相关的值不会写入变量。否则整个内存模型就会消失。但要小心，当你说“另一个线程”永远不会将 b 设置为 false 时，这意味着任何地方，永远。如果确实如此，则该写入可能会被重新排序以在循环期间发生。

(2) 不，编译器可以重新排序对 b1 和 b2 的赋值，因此有可能 b1 最终为 true，b2 最终为 false。在这样一个简单的情况下，我不知道为什么它会重新排序，但在更复杂的情况下可能有很好的理由。

[编辑：哎呀，当我回答 (2) 时，我忘记了 b 是不稳定的。从易失性变量中读取的内容不会被重新排序，抱歉，所以在典型的线程实现中（如果有这样的事情），您可以假设您不会最终得到 b1 true 和 b2 false。]

（ 3) 与 1 相同。一般来说，易失性与线程根本无关。然而，它在某些实现（Windows）中非常令人兴奋，并且实际上可能意味着内存障碍。

(4) 在 int 写入是原子的架构上是的，尽管 volatile 与之无关。另请参阅...

(5) 仔细检查文档。可能是的，但 volatile 是无关紧要的，因为在几乎所有体系结构上，int 写入都是原子的。但是，如果 int 写入不是原子的，那么就不会（对于上一个问题也不会），即使它是易失性的，原则上您也可以获得不同的值。不过，考虑到这些值 7 和 8，我们谈论的是一个非常奇怪的字节架构，其中包含要分两个阶段写入的相关位，但使用不同的值，您可能更可能获得部分写入。

举一个更合理的例子，假设由于某种奇怪的原因，您在一个只有 8 位写入是原子的平台上有一个 16 位 int。奇怪，但合法，而且由于 int 必须至少为 16 位，您可以看到它是如何发生的。进一步假设您的初始值为 255。那么增量可以合法地实现为：

旧值
读取寄存器中的
增量写入结果的最高有效字节
写入结果的最低有效字节。

在第三步和第四步之间中断递增线程的只读线程可以看到值 511。如果写入按其他顺序，它可以看到 0。

如果一个线程，则可能会永久留下不一致的值正在写入 255，另一个线程正在同时写入 256，并且写入会交错。在许多架构上这是不可能的，但要知道这种情况不会发生，您至少需要了解一些有关架构的信息。 C++ 标准中没有任何内容禁止它，因为 C++ 标准讨论了执行被信号中断，但没有执行被程序的另一部分中断的概念，也没有并发执行的概念。这就是为什么线程不仅仅是另一个库——添加线程从根本上改变了 C++ 执行模型。它要求实现以不同的方式执行操作，例如您最终会发现，如果您在 gcc 下使用线程并且忘记指定 -pthreads。

同样的情况也可能发生在对齐 int 写入是原子的平台上，但未对齐的int 写入是允许的，但不是原子的。例如 x86 上的 IIRC，如果未对齐的 int 写入跨越了缓存行边界，则不能保证它们是原子的。由于这个原因和其他原因，x86 编译器不会错误对齐声明的 int 变量。但如果你玩带有结构打包的游戏，你可能会引发一个例子。

所以：几乎任何实现都会为您提供所需的保证，但可能会以相当复杂的方式实现。

一般来说，我发现不值得尝试依赖于我不完全理解的特定于平台的内存访问保证来避免互斥。使用互斥体，如果太慢，请使用由真正了解体系结构和编译器的人编写的高质量无锁结构（或实现一个设计）。它可能是正确的，并且在正确性的前提下，它可能会胜过我自己发明的任何东西。

There are no threads in standard C++, and Threads cannot be implemented as a library.

Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.

That said, in threading implementations I've used:

(1) yes, you can assume that irrelevant values aren't written to variables. Otherwise the whole memory model goes out the window. But be careful that when you say "another thread" never sets b to false, that means anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.

(2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.

[Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]

(3) same as 1. volatile in general has nothing to do with threading at all. However, it is quite exciting in some implementations (Windows), and might in effect imply memory barriers.

(4) on an architecture where int writes are atomic yes, although volatile has nothing to do with it. See also...

(5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int writes are atomic. But if int write is not atomic, then no (and no for the previous question), even if it's volatile you could in principle get a different value. Given those values 7 and 8, though, we're talking a pretty weird architecture for the byte containing the relevant bits to be written in two stages, but with different values you could more plausibly get a partial write.

For a more plausible example, suppose that for some bizarre reason you have a 16 bit int on a platform where only 8bit writes are atomic. Odd, but legal, and since int must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then increment could legally be implemented as:

read the old value
increment in a register
write the most significant byte of the result
write the least significant byte of the result.

A read-only thread which interrupted the incrementing thread between the third and fourth steps of that, could see the value 511. If the writes are in the other order, it could see 0.

An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library - adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if for example you use threads under gcc and forget to specify -pthreads.

The same could happen on a platform where aligned int writes are atomic, but unaligned int writes are permitted and not atomic. For example IIRC on x86, unaligned int writes are not guaranteed atomic if they cross a cache line boundary. x86 compilers will not mis-align a declared int variable, for this reason and others. But if you play games with structure packing you could probably provoke an example.

So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.

In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access, that I don't fully understand, in order to avoid mutexes. Use a mutex, and if that's too slow use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and subject to correctness will probably outperform anything I invent myself.

回复收藏 0 原文

暮光沉寂 2024-09-07 16:29:34

大多数答案都正确解决了您将遇到的 CPU 内存排序问题，但没有详细说明编译器如何通过以打破您的假设的方式重新排序代码来阻止您的意图。

考虑取自这篇文章：

volatile int ready;       
int message[100];      

void foo(int i) 
{      
    message[i/10] = 42;      
    ready = 1;      
}

在 -O2 及以上版本，最新版本的 GCC 和 Intel C/C++（不知道 VC++）将使存储做好准备 首先，因此它可以与 i/10 的计算重叠（易失性 不会拯救你！）：

    leaq    _message(%rip), %rax
    movl    $1, _ready(%rip)      ; <-- whoa Nelly!
    movq    %rsp, %rbp
    sarl    $2, %edx
    subl    %edi, %edx
    movslq  %edx,%rdx
    movl    $42, (%rax,%rdx,4)

这不是一个错误，而是优化器利用 CPU 流水线。如果另一个线程在访问 message 的内容之前正在等待 ready，那么您将面临一场令人讨厌且模糊的竞争。

使用编译器障碍来确保您的意图得到尊重。另一个利用 x86 相对较强的排序的示例是 Dmitriy Vyukov 的单生产者单消费者队列中的发布/消费包装器

// load with 'consume' (data-dependent) memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
T load_consume(T const* addr) 
{  
  T v = *const_cast<T const volatile*>(addr); 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  return v; 
} 

// store with 'release' memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
void store_release(T* addr, T v) 
{ 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  *const_cast<T volatile*>(addr) = v; 
}

我建议，如果您打算冒险进入并发内存访问领域，请使用一个可以处理这些问题的库为您提供详细信息。当我们都在等待 n2145和 std::atomic 查看线程构建块的 tbb::atomic或即将推出的 boost::atomic。

除了正确性之外，这些库还可以简化您的代码并阐明您的意图：

// thread 1
std::atomic<int> foo;  // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);

// thread 2
int tmp = foo.load(std::memory_order_acquire);

使用显式内存排序，foo 的线程间关系很清晰。

Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.

Consider an example taken from this post:

volatile int ready;       
int message[100];      

void foo(int i) 
{      
    message[i/10] = 42;      
    ready = 1;      
}

At -O2 and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready first, so it can be overlapped with computation of i/10 (volatile does not save you!):

    leaq    _message(%rip), %rax
    movl    $1, _ready(%rip)      ; <-- whoa Nelly!
    movq    %rsp, %rbp
    sarl    $2, %edx
    subl    %edi, %edx
    movslq  %edx,%rdx
    movl    $42, (%rax,%rdx,4)

This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready before accessing the contents of message then you have a nasty and obscure race.

Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:

// load with 'consume' (data-dependent) memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
T load_consume(T const* addr) 
{  
  T v = *const_cast<T const volatile*>(addr); 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  return v; 
} 

// store with 'release' memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
void store_release(T* addr, T v) 
{ 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  *const_cast<T volatile*>(addr) = v; 
}

I suggest that if you are going to venture into the realm of concurrent memory access, use a library that will take care of these details for you. While we all wait for n2145 and std::atomic check out Thread Building Blocks' tbb::atomic or the upcoming boost::atomic.

Besides correctness, these libraries can simplify your code and clarify your intent:

// thread 1
std::atomic<int> foo;  // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);

// thread 2
int tmp = foo.load(std::memory_order_acquire);

Using explicit memory ordering, foo's inter-thread relationship is clear.

回复收藏 0 原文