Is a mutex needed to synchronize a simple flag between pthreads?

Posted 2024-12-01 23:15:05

Let's imagine that I have a few worker threads such as follows:

while (1) {
    do_something();

    if (flag_isset())
        do_something_else();
}

We have a couple of helper functions for checking and setting a flag:

void flag_set()   { global_flag = 1; }
void flag_clear() { global_flag = 0; }
int  flag_isset() { return global_flag; }

Thus the threads keep calling do_something() in a busy-loop, and in case some other thread sets global_flag, the thread also calls do_something_else() (which could for example output progress or debugging information when requested by setting the flag from another thread).

My question is: Do I need to do something special to synchronize access to the global_flag? If yes, what exactly is the minimum work to do the synchronization in a portable way?

I have tried to figure this out by reading many articles but I am still not quite sure of the correct answer... I think it is one of the following:

A: No need to synchronize because setting or clearing the flag does not create race conditions:

We just need to define the flag as volatile to make sure that it is really read from the shared memory every time it is being checked:

volatile int global_flag;

It might not propagate to other CPU cores immediately but will sooner or later, guaranteed.

B: Full synchronization is needed to make sure that changes to the flag are propagated between threads:

Setting the shared flag in one CPU core does not necessarily make it seen by another core. We need to use a mutex to make sure that flag changes are always propagated by invalidating the corresponding cache lines on other CPUs. The code becomes as follows:

volatile int    global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }

int  flag_isset()
{
    int rc;
    pthread_mutex_lock(&flag_mutex);
    rc = global_flag;
    pthread_mutex_unlock(&flag_mutex);
    return rc;
}

C: Synchronization is needed to make sure that changes to the flag are propagated between threads:

This is the same as B, but instead of using a mutex on both sides (reader & writer) we use it only on the writing side, because the logic itself does not require synchronization; we just need to synchronize (invalidate other caches) when the flag is changed:

volatile int    global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }

int  flag_isset() { return global_flag; }

This would avoid continuously locking and unlocking the mutex when we know that the flag is rarely changed. We are just using a side-effect of Pthreads mutexes to make sure that the change is propagated.

So, which one?

I think A and B are the obvious choices, B being safer. But how about C?

If C is ok, is there some other way of forcing the flag change to be visible on all CPUs?

There is one somewhat related question: Does guarding a variable with a pthread mutex guarantee it's also not cached? ...but it does not really answer this.

Comments (4)

此岸叶落 2024-12-08 23:15:05

The 'minimum amount of work' is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:

void flag_set()   {
  global_flag = 1;
  __sync_synchronize();
}

void flag_clear() {
  global_flag = 0;
  __sync_synchronize();
}

int  flag_isset() {
  int val;
  // Prevent the read from migrating backwards
  __sync_synchronize();
  val = global_flag;
  // and prevent it from being propagated forwards as well
  __sync_synchronize();
  return val;
}

These memory barriers accomplish two important goals:

  1. They force a compiler flush. Consider a loop like the following:

     for (int i = 0; i < 1000000000; i++) {
       flag_set(); // assume this is inlined
       local_counter += i;
     }
    

    Without a barrier, a compiler might choose to optimize this to:

     for (int i = 0; i < 1000000000; i++) {
       local_counter += i;
     }
     flag_set();
    

    Inserting a barrier forces the compiler to write the variable back immediately.

  2. They force the CPU to order its writes and reads. This is not so much an issue with a single flag - most CPU architectures will eventually see a flag that's set without CPU-level barriers. However, the order might change. If we have two flags, and on thread A:

      // start with only flag A set
      flag_set_B();
      flag_clear_A();
    

    And on thread B:

      a = flag_isset_A();
      b = flag_isset_B();
      assert(a || b); // can be false!
    

    Some CPU architectures allow these writes to be reordered; you may see both flags being false (i.e., the write that clears flag A became visible before the write that sets flag B). This can be a problem if a flag protects, say, a pointer being valid. Memory barriers force an ordering on writes to protect against these problems.

Note also that on some CPUs, it's possible to use 'acquire-release' barrier semantics to further reduce overhead. Such a distinction does not exist on x86, however, and would require inline assembly on GCC.

A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory. Finally, note that this code is enough for a single flag, but if you want to synchronize against any other values as well, you must tread very carefully. A lock is usually the simplest way to do things.
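
For comparison, a minimal sketch of the same single-flag idea using C11 <stdatomic.h> with the acquire/release orderings mentioned above (assuming a C11 compiler is available):

#include <stdatomic.h>

static atomic_int global_flag = 0;

void flag_set(void)   { atomic_store_explicit(&global_flag, 1, memory_order_release); }
void flag_clear(void) { atomic_store_explicit(&global_flag, 0, memory_order_release); }

int  flag_isset(void)
{
    /* The acquire load pairs with the release stores, so anything written
       before flag_set() is visible to a reader that observes the flag set. */
    return atomic_load_explicit(&global_flag, memory_order_acquire);
}

The release store and acquire load are typically cheaper than a full barrier, which is the overhead reduction alluded to above.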

蒲公英的约定 2024-12-08 23:15:05

You must not cause data races. A data race is undefined behavior, and the compiler is allowed to do anything and everything it pleases.

A humorous blog on the topic: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

Case 1: There is no synchronization on the flag, so anything is allowed to happen. For example, the compiler is allowed to turn

flag_set();
while(weArentBoredLoopingYet())
    doSomethingVeryExpensive();
flag_clear()

into

while(weArentBoredLoopingYet())
    doSomethingVeryExpensive();
flag_set();
flag_clear()

Note: this kind of race is actually very common. Your mileage may vary. On one hand, the de-facto implementation of pthread_once involves a data race like this. On the other hand, it is undefined behavior. On most versions of gcc you can get away with it, because gcc chooses not to exercise its right to optimize this way in many cases, but it is not "spec" code.

B: full synchronization is the right call. This is simply what you have to do.

C: Synchronizing only on the writer can work only if you can prove that no one reads the flag while it is being written. The official definition of a data race (from the C++11 specification) is one thread writing to a variable while another thread can concurrently read or write the same variable. If your readers and writers all run at once, you still have a data race. However, if you can prove that the writer writes once, some synchronization happens, and then the readers all read, then the readers do not need synchronization.
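
As a concrete illustration of that last case, a hypothetical sketch: if the only write happens before the reader threads are created, thread creation itself provides the synchronization and the readers need nothing further.

#include <pthread.h>
#include <stdio.h>

static int global_flag;   /* written exactly once, before any reader exists */

static void *reader(void *arg)
{
    (void)arg;
    /* pthread_create() synchronizes the creating thread with the new thread,
       so this read is ordered after the write in main(): no race, no mutex. */
    printf("flag = %d\n", global_flag);
    return NULL;
}

int main(void)
{
    pthread_t t[4];

    global_flag = 1;                                    /* the single write    */
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, reader, NULL);      /* the synchronization */
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}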

As for caching, the rule is that a mutex lock/unlock synchronizes with all threads that lock/unlock the same mutex. This means you will not see any unusual caching effects (although under the hood, your processor can do spectacular things to make this run faster... it's just obliged to make it look like it wasn't doing anything special). If you don't synchronize, however, you get no guarantees that the other thread doesn't have changes to push that you need!

All of that being said, the question is really how much you are willing to rely on compiler-specific behavior. If you want to write proper code, you need to do proper synchronization. If you are willing to rely on the compiler to be kind to you, you can get away with a lot less.

If you have C++11, the easy answer is to use atomic_flag, which is designed to do exactly what you want AND is designed to synchronize correctly for you in most cases.

苏辞 2024-12-08 23:15:05

For the example you have posted, case A is sufficient provided that ...

  1. Getting and setting the flag takes only one CPU instruction.
  2. do_something_else() is not dependent upon the flag being set during the execution of that routine.

If getting and/or setting the flag takes more than one CPU instruction, then you must use some form of locking.

If do_something_else() is dependent upon the flag being set during the execution of that routine, then you must lock as in case C but the mutex must be locked before calling flag_isset().
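
A minimal sketch of that variant, reusing the declarations from case C in the question:

#include <pthread.h>

extern volatile int    global_flag;   /* from case C */
extern pthread_mutex_t flag_mutex;    /* from case C */
extern void do_something(void);
extern void do_something_else(void);

void worker_loop(void)
{
    while (1) {
        do_something();

        /* Hold the mutex across both the check and the dependent work,
           so the flag cannot change in between. */
        pthread_mutex_lock(&flag_mutex);
        if (global_flag)
            do_something_else();
        pthread_mutex_unlock(&flag_mutex);
    }
}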

Hope this helps.

我一向站在原地 2024-12-08 23:15:05

Assigning incoming jobs to worker threads requires no locking. A typical example is a web server, where a request is caught by the main thread, and this main thread selects a worker. I'm trying to explain it with some pseudo code.

main task {

  // do forever
  while (true) {

    // wait for job
    x = null;
    while (x == null) {
      sleep(some);
      x = grabTheJob();
    }

    // select worker
    bool found = false;
    for (n = 0; n < NUM_OF_WORKERS; n++) {
      if (workerList[n].getFlag() != AVAILABLE) continue;
      workerList[n].setJob(x);
      workerList[n].setFlag(DO_IT_PLS);
      found = true;
      break;
    }

    if (!found) panic("no free worker task! ouch!");

  } // while forever
} // main task


worker task {

  while (true) {
    while (getFlag() != DO_IT_PLS) sleep(some);
    setFlag(BUSY_DOING_THE_TASK);

    /// do it really

    setFlag(AVAILABLE);

  } // while forever 
} // worker task

So, if there is one flag which one party sets to A and the other sets to B and C (the main task sets it to DO_IT_PLS, and the worker sets it to BUSY and AVAILABLE), there is no conflict. Compare it with a "real-life" example, say, when a teacher is giving different tasks to students. The teacher selects a student and gives him/her a task. Then the teacher looks for the next available student. When a student is ready, he/she gets back to the pool of available students.

UPDATE: just to clarify, there is only one main() thread and several worker threads (a configurable number). As main() runs only one instance, there is no need to sync the selection and launch of the workers.
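
A rough C sketch of the per-worker flag handoff described above (hypothetical names; each flag is a C11 atomic so the state transitions have well-defined visibility, which the pseudo code leaves implicit):

#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

enum { AVAILABLE, DO_IT_PLS, BUSY_DOING_THE_TASK };

struct worker {
    _Atomic int state;   /* main writes DO_IT_PLS; the worker writes BUSY/AVAILABLE */
    void       *job;
};

/* Main task: hand a job to the first available worker, or fail. */
int dispatch(struct worker *w, int nworkers, void *job)
{
    for (int n = 0; n < nworkers; n++) {
        if (atomic_load(&w[n].state) != AVAILABLE)
            continue;
        w[n].job = job;                        /* publish the job first... */
        atomic_store(&w[n].state, DO_IT_PLS);  /* ...then flip the flag    */
        return 0;
    }
    return -1;  /* no free worker */
}

/* Worker task: wait to be asked, do the job, become available again. */
void *worker_loop(void *arg)
{
    struct worker *self = arg;

    for (;;) {
        while (atomic_load(&self->state) != DO_IT_PLS)
            sched_yield();                     /* a real server would sleep or use a condvar */
        atomic_store(&self->state, BUSY_DOING_THE_TASK);

        /* ... really do self->job here ... */

        atomic_store(&self->state, AVAILABLE);
    }
    return NULL;
}

Because each transition is written by only one party (main moves AVAILABLE to DO_IT_PLS; the worker moves DO_IT_PLS to BUSY to AVAILABLE), the atomic flag alone is enough to keep the handoff well-defined.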
