Say we have a single-producer-thread single-consumer-thread lockless queue, and that the producer may go long periods without producing any data. It would be beneficial to let the consumer thread sleep when there is nothing in the queue (for the power savings and freeing up the CPU for other processes/threads). If the queue were not lockless, the straightforward way to solve this problem is to have the producing thread lock a mutex, do its work, signal a condition variable and unlock, and for the reading thread to lock the mutex, wait on the condition variable, do its reading, then unlock. But if we're using a lockless queue, using a mutex the exact same way would eliminate the performance we gain from using a lockless queue in the first place.
The naive solution is to have the producer, after each insertion into the queue, lock the mutex, signal the condition variable, then unlock the mutex, keeping the actual work (the insertion into the queue) entirely outside the lock. The consumer does the mirror image: lock the mutex, wait on the condition variable, unlock it, pull everything off the queue, then repeat, keeping the reading of the queue outside the lock. There's a race condition here, though: between the reader pulling the last item off the queue and going to sleep, the producer may have inserted another item. The reader will then go to sleep, and may stay asleep indefinitely, until the producer inserts another item and signals the condition variable again. This means particular items can occasionally take a very long time to travel through the queue. If your queue is constantly active this may not be a problem, but if it were always active you could probably drop the condition variable entirely.
AFAICT the solution is for the producer to behave the same as it would with a regular needs-locking queue: lock the mutex, insert into the lockless queue, signal the condition variable, unlock. The consumer, however, should behave differently. When it wakes, it should unlock the mutex immediately instead of holding it while it reads the queue. It should then pull as much off the queue as it can and process it. Finally, only when the consumer is thinking of going to sleep should it lock the mutex and check whether there's any data: if there is, unlock and process it; if not, wait on the condition variable. This way the mutex is contended less often than it would be with a lock-based queue, but there's no risk of going to sleep with data still left on the queue.
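A minimal Python sketch of this scheme, for concreteness. It assumes CPython, where collections.deque's append/popleft are effectively atomic and so can stand in for the lock-free SPSC queue; the names queue, cond, done and run_demo are illustrative, not part of the question:

```python
import threading
from collections import deque

queue = deque()               # stand-in for the lockless SPSC queue
cond = threading.Condition()  # mutex + condition variable in one object
done = threading.Event()      # lets the demo shut down cleanly

def producer(items):
    for item in items:
        with cond:            # lock the mutex
            queue.append(item)  # insert into the queue
            cond.notify()       # signal the condition variable, then unlock
    with cond:
        done.set()
        cond.notify()

def consumer(results):
    while True:
        # Pull as much of the queue as possible, with the mutex unlocked.
        while queue:
            results.append(queue.popleft())
        # Only when thinking of going to sleep: lock and re-check.
        with cond:
            if queue:
                continue      # data slipped in: go back and drain it
            if done.is_set():
                return
            cond.wait()       # safe to sleep: queue empty, lock held

def run_demo(n=1000):
    queue.clear()
    done.clear()
    results = []
    t = threading.Thread(target=consumer, args=(results,))
    t.start()
    producer(range(n))
    t.join()
    return results
```

Because every insertion happens while the mutex is held, the consumer's final locked re-check cannot miss an item: either it sees the data, or it reaches cond.wait() before the producer's notify, in which case the wakeup is delivered.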
Is this the best way to do it? Are there alternatives?
Note: By 'fastest' I really mean 'fastest without dedicating a core to checking the queue over and over,' but that wouldn't fit in the title ;p
One alternative: Go with the naive solution, but have the consumer wait on the condition variable with a timeout corresponding to the maximum latency you are willing to tolerate for an item traveling through the queue. If the desired timeout is fairly short though, it may be below the minimum wait time for your OS or still consume too much CPU.
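The consumer side of this timed-wait variant might look like the following Python sketch (MAX_LATENCY and consumer_step are illustrative names; the producer still notifies after inserting, but a missed signal now costs at most MAX_LATENCY):

```python
import threading
from collections import deque

queue = deque()               # stand-in for the lockless SPSC queue
cond = threading.Condition()
MAX_LATENCY = 0.010           # hypothetical: worst tolerable delay, 10 ms

def consumer_step(results):
    # Drain everything currently visible, with the mutex unlocked.
    while queue:
        results.append(queue.popleft())
    # Sleep with a bounded timeout: even if the producer's signal is
    # missed in the race window, an item sitting in the queue waits at
    # most MAX_LATENCY before the wait times out and the next drain
    # picks it up.
    with cond:
        if not queue:
            cond.wait(timeout=MAX_LATENCY)
```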
I'm assuming you're using the lock-free single-producer single-consumer queue from the Dr Dobbs article - or something similar - so I'll use the terminology from there.
In that case, your suggested answer in the paragraph that starts "AFAICT" is good, but I think it can be optimised slightly:
In the producer:

- Take a copy of last, call it saved_last.
- Insert the new_item as usual, then take a copy of the divider pointer, call it saved_divider.
- If the value of saved_divider is equal to new_item, the object you just inserted, then your object has already been consumed, and your work is done.
- Otherwise, if the value of saved_divider is not equal to saved_last, then you don't need to wake up the consumer. This is because:
  - at the time you took the copies, divider had not yet reached either new_item or saved_last
  - since then, last has only had those two values
  - the consumer only stops when divider is equal to last
- Otherwise, lock the mutex, signal the condition variable, and unlock, as in your original scheme.
This ensures that, in the case where the consumer tends to remain busy, you avoid locking the mutex when you know the consumer is still awake (and not about to sleep). It also minimises the time when the mutex is held, to further reduce the possibility for contention.
The above explanation is quite wordy (because I wanted to include the explanation of why it works, rather than just what the algorithm is), but the code resulting from it should be quite simple.
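For instance, the producer-side check might look like this Python sketch, modelled on the first/divider/last structure of the Dr Dobbs queue (the class and method names are mine, and in CPython the pointer copies are plain attribute reads rather than true atomics, so this is illustrative only):

```python
import threading

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class SleepyQueue:
    def __init__(self):
        sentinel = Node(None)          # divider-style queue: first..divider
        self.first = sentinel          # is consumed, divider..last is live
        self.divider = sentinel
        self.last = sentinel
        self.cond = threading.Condition()

    def produce(self, value):
        new_item = Node(value)
        saved_last = self.last         # copy of last, before inserting
        saved_last.next = new_item
        self.last = new_item           # publish new_item
        saved_divider = self.divider   # copy of divider, after inserting
        if saved_divider is new_item:
            return                     # already consumed: nothing to do
        if saved_divider is not saved_last:
            return                     # consumer still busy: no wake-up needed
        with self.cond:                # consumer may be about to sleep:
            self.cond.notify()         # signal under the mutex

    def try_consume(self):
        if self.divider is not self.last:  # consumer stops when divider == last
            node = self.divider.next
            self.divider = node
            return node.value
        return None
```

Note that only the final branch of produce ever touches the mutex; in the steady state where the consumer lags behind, both early returns are taken and no locking happens at all.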
Of course whether it's actually worth doing will depend on a lot of things, and I'd encourage you to measure if performance is critical for you. Good implementations of the mutex/condvar primitives internally use atomic operations for most cases, so they generally only make a kernel call (the most expensive bit!) if there's a need to - i.e. if there's a need to block, or there are definitely some threads waiting - so the time saved by not calling the mutex functions may only amount to the overhead of the library calls.