Say we have a single-producer-thread single-consumer-thread lockless queue, and that the producer may go long periods without producing any data. It would be beneficial to let the consumer thread sleep when there is nothing in the queue (for the power savings and freeing up the CPU for other processes/threads). If the queue were not lockless, the straightforward way to solve this problem is to have the producing thread lock a mutex, do its work, signal a condition variable and unlock, and for the reading thread to lock the mutex, wait on the condition variable, do its reading, then unlock. But if we're using a lockless queue, using a mutex the exact same way would eliminate the performance we gain from using a lockless queue in the first place.
The naive solution is to have the producer, after each insertion into the queue, lock the mutex, signal the condition variable, then unlock the mutex, keeping the actual work (the insertion into the queue) entirely outside the lock. The consumer does the mirror image: lock the mutex, wait on the condition variable, unlock it, pull everything off the queue, then repeat, keeping the reading of the queue outside the lock. There's a race condition here, though: between the reader pulling the last item off the queue and going to sleep, the producer may have inserted another item. The reader will then go to sleep, and may stay asleep indefinitely, until the producer inserts another item and signals the condition variable again. This means particular items can occasionally take a very long time to travel through the queue. If your queue is constantly active this may not be a problem, but if it were always active you could probably drop the condition variable entirely.
AFAICT the solution is for the producer to behave the same as it would with a regular needs-locking queue: lock the mutex, insert into the lockless queue, signal the condition variable, unlock. The consumer, however, should behave differently. When it wakes, it should unlock the mutex immediately instead of holding it while it reads the queue. It should then pull as much off the queue as it can and process it. Finally, only when the consumer is thinking of going to sleep should it lock the mutex and check whether there's any data: if there is, unlock and process it; if not, wait on the condition variable. This way the mutex is contended less often than it would be with a lock-based queue, but there's no risk of going to sleep with data still left on the queue.
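A minimal Python sketch of this scheme, for concreteness. It assumes CPython, where collections.deque's append/popleft are effectively atomic and so can stand in for the lock-free SPSC queue; the names queue, cond, done and run_demo are illustrative, not part of the question:

```python
import threading
from collections import deque

queue = deque()               # stand-in for the lockless SPSC queue
cond = threading.Condition()  # mutex + condition variable in one object
done = threading.Event()      # lets the demo shut down cleanly

def producer(items):
    for item in items:
        with cond:            # lock the mutex
            queue.append(item)  # insert into the queue
            cond.notify()       # signal the condition variable, then unlock
    with cond:
        done.set()
        cond.notify()

def consumer(results):
    while True:
        # Pull as much of the queue as possible, with the mutex unlocked.
        while queue:
            results.append(queue.popleft())
        # Only when thinking of going to sleep: lock and re-check.
        with cond:
            if queue:
                continue      # data slipped in: go back and drain it
            if done.is_set():
                return
            cond.wait()       # safe to sleep: queue empty, lock held

def run_demo(n=1000):
    queue.clear()
    done.clear()
    results = []
    t = threading.Thread(target=consumer, args=(results,))
    t.start()
    producer(range(n))
    t.join()
    return results
```

Because every insertion happens while the mutex is held, the consumer's final locked re-check cannot miss an item: either it sees the data, or it reaches cond.wait() before the producer's notify, in which case the wakeup is delivered.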
Is this the best way to do it? Are there alternatives?
Note: By 'fastest' I really mean 'fastest without dedicating a core to checking the queue over and over,' but that wouldn't fit in the title ;p
One alternative: Go with the naive solution, but have the consumer wait on the condition variable with a timeout corresponding to the maximum latency you are willing to tolerate for an item traveling through the queue. If the desired timeout is fairly short though, it may be below the minimum wait time for your OS or still consume too much CPU.
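The consumer side of this timed-wait variant might look like the following Python sketch (MAX_LATENCY and consumer_step are illustrative names; the producer still notifies after inserting, but a missed signal now costs at most MAX_LATENCY):

```python
import threading
from collections import deque

queue = deque()               # stand-in for the lockless SPSC queue
cond = threading.Condition()
MAX_LATENCY = 0.010           # hypothetical: worst tolerable delay, 10 ms

def consumer_step(results):
    # Drain everything currently visible, with the mutex unlocked.
    while queue:
        results.append(queue.popleft())
    # Sleep with a bounded timeout: even if the producer's signal is
    # missed in the race window, an item sitting in the queue waits at
    # most MAX_LATENCY before the wait times out and the next drain
    # picks it up.
    with cond:
        if not queue:
            cond.wait(timeout=MAX_LATENCY)
```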
I'm assuming you're using the lock-free single-producer single-consumer queue from the Dr Dobbs article - or something similar - so I'll use the terminology from there.
In that case, your suggested answer in the paragraph that starts "AFAICT" is good, but I think it can be optimised slightly:
In the producer:

- Take a copy of last, call it saved_last.
- Insert the new_item as usual, then take a copy of the divider pointer, call it saved_divider.
- If the value of saved_divider is equal to new_item, the object you just inserted, then your object has already been consumed, and your work is done.
- Otherwise, if the value of saved_divider is not equal to saved_last, then you don't need to wake up the consumer. This is because:
  - at the time you took the copies, divider had not yet reached either new_item or saved_last
  - since then, last has only had those two values
  - the consumer only stops when divider is equal to last
- Otherwise, lock the mutex, signal the condition variable, and unlock, as in your original scheme.
This ensures that, in the case where the consumer tends to remain busy, you avoid locking the mutex when you know the consumer is still awake (and not about to sleep). It also minimises the time when the mutex is held, to further reduce the possibility for contention.
The above explanation is quite wordy (because I wanted to include the explanation of why it works, rather than just what the algorithm is), but the code resulting from it should be quite simple.
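For instance, the producer-side check might look like this Python sketch, modelled on the first/divider/last structure of the Dr Dobbs queue (the class and method names are mine, and in CPython the pointer copies are plain attribute reads rather than true atomics, so this is illustrative only):

```python
import threading

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class SleepyQueue:
    def __init__(self):
        sentinel = Node(None)          # divider-style queue: first..divider
        self.first = sentinel          # is consumed, divider..last is live
        self.divider = sentinel
        self.last = sentinel
        self.cond = threading.Condition()

    def produce(self, value):
        new_item = Node(value)
        saved_last = self.last         # copy of last, before inserting
        saved_last.next = new_item
        self.last = new_item           # publish new_item
        saved_divider = self.divider   # copy of divider, after inserting
        if saved_divider is new_item:
            return                     # already consumed: nothing to do
        if saved_divider is not saved_last:
            return                     # consumer still busy: no wake-up needed
        with self.cond:                # consumer may be about to sleep:
            self.cond.notify()         # signal under the mutex

    def try_consume(self):
        if self.divider is not self.last:  # consumer stops when divider == last
            node = self.divider.next
            self.divider = node
            return node.value
        return None
```

Note that only the final branch of produce ever touches the mutex; in the steady state where the consumer lags behind, both early returns are taken and no locking happens at all.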
Of course whether it's actually worth doing will depend on a lot of things, and I'd encourage you to measure if performance is critical for you. Good implementations of the mutex/condvar primitives internally use atomic operations for most cases, so they generally only make a kernel call (the most expensive bit!) if there's a need to - i.e. if there's a need to block, or there are definitely some threads waiting - so the time saved by not calling the mutex functions may only amount to the overhead of the library calls.