Optimal sleep time in a multiple-producer, single-consumer model
I'm writing an application that has a multiple producer, single consumer model (multiple threads send messages to a single file writer thread).
Each producer thread contains two queues, one to write into, and one for the consumer to read out of. On every loop, the consumer thread iterates through each producer, locks that producer's mutex, swaps the queues, unlocks, and writes out from the queue that the producer is no longer using.
In the consumer thread's loop, it sleeps for a designated amount of time after it processes all producer threads. One thing I immediately noticed was that the average time for a producer to write something into the queue and return increased dramatically (by 5x) when I moved from 1 producer thread to 2. As more threads are added, this average time decreases until it bottoms out - there isn't much difference between the time taken with 10 producers vs 15 producers. This is presumably because with more producers to process, there is less contention for the producer thread's mutex.
Unfortunately, having < 5 producers is a fairly common scenario for the application and I'd like to optimize the sleep time so that I get reasonable performance regardless of how many producers exist. I've noticed that by increasing the sleep time, I can get better performance for low producer counts, but worse performance for large producer counts.
Has anybody else encountered this, and if so what was your solution? I have tried scaling the sleep time with the number of threads, but it seems somewhat machine specific and pretty trial-and-error.
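For reference, the per-producer two-queue swap described above might look roughly like this (the names and the fixed capacity are hypothetical, not taken from the actual application):

```c
/* Sketch of the design described above. Each producer owns two queues;
   the consumer swaps them under the producer's mutex so the producer
   keeps appending to a fresh queue while the consumer drains the old one. */
#include <pthread.h>
#include <stddef.h>

#define QCAP 1024

typedef struct {
    int items[QCAP];
    size_t count;
} queue_t;

typedef struct {
    pthread_mutex_t mutex;
    queue_t *write_q;   /* producer appends here */
    queue_t *read_q;    /* consumer drains this one after a swap */
    queue_t bufs[2];
} producer_slot_t;

void producer_push(producer_slot_t *p, int msg) {
    pthread_mutex_lock(&p->mutex);
    if (p->write_q->count < QCAP)
        p->write_q->items[p->write_q->count++] = msg;
    pthread_mutex_unlock(&p->mutex);
}

/* Consumer side: swap under the lock, then drain without holding it. */
queue_t *consumer_swap(producer_slot_t *p) {
    pthread_mutex_lock(&p->mutex);
    queue_t *full = p->write_q;
    p->write_q = p->read_q;
    p->read_q = full;
    pthread_mutex_unlock(&p->mutex);
    return full;  /* caller writes this out, then resets full->count to 0 */
}
```

The producer only holds the mutex long enough to append one item, and the consumer only long enough to swap two pointers, so any 5x slowdown has to come from how often the two sides collide, not from the work done under the lock.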
4 Answers
You could pick the sleep time based on the number of producers or even make the sleep time adapt based on some dynamic scheme. If the consumer wakes up and has no work, double the sleep time, otherwise halve it. But constrain the sleep time to some minimum and maximum.
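A minimal sketch of that adaptive scheme (the bounds and units here are made-up placeholders, to be tuned for the actual workload):

```c
/* Double the sleep when a pass finds no work, halve it when it does,
   clamped to [SLEEP_MIN_US, SLEEP_MAX_US]. The bounds are illustrative. */
#define SLEEP_MIN_US 100     /* floor: don't degenerate into spinning */
#define SLEEP_MAX_US 50000   /* ceiling: stay responsive to new work */

long adapt_sleep(long current_us, int had_work) {
    long next = had_work ? current_us / 2 : current_us * 2;
    if (next < SLEEP_MIN_US) next = SLEEP_MIN_US;
    if (next > SLEEP_MAX_US) next = SLEEP_MAX_US;
    return next;
}
```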
Either way you're papering over a more fundamental issue. Sleeping and polling is easy to get right and sometimes is the only approach available, but it has many drawbacks and isn't the "right" way.
You can head in the right direction by adding a semaphore which is incremented whenever a producer adds an item to a queue and decremented when the consumer processes an item in a queue. The consumer will only wake up when there are items to process and will do so immediately.
Polling the queues may still be a problem, though. You could add a new queue that refers to any queue which has items on it. But it rather raises the question as to why you don't have a single queue that the consumer processes rather than a queue per producer. All else being equal that sounds like the best approach.
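As a sketch of that last suggestion, a single queue shared by all producers plus a counting semaphore might look like this (the names and capacity are illustrative, and overflow handling is omitted for brevity):

```c
/* One mutex-protected ring buffer for all producers, plus a counting
   semaphore so the consumer blocks only while there is nothing to
   write out - no sleeping or polling. */
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define SQ_CAP 4096

typedef struct {
    pthread_mutex_t mutex;
    sem_t available;          /* counts queued items */
    int items[SQ_CAP];
    size_t head, tail;        /* ring buffer indices */
} shared_queue_t;

void sq_init(shared_queue_t *q) {
    pthread_mutex_init(&q->mutex, NULL);
    sem_init(&q->available, 0, 0);
    q->head = q->tail = 0;
}

/* Any producer thread may call this. */
void sq_push(shared_queue_t *q, int msg) {
    pthread_mutex_lock(&q->mutex);
    q->items[q->tail++ % SQ_CAP] = msg;
    pthread_mutex_unlock(&q->mutex);
    sem_post(&q->available);  /* wake the consumer if it is waiting */
}

/* Consumer: blocks until an item exists. */
int sq_pop(shared_queue_t *q) {
    sem_wait(&q->available);
    pthread_mutex_lock(&q->mutex);
    int msg = q->items[q->head++ % SQ_CAP];
    pthread_mutex_unlock(&q->mutex);
    return msg;
}
```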
Instead of sleeping, I would recommend that your consumer block on a condition signaled by the producers. On a POSIX-compliant system, you could make it work with pthread_cond. Create an array of pthread_cond_t, one for each producer, then create an additional one that is shared between them. The producers first signal their individual condition variable, and then the shared one. The consumer waits on the shared condition and then iterates over the array, performing a pthread_cond_timedwait() on each element (use pthread_get_expiration_np() to get the absolute time for "now"). If the wait returns 0, then that producer has written data. The consumer must reinitialize the condition variables before waiting again.
By using blocking waits, you'll minimize the amount of time the consumer needlessly locks out the producers. You could also make this work with semaphores, as stated in a previous answer. Semaphores have simpler semantics than conditions, in my opinion, but you'd have to be careful to decrement the shared semaphore once for each producer that was processed on each pass through the consumer loop. Condition variables have the advantage that you can basically use them like boolean semaphores if you reinitialize them after they are signaled.
Try to find an implementation of a Blocking Queue in the language that you use for programming. No more than one queue will be enough for any number of producers and one consumer.
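Many languages ship one ready-made (Java's BlockingQueue implementations, Python's queue.Queue). In C you would roll your own on a condition variable, roughly along these lines (the names and capacity are illustrative, and overflow handling is omitted):

```c
/* A minimal blocking queue: all producers put into one queue, and the
   consumer's take blocks while it is empty instead of polling. */
#include <pthread.h>
#include <stddef.h>

#define BQ_CAP 4096

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
    int items[BQ_CAP];
    size_t head, tail;
} blocking_queue_t;

void bq_init(blocking_queue_t *q) {
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = q->tail = 0;
}

void bq_put(blocking_queue_t *q, int msg) {
    pthread_mutex_lock(&q->mutex);
    q->items[q->tail++ % BQ_CAP] = msg;
    pthread_cond_signal(&q->not_empty);   /* wake a blocked consumer */
    pthread_mutex_unlock(&q->mutex);
}

/* Blocks until an item is available. */
int bq_take(blocking_queue_t *q) {
    pthread_mutex_lock(&q->mutex);
    while (q->head == q->tail)            /* loop guards spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->mutex);
    int msg = q->items[q->head++ % BQ_CAP];
    pthread_mutex_unlock(&q->mutex);
    return msg;
}
```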
To me it sounds like you are accidentally introducing some buffering by having the consumer thread be busy somewhere else, either sleeping or doing actual work (the queue acting as the buffer). Maybe doing some simple buffering on the producer side would reduce your contention.
It seems that your system is highly sensitive to lock contention between the producers and the consumer, but I'm baffled as to why such a simple swap operation would occupy enough CPU time to show up in your run stats.
Can you show some code?
Edit: maybe you are taking your lock and swapping the queues even when there is no work to do?
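If so, one possible way to skip idle producers is to peek before swapping, using a trylock so the consumer never stalls behind a producer that is mid-push (the type and field names here are hypothetical, not from the question's code):

```c
/* Returns 1 if the consumer should lock and swap this producer's
   queues, 0 to skip the slot entirely this pass. */
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex;
    size_t pending;      /* items in the producer's write queue */
} slot_t;

int slot_has_work(slot_t *s) {
    if (pthread_mutex_trylock(&s->mutex) != 0)
        return 1;        /* busy: a producer holds it, likely pushing */
    size_t n = s->pending;
    pthread_mutex_unlock(&s->mutex);
    return n > 0;
}
```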