Maximizing worker thread utilization
To solve a problem (and improve my understanding of multitasking) I have written a small thread pool implementation. The thread pool spins up a number of worker threads, which pop tasks off a queue as they are added by the client of the pool. For the purposes of this question, when the task queue is empty the worker threads all terminate.
After doing some basic benchmarking, I have discovered that the application spends ~60% of its time waiting to acquire the queue lock. Presumably most of that is happening inside the worker threads.
Is this merely an indication that I'm not giving the worker threads enough to do, or is something more going on? Is there something straightforward I may be missing that would increase worker thread throughput?
EDIT: Here is some rough pseudocode that should illustrate things somewhat. These are the only two places where a lock is acquired/released during the execution of the worker threads (which accounts for the vast majority of the application's running time).
#include <list>
#include <mutex>

// Assumed supporting definitions (the original post leaves these out):
// task_t is the unit of work and do_task() executes it; lock_type is a
// scoped lock over the shared task_mutex.
using lock_type = std::lock_guard<std::mutex>;

std::mutex task_mutex;
std::list<task_t> task_list;

// Called by the client to add tasks to the thread pool
void insert_task(const task_t& task)
{
    lock_type listlock(task_mutex);
    task_list.push_back(task);
}

// The base routine of each thread in the pool. Some details
// such as lifetime management have been omitted for clarity.
void worker_thread_base()
{
    while (true)
    {
        task_t task;
        {
            lock_type listlock(task_mutex);
            if (task_list.empty())
                continue;               // queue empty: loop straight back and retry (busy-wait)
            task = task_list.front();
            task_list.pop_front();
        }
        do_task(task);                  // run the task with the lock released
    }
}
2 Answers
Your design has each worker thread sitting and "spinning" as it tries to acquire the lock. This happens constantly unless every worker thread is busy doing work - in which case the lock sits unacquired and the work gets done.
With all of your threads just sitting there spinning on a lock, you're going to burn quite a bit of CPU time waiting. Given your design, this is somewhat expected.
You'll find that the percentage of time spent blocked will likely shrink dramatically if you have fewer worker threads - and at the point where you have more work items than threads, you'll spend very little time waiting on that lock.
A much better design would be to use some form of lock-free queue for the work queue, as that would avoid the waiting at this point. In addition, a wait handle that blocks the worker threads until there is work in the queue would prevent the unnecessary spinning.
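A minimal sketch of the "wait handle" idea in C++, assuming a std::condition_variable is added next to the question's task_mutex and task_list (task_t and do_task are carried over from the question's pseudocode, and the shutting_down flag is a hypothetical addition for pool shutdown); the worker blocks instead of spinning:

#include <condition_variable>
#include <list>
#include <mutex>

std::mutex task_mutex;
std::condition_variable task_available;    // signalled whenever a task is queued
std::list<task_t> task_list;
bool shutting_down = false;                // hypothetical flag, set under the lock to stop the pool

void worker_thread_base()
{
    while (true)
    {
        task_t task;
        {
            std::unique_lock<std::mutex> listlock(task_mutex);
            // Sleep instead of spinning: the thread uses no CPU until it is
            // notified that a task has arrived or the pool is shutting down.
            task_available.wait(listlock, [] {
                return !task_list.empty() || shutting_down;
            });
            if (shutting_down && task_list.empty())
                return;
            task = task_list.front();
            task_list.pop_front();
        }
        do_task(task);                      // run the task with the lock released, as before
    }
}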
Are you trying to do this with a single lock or multiple locks? Mutexes? What wait semantics are you using?
I would guess from your description (and this is purely a guess) that you have something similar in your main thread, which contains the code that dispatches to the lightweight threads. One reason you might see inflated wait times is that you need signals back from the spun-up threads to indicate that they have been queued and are waiting for execution (again, this is a guess, since you didn't give any code).
One way you might solve this is to switch from using an explicit lock, as above, to a signalled mutex that is pulsed when you want one of the threads to grab work.
Without seeing your current implementation, though, I'm not sure I can offer much more than that.
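In modern C++ that "signalled mutex" would most naturally be a std::condition_variable. A hedged sketch of the producer side, reusing the task_mutex, task_list and task_available names assumed in the earlier sketch: the client holds the lock only long enough to enqueue, then pulses one sleeping worker.

void insert_task(const task_t& task)
{
    {
        std::lock_guard<std::mutex> listlock(task_mutex);
        task_list.push_back(task);
    }
    // Pulse exactly one sleeping worker. notify_all() would also be correct,
    // but it wakes every idle thread only for all but one to go back to sleep.
    task_available.notify_one();
}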