Problems utilizing multiple cores with pthreads
I'm developing a ray tracer in C++ using SDL and pthreads. I'm having trouble making my program utilize two cores. The threads work, but they don't drive both cores to 100%. To interface with SDL I write directly to its memory, SDL_Surface.pixels, so I assume it can't be SDL locking me.
My thread function looks like this:
void* renderLines(void* pArg){
    while(true){
        //Synchronize
        pthread_mutex_lock(&frame_mutex);
        pthread_cond_wait(&frame_cond, &frame_mutex);
        pthread_mutex_unlock(&frame_mutex);
        renderLinesArgs* arg = (renderLinesArgs*)pArg;
        for(int y = arg->y1; y < arg->y2; y++){
            for(int x = 0; x < arg->width; x++){
                Color C = arg->scene->renderPixel(x, y);
                putPixel(arg->screen, x, y, C);
            }
        }
        sem_post(&frame_rendered);
    }
}
Note: scene->renderPixel is const, so I assume both threads can read from the same memory.
I have two worker threads doing this. In my main loop I set them to work using:
//Signal a new frame
pthread_mutex_lock(&frame_mutex);
pthread_cond_broadcast(&frame_cond);
pthread_mutex_unlock(&frame_mutex);
//Wait for workers to be done
sem_wait(&frame_rendered);
sem_wait(&frame_rendered);
//Unlock SDL surface and flip it...
Note: I've also tried creating and joining the threads instead of synchronizing them.
I compile this with "-lpthread -D_POSIX_PTHREAD_SEMANTICS -pthread" and gcc does not complain.
My problem is best illustrated using a graph of the CPU usage during execution:
(source: jopsen.dk)
As can be seen from the graph, my program only uses one core at a time, switching between the two every once in a while, but it never drives both to 100%.
What in the world have I done wrong? I'm not using any mutexes or semaphores in scene.
What can I do to find the bug?
Also, if I put while(true) around scene->renderPixel() I can push both cores to 100%. So I suspected that this is caused by overhead, but I only synchronize every 0.5 second (e.g. FPS: 0.5), given a complex scene.
I realize it might not be easy to tell me what my bug is, but an approach to debugging this would be great too... I haven't played with pthreads before...
Also, could this be a hardware or kernel issue? My kernel is:
$ uname -a
Linux jopsen-laptop 2.6.27-14-generic #1 SMP Fri Mar 13 18:00:20 UTC 2009 i686 GNU/Linux
Comments (3)
This is useless:
If you want to wait for a new frame, do something like:
int new_frame = 0;
First thread:
Other thread:
pthread_cond_wait() actually releases the mutex and deschedules the thread until the condition is signaled. When the condition is signaled, the thread is woken up and the mutex is re-acquired. All of this happens inside the pthread_cond_wait() function.
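A minimal sketch of waiting on a predicate, along the lines described above. It is an illustration under assumptions, not the answerer's exact code: it uses a counter (frame_id, a hypothetical name) rather than a plain new_frame flag so that two workers can each notice the same frame without anyone having to reset the flag. The function names signal_new_frame and wait_for_new_frame are also made up for this example.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical shared state for this sketch.
static pthread_mutex_t frame_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  frame_cond  = PTHREAD_COND_INITIALIZER;
static int frame_id = 0;  // the predicate: incremented once per new frame

// Main thread: publish a new frame and wake every waiting worker.
void signal_new_frame() {
    pthread_mutex_lock(&frame_mutex);
    ++frame_id;
    pthread_cond_broadcast(&frame_cond);
    pthread_mutex_unlock(&frame_mutex);
}

// Worker: block until a frame other than last_seen is available, return its id.
int wait_for_new_frame(int last_seen) {
    pthread_mutex_lock(&frame_mutex);
    // The loop re-checks the predicate, so a broadcast sent before this
    // thread started waiting is not lost, and spurious wakeups are harmless.
    while (frame_id == last_seen)
        pthread_cond_wait(&frame_cond, &frame_mutex);
    int current = frame_id;
    pthread_mutex_unlock(&frame_mutex);
    assert(current != last_seen);
    return current;
}
```

Each worker would keep its own last_seen value, call wait_for_new_frame(last_seen) at the top of its loop, and store the returned id before rendering. The mutex is held only while checking the counter, so the rendering itself still runs in parallel.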
I'd take a wild stab in the dark and say your worker threads are spending lots of time waiting on the condition variable. To get good CPU utilization when your code is mostly CPU bound, the usual approach is a task-oriented style of programming, where you treat the threads as a "pool" and use a queue structure to feed work to them. They should spend a very small amount of time pulling work off the queue and most of their time doing the actual work.
What you have right now is a situation where they are probably doing work for a while, then notifying the main thread via the semaphore that they are done. The main thread will not release them again until both threads have finished working on the frame they are currently processing.
Since you are using C++, have you considered using Boost.Threads? It makes working with multithreaded code much easier, and the API is actually kind of similar to pthreads, but in a "modern C++" kind of way.
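To make the "pool plus queue" idea concrete, here is a rough sketch in which the queue is simply a mutex-protected counter of the next unrendered row. Everything in it is an assumption made for illustration: RowQueue, take_row, render_pixel and render_frame are hypothetical names, render_pixel stands in for scene->renderPixel, and a std::vector<int> stands in for the SDL surface.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Hypothetical work queue: rows are handed out one at a time.
struct RowQueue {
    pthread_mutex_t lock;
    int next_row;              // next row nobody has taken yet
    int height;                // total number of rows in the frame
    int width;
    std::vector<int>* pixels;  // stand-in for the SDL surface
};

// Stand-in for scene->renderPixel(x, y); must be safe to call concurrently.
static int render_pixel(int x, int y) { return x + y; }

// Take one row index from the queue, or -1 when the frame is finished.
static int take_row(RowQueue* q) {
    pthread_mutex_lock(&q->lock);
    int row = (q->next_row < q->height) ? q->next_row++ : -1;
    pthread_mutex_unlock(&q->lock);
    return row;
}

// Worker: keep pulling rows until none are left.
static void* worker(void* arg) {
    RowQueue* q = static_cast<RowQueue*>(arg);
    for (int y = take_row(q); y != -1; y = take_row(q))
        for (int x = 0; x < q->width; ++x)
            (*q->pixels)[y * q->width + x] = render_pixel(x, y);
    return nullptr;
}

// Render one frame with n_threads workers and return the pixel buffer.
std::vector<int> render_frame(int width, int height, int n_threads) {
    std::vector<int> pixels(width * height, -1);
    RowQueue q;
    pthread_mutex_init(&q.lock, nullptr);
    q.next_row = 0; q.height = height; q.width = width; q.pixels = &pixels;
    std::vector<pthread_t> threads(n_threads);
    for (int i = 0; i < n_threads; ++i) {
        int rc = pthread_create(&threads[i], nullptr, worker, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n_threads; ++i)
        pthread_join(threads[i], nullptr);
    pthread_mutex_destroy(&q.lock);
    return pixels;
}
```

Because a thread that finishes a cheap row immediately takes another one, neither core sits idle while the other half of the image is still being rendered, which is what can happen with a fixed y1..y2 split when one half of the scene is much more expensive. This sketch creates and joins threads per frame for brevity; a long-lived pool would keep the workers alive between frames.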
I'm no pthreads guru, but it seems to me that the following code is wrong:
To quote this article
so it seems to me that you should be releasing the mutex after the block of code following the pthread_cond_wait.