What could cause my program to stop using all cores after a while?
I have written a program that captures and displays video from three video cards. For every frame I spawn a thread that compresses the frame to JPEG and then puts it in a queue for writing to disk. I also have other threads that read from these files and decode them in their own threads. Usually this works fine; it's a pretty CPU-intensive program, using about 70-80 percent of all six CPU cores. But after a while the encoding suddenly slows down, the program can't handle the video fast enough, and it starts dropping frames. If I check the CPU utilization I can see that one core (usually core 5) is no longer doing much.
When this happens, it doesn't matter whether I quit and restart my program: CPU 5 will still show low utilization, and the program starts dropping frames immediately. Deleting all saved video has no effect either. Restarting the computer is the only thing that helps. Oh, and if I set my program's affinity to use all but the semi-idle core, it works until the same thing happens to another core. Here is my setup:
- AMD X6 1055T (Cool'n'Quiet OFF)
- GA-790FX-UD5 motherboard
- 4 GB RAM, unganged, 1333 MHz
- Blackmagic DeckLink Duo capture cards (x2)
- Linux - Ubuntu 10.10 x64 with kernel 2.6.32.29
My app uses:
- libjpeg-turbo
- POSIX threads
- DeckLink API
- Qt
- Written in C/C++
- All libraries linked dynamically
It seems to me like it could be some kind of problem with the way Linux schedules threads on the cores. Or is there some way my program could mess up so badly that restarting the program doesn't help?
Thank you for reading, any and all input is welcome. I'm stuck :)
Comments (2)
First of all, make sure it's not your program - maybe you are running into a convoluted concurrency bug, even though that's not very likely given your program's architecture and the fact that rebooting the machine helps. I've found that post-mortem debugging is usually a good approach: compile with debugging symbols, kill the program with SIGSEGV when it is behaving strangely, and examine the core dump with gdb.
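A sketch of that workflow (the program name `capture_app` is a placeholder; `sleep` stands in for it here so the commands run as-is):

```shell
# Post-mortem debugging sketch. "capture_app" is a placeholder name;
# `sleep` stands in for the misbehaving process so this is runnable.
ulimit -c unlimited                 # allow the kernel to write core files
sleep 600 &                         # stand-in for the capture app
APP_PID=$!
kill -SEGV "$APP_PID"               # force a core dump while it misbehaves
wait "$APP_PID"; status=$?          # 139 = 128 + SIGSEGV(11)
echo "exit status: $status"
# Then load the dump and look at every thread's stack:
#   gdb ./capture_app core
#   (gdb) thread apply all bt
```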
I would try choosing a core round-robin whenever a new frame-processing thread is spawned, pinning the thread to that core, and keeping statistics on how long each thread takes to run. If this is in fact a bug in the Linux scheduler, your threads will take roughly the same time to run on any core. If the core is actually busy with something else, the threads pinned to that core will get less CPU time.