How can I reduce my program's CPU usage?
I wrote a multi-threaded program that does some CPU-heavy computation with a lot of floating-point operations. More specifically, it's a program that compares animation sequences frame by frame, i.e. for every frame in animation A, it compares that frame's data with all the frames in animation B. I carry out this intensive operation for different animations in parallel, so the program can be working on the A-B, B-C and C-A pairs at the same time. The program uses QtConcurrent and its "map" function, which maps a container of motions onto a function. QtConcurrent manages the thread pool for me; I am working on an Intel quad-core processor, so it spawns 4 threads.
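To make the cost of this workload concrete: comparing every frame of A with every frame of B means |A| * |B| frame comparisons per animation pair, each of them a run of floating-point work, so the program has no natural idle time and will keep every thread it is given busy. The question does not say how two frames are compared, so this sketch assumes a hypothetical `Frame` type (a flat vector of floats) and a hypothetical squared-Euclidean `frame_distance`; only the nested all-pairs loop reflects the description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frame representation: a flat array of floats (e.g. joint
// positions). The real frame format is not given in the question.
using Frame = std::vector<float>;
using Animation = std::vector<Frame>;

// Hypothetical per-frame metric: squared Euclidean distance.
float frame_distance(const Frame& a, const Frame& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Compares every frame of A with every frame of B, producing an
// |A| x |B| table of distances.
std::vector<std::vector<float>> compare_animations(const Animation& a,
                                                   const Animation& b) {
    std::vector<std::vector<float>> dist(a.size(), std::vector<float>(b.size()));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            dist[i][j] = frame_distance(a[i], b[j]);
    return dist;
}
```

For the three pairs in the question, this would be called as compare_animations(A, B), compare_animations(B, C) and compare_animations(C, A).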
Now, the problem is that my process destroys my CPU. Usage is a constant 100%, and I actually get a Blue Screen of Death (Page fault in non-paged area) if I run my program on a big enough set of motions. I suspect this is because my computer is overclocked. However, could it be because of the way I coded my program? Some very intensive benchmarking tools I used to test my machine's stability never crashed my PC. Is there any way to control how my program uses the CPU to reduce the load? Or perhaps I am misunderstanding my problem?
Answers (13)
It's all too easy to blame the hardware. I would suggest you try running your program on a different system and see how that turns out with the same data.
Probably you have a bug.
Look into using SIMD operations. I think you'd want SSE in this case. They're often a better first step than parallelization as they are easier to get correct and provide a pretty hefty boost to most linear algebra types of operations.
Once you have it using SIMD, then look into parallelizing. It also sounds like you're slamming the CPU, so you could perhaps use some sleeps instead of busy waits, and make sure you're cleaning up or reusing threads properly.
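As an illustration of the SIMD suggestion, here is a minimal sketch of the kind of inner loop a frame comparison might contain, assuming (hypothetically) that comparing two frames means computing the squared Euclidean distance between two float arrays. It uses SSE intrinsics from `<xmmintrin.h>`, so it only builds for x86/x86-64 targets. It processes four floats per iteration and finishes the leftover elements with a scalar loop.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Squared Euclidean distance between two float arrays of length n,
// processing four floats per iteration with SSE.
float squared_distance_sse(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 d = _mm_sub_ps(va, vb);
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));  // four running partial sums
    }
    // Horizontal sum of the four partial sums.
    float partial[4];
    _mm_storeu_ps(partial, acc);
    float sum = partial[0] + partial[1] + partial[2] + partial[3];
    // Scalar tail for the remaining 0-3 elements.
    for (; i < n; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Because the four lanes accumulate separately, the summation order differs from a plain scalar loop, so results can differ in the last bits; it is worth timing both versions on real data before committing to the change.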
In the absence of the BSOD error code (useful for looking it up), it is a bit harder to help you with this one.
You might try physically reseating your memory (take it out and put it back in). I, and some others I know, have worked on a few machines where this was needed. For instance, I was once trying to upgrade OS X on a machine and it kept crashing... finally I popped the memory out, dropped it back in, and everything was fine.
Sleep(1); will cut CPU usage in half. I ran into the same problem working with a CPU intensive algorithm.
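A portable sketch of the same idea: std::this_thread::sleep_for is the standard C++ counterpart of Windows Sleep(1). The helper name `run_throttled` and its `batch` parameter are illustrative assumptions. How much CPU time this actually frees depends on how long each batch runs and on the OS timer resolution (on Windows a 1 ms sleep can last noticeably longer than 1 ms).

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Runs work(i) for i in [0, count), sleeping for at least 1 ms after every
// `batch` items so the thread periodically gives up the CPU.
// A batch of 0 disables the sleeps.
void run_throttled(std::size_t count, std::size_t batch,
                   const std::function<void(std::size_t)>& work) {
    for (std::size_t i = 0; i < count; ++i) {
        work(i);
        if (batch != 0 && (i + 1) % batch == 0)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

The trade-off is that the computation as a whole takes longer; the sleeps only make room for other processes.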
If your processor has two or more cores, you can open Task Manager, go to Processes, right-click the program name, click Set affinity, and set the program to use fewer cores. It will then take longer to do the actions you're asking for, but it will cause a SIGNIFICANT decrease in CPU usage.
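A similar effect can be had from inside the program by starting fewer worker threads than there are cores. With QtConcurrent, one option is to call QThreadPool::globalInstance()->setMaxThreadCount(n) before QtConcurrent::map, since map uses the global pool by default. This sketch shows the idea with plain std::thread instead; `worker_count` and `parallel_for` are hypothetical helpers, and the number of reserved cores is a tuning assumption.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Number of worker threads to use, leaving `reserved` cores free for the
// rest of the system. hardware_concurrency() may return 0 if unknown.
unsigned worker_count(unsigned reserved) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    return hw > reserved ? hw - reserved : 1;
}

// Runs work(i) for i in [0, count) on `threads` threads; thread t handles
// indices t, t + threads, t + 2*threads, ...
void parallel_for(std::size_t count, unsigned threads,
                  const std::function<void(std::size_t)>& work) {
    if (threads == 0) threads = 1;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([=, &work] {
            for (std::size_t i = t; i < count; i += threads) work(i);
        });
    for (auto& th : pool) th.join();  // wait for all workers before returning
}
```

For example, parallel_for(n, worker_count(1), f) keeps one core free on any machine with more than one core.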
I think a blue screen of death is caused when a kernel memory region gets corrupted.
So using multithreading to carry out parallel operations could not be the reason for this.
Well, if you are creating multiple threads that each carry out heavy floating-point operations, then your CPU utilization will definitely reach 100%.
It would be better if you could add some sleep in each thread so that other processes get a chance to run.
You may also try to reduce the priority of the threads.
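For the priority suggestion, here is a minimal POSIX sketch, assuming Linux, where the nice value set through setpriority(PRIO_PROCESS, 0, ...) applies to the calling thread only. On Windows the rough equivalent is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL). The function name and the -100 error sentinel are illustrative choices.

```cpp
#include <cerrno>
#include <sys/resource.h>  // getpriority, setpriority (POSIX)

// Lowers the calling thread's scheduling priority by raising its nice value
// to at least `target_nice` (clamped to the usual maximum of 19). It never
// lowers the nice value, so it needs no special privileges and is safe to
// call repeatedly. Linux-specific assumption: with PRIO_PROCESS and who = 0
// the nice value is per-thread, so only the calling thread is affected; on
// other POSIX systems this may apply to the whole process.
// Returns the resulting nice value, or -100 on error.
int lower_current_thread_priority(int target_nice) {
    errno = 0;
    int current = getpriority(PRIO_PROCESS, 0);
    if (current == -1 && errno != 0) return -100;  // -1 is also a valid nice value
    int target = target_nice > 19 ? 19 : target_nice;
    if (target <= current) return current;  // already at or below this priority
    if (setpriority(PRIO_PROCESS, 0, target) != 0) return -100;
    return target;
}
```

Each worker would call it once at the start of its work. A lower priority does not reduce the total CPU time the computation uses; it only lets other processes run first when they need the CPU.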
On Windows, after doing some work, call a function that tells the scheduler you are willing to give the CPU to other processes. Call the Sleep function like this:
Sleep(0);
Overclocking PCs can lead to all sorts of strange problems. If you suspect that to be the root cause of your problem, try clocking it back to reasonable speeds and rerun your tests.
It could also be some sort of quite strange memory bug where you corrupt your RAM in a way that Windows (I'm guessing that OS because of the BSOD) cannot recover from (very unlikely, but who knows).
Another possibility I can think of is that you've got some error in your threading implementation that kills Windows.
But first, I'd look at the overclocking issue...
There are some excellent answers here.
I would only add, from the perspective of having done lots of performance tuning, that unless each thread has been optimized aggressively, chances are it has lots of room for cycle reduction.
To make an analogy with a long-distance auto race, there are two ways to try to win: make the car go faster, or take the most direct route.
In my experience, most software as first written is quite far from taking the most direct route, especially as the software gets large.
To find wasted cycles in your program, as Kenneth Cochran said, never guess. If you fix something without having proved that it is a problem, you are investing in a guess.
The popular way to find performance problems is to use profilers.
However, I do this a lot, and my method is this: http://www.wikihow.com/Optimize-Your-Program%27s-Performance
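A profiler or the sampling method linked above gives the full picture. As a much lighter first check, and not a substitute for either, a scoped wall-clock timer can confirm how long a suspected section really takes before anything is changed. `ScopedTimer` is a hypothetical helper, not part of any library.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Measures the wall-clock time of the enclosing scope and reports it to
// stderr on destruction (optionally also storing it in *out_ms).
// A lightweight check, not a replacement for a profiler.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label, double* out_ms = nullptr)
        : label_(std::move(label)), out_ms_(out_ms),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        if (out_ms_) *out_ms_ = elapsed.count();
        std::fprintf(stderr, "%s: %.3f ms\n", label_.c_str(), elapsed.count());
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    double* out_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

For example, wrapping the body of the mapped function in { ScopedTimer t("compare A-B"); ... } prints its duration when the scope ends.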
The kind of operation you've described is already highly parallelizable. Running more than one job at a time may actually hurt performance. The reason is that any processor's cache is of limited size, and the more you try to do concurrently, the smaller each thread's share of the cache becomes.
You might also look into the option of using your GPU to soak up some of the processing load. Modern GPUs are vastly more efficient than CPUs of similar generations for most kinds of video transformation.
It's definitely possible. Try setting it to normal speed for a while.
A program running in user mode is very unlikely to cause a BSOD.
At a guess, I would say you are not running on a 3-core machine (or 4, given 100% usage), and parallelizing will actively hurt your performance if you use more threads than cores. Make only one thread per CPU core, and whatever you do, never have data accessed by different threads at the same time. The cache-locking algorithms in most multi-core CPUs will absolutely slaughter your performance. In this case, on an N-core CPU processing L-frame animations, I would use thread 1 on frames 0-(L/N), thread 2 on frames (L/N)-(2*L/N), ..., thread N on frames ((N-1)*L/N)-L. Do the different combinations (A-B, B-C, C-A) in sequence so you don't thrash your cache; it should also be simpler to code.
As a side note, real computation like this should be using 100% CPU; it means it's going as fast as it can.
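The partitioning scheme above can be sketched as follows, assuming the same all-pairs comparison as in the question and a hypothetical squared-distance `frame_distance` (the real metric isn't given). `split_frames` computes the ranges [k*L/N, (k+1)*L/N), and `compare_pair_parallel` runs one thread per range over A's frames. Each thread writes only its own rows of the result; animation B is shared, but only read.

```cpp
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

using Frame = std::vector<float>;      // hypothetical frame format
using Animation = std::vector<Frame>;

// Hypothetical per-frame metric: squared Euclidean distance.
float frame_distance(const Frame& a, const Frame& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Splits frames [0, L) into N contiguous ranges: range k is
// [k*L/N, (k+1)*L/N). Range sizes differ by at most one frame.
std::vector<std::pair<std::size_t, std::size_t>> split_frames(std::size_t L,
                                                               std::size_t N) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t k = 0; k < N; ++k)
        ranges.emplace_back(k * L / N, (k + 1) * L / N);
    return ranges;
}

// Compares one animation pair with one thread per range of A's frames.
// Thread k writes only rows [begin, end) of `dist`, so no two threads
// write the same data.
std::vector<std::vector<float>> compare_pair_parallel(const Animation& a,
                                                      const Animation& b,
                                                      std::size_t n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<std::vector<float>> dist(a.size(), std::vector<float>(b.size()));
    std::vector<std::thread> workers;
    for (const auto& r : split_frames(a.size(), n_threads)) {
        std::size_t begin = r.first, end = r.second;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                for (std::size_t j = 0; j < b.size(); ++j)
                    dist[i][j] = frame_distance(a[i], b[j]);
        });
    }
    for (auto& w : workers) w.join();
    return dist;
}
```

Following the answer, the pairs would be processed one after another: compare_pair_parallel(A, B, n), then (B, C), then (C, A), with n equal to the number of cores.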
Overclocking is the most likely cause of the instability. With any CPU-intensive algorithm there is going to be some CPU thrashing. Overclocking notwithstanding, I would find a good performance profiler to find performance bottlenecks. Never guess where the problem is. You could spend months optimizing something that has no real effect on performance, or worse, performance could even decrease.