Considerate dynamic CPU load management

Posted on 2024-12-09 14:28:44

I am writing a CPU-intensive image processing library. To make the best use of the available CPU, I can detect the total number of cores on the machine and have my library run with that number of threads. When my library allocates one thread for each core, it performs optimally, using 100% of the available processor time.
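
For concreteness, a minimal sketch of this one-thread-per-core setup in C++ (the worker function processTile is a hypothetical stand-in for the library's real work):

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-in for the library's CPU-bound work on one slice of the image.
void processTile(unsigned workerIndex) { (void)workerIndex; /* ... */ }

void runAtFullWidth() {
    // One worker per detected hardware thread; fall back to 1 if detection fails.
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < cores; ++i)
        pool.emplace_back(processTile, i);
    for (std::thread& t : pool)
        t.join();
}
```

Note that hardware_concurrency() reports hardware threads (including hyperthreads), which may differ from the number of physical cores.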

The above approach works fine when mine is the only CPU-heavy process running. If another CPU-intensive process is running, or even another instance of my own code, then the OS allocates us only a fraction of the available cores, and my library then has too many threads running, which is both inefficient and inconsiderate to other processes.

So I would like to find a way to determine the "fair share" number of threads to run given a specific load. For example, if two instances of my process are running on an 8-core machine, each would run with 4 threads. Each would need a way to adapt thread count dynamically according to fluctuations in machine load.

So, my question:

  • Is there any OS feature or third-party library which allows my process to adapt thread count dynamically to use its fair share of the CPU?

My focus is Windows, but I am interested in non-Windows solutions too.

Edit: to be clear, this is about optimization. I am trying to achieve peak efficiency by running the optimal number of threads appropriate to my fair share of the CPU.
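
Purely for illustration of what a hand-rolled version of this could look like on Windows: the sketch below samples GetSystemTimes/GetProcessTimes to estimate how many cores other processes are keeping busy and sizes the pool to what is left over. The function names (coresBusyByOthers, fairShareThreads) and the sizing policy are invented for this sketch, not taken from any library.

```cpp
#include <windows.h>
#include <algorithm>
#include <thread>

// Convert a FILETIME to a 64-bit count of 100-nanosecond ticks.
static ULONGLONG ticks(const FILETIME& ft) {
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

// Estimate how many cores *other* processes kept busy over one sampling window.
double coresBusyByOthers(DWORD sampleMs = 500) {
    FILETIME idle1, kern1, user1, idle2, kern2, user2;
    FILETIME created, exited, pk1, pu1, pk2, pu2;

    GetSystemTimes(&idle1, &kern1, &user1);
    GetProcessTimes(GetCurrentProcess(), &created, &exited, &pk1, &pu1);
    Sleep(sampleMs);
    GetSystemTimes(&idle2, &kern2, &user2);
    GetProcessTimes(GetCurrentProcess(), &created, &exited, &pk2, &pu2);

    // System kernel time includes idle time, so busy = (kernel - idle) + user.
    ULONGLONG busyAll = (ticks(kern2) - ticks(kern1)) - (ticks(idle2) - ticks(idle1))
                      + (ticks(user2) - ticks(user1));
    ULONGLONG busyUs  = (ticks(pk2) - ticks(pk1)) + (ticks(pu2) - ticks(pu1));
    ULONGLONG wall    = ULONGLONG(sampleMs) * 10000ULL;   // ms -> 100 ns ticks

    ULONGLONG others = busyAll > busyUs ? busyAll - busyUs : 0;
    return wall ? static_cast<double>(others) / static_cast<double>(wall) : 0.0;
}

// Invented policy: claim whatever the rest of the machine is not using, never less than 1.
unsigned fairShareThreads() {
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    double mine = std::max(1.0, cores - coresBusyByOthers());
    return static_cast<unsigned>(mine);
}
```

A real implementation would re-sample periodically, smooth the estimate, and park or wake workers rather than recreating them; the sketch only shows the measurement idea.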

Comments (2)

掩耳倾听 2024-12-16 14:28:44

In my eyes, the application shouldn't decide how many threads to spawn. That is information the caller should provide. On Linux, the "-j" or "--jobs" parameter is widely used for this (default: 1).

What about also setting the priority of the processing tasks? If the caller knows the processing is mission-critical, they can increase the priority (accepting that this may block the (whole) system). Your processing library can never know how important the processing of this particular image is.
If the caller doesn't care, a default low priority is used, which shouldn't affect the rest of the system. If it does, you should look at what exactly is blocking the system (maybe writing image files to the HDD, or reducing RAM usage to prevent swapping, ...). Once you have figured that out, you can optimize exactly that point.

If you start the processing with (cpu-cores) * 2 threads at low-to-normal priority, your system should remain usable. No one would expect that to kill the system.
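
A rough sketch of that suggestion, assuming Windows (the work callback, the runWorkers name, and the zero-means-default convention are made up here): spawn cores * 2 workers and drop the whole process to the below-normal priority class so the rest of the system stays responsive.

```cpp
#include <windows.h>
#include <algorithm>
#include <thread>
#include <vector>

// jobs comes from the caller (e.g. a -j/--jobs style option); 0 means "pick a default".
void runWorkers(unsigned jobs, void (*work)(unsigned)) {
    if (jobs == 0)
        jobs = 2 * std::max(1u, std::thread::hardware_concurrency());

    // Drop the whole process to a below-normal priority class so its threads
    // compete less with the rest of the system.
    SetPriorityClass(GetCurrentProcess(), BELOW_NORMAL_PRIORITY_CLASS);

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < jobs; ++i)
        pool.emplace_back(work, i);
    for (std::thread& t : pool)
        t.join();
}
```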

Just my 2 cents.

缪败 2024-12-16 14:28:44

Actually, this is not a problem of multithreading but a problem of executing many programs simultaneously. It is hard on most PC operating systems because it conflicts with the idea of time-sharing.

Let's walk through a typical workflow.

Suppose we have 8 cores and we create 8 threads to feed them; OK, that's easy. Next, we choose to monitor core load to estimate how many tasks are running on each core; well, that needs some statistical assumptions, e.g. on Linux you can get a 1/5/15-minute load-average chart, but it can be done. The statistics are clear, and now we can see how many CPU-bound processes are running; say we see 3 other CPU-intensive processes.
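
For example, on Linux the 1-minute value behind that load-average chart can be read programmatically with getloadavg(); a very rough sketch of turning it into a thread budget (the arithmetic and the threadsGivenLoad name are illustrative, and the load average counts runnable tasks of any kind, not only CPU-bound ones):

```cpp
#include <stdlib.h>    // getloadavg (glibc/BSD extension)
#include <algorithm>
#include <thread>

// Estimate how many threads to keep running, given how busy the machine already is.
unsigned threadsGivenLoad(unsigned ourCurrentThreads) {
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());

    double load[1] = {0.0};
    if (getloadavg(load, 1) != 1)
        return cores;                                   // no data: fall back to one per core

    // Runnable tasks that are (roughly) not ours.
    double others = std::max(0.0, load[0] - static_cast<double>(ourCurrentThreads));
    double mine   = std::max(1.0, static_cast<double>(cores) - others);
    return static_cast<unsigned>(mine);
}
```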

Then we come to the point: we have to put 3 redundant threads to sleep, but which 3?

Usually we choose 3 threads arbitrarily, because the scheduler arranges the remaining CPU-bound threads automatically. In some cases, we explicitly put the threads on heavily loaded cores to sleep, assign other threads to certain lightly loaded cores, and let the scheduler do the rest. Most scheduling policies also try to "keep the CPU cache hot", which means they tend to avoid migrating threads between cores. We can reasonably expect our CPU-intensive threads to make good use of their cores' caches, since the other processes are scheduled onto the 3 crowded cores. Everything looks good.
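
If you do want to place threads explicitly as described above, affinity masks are the usual mechanism; a minimal sketch assuming Windows (pthread_setaffinity_np plays a similar role on Linux; the function name here is made up):

```cpp
#include <windows.h>

// Restrict the calling thread to a single logical processor so its cache stays warm.
// Only covers processors representable in one affinity mask (one processor group).
bool pinCurrentThreadToCore(unsigned coreIndex) {
    if (coreIndex >= sizeof(DWORD_PTR) * 8)
        return false;
    DWORD_PTR mask = static_cast<DWORD_PTR>(1) << coreIndex;
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
}
```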

However, this can fail for tightly synchronized computation. In that scenario we need to run our 5 threads simultaneously, meaning all 5 threads have to be given a CPU and run at almost the same time. I don't know of any scheduler on a PC that will do this for us. In most low-load cases things still work fine, because the cost of waiting for simultaneity is trivial. But when the load on a core is high, and even 1 of our 5 threads is disturbed, we occasionally find ourselves spending many cycles just waiting.

It may help to schedule your program as a real-time program, but that is not a perfect solution. Statistically, the extra scheduling priority gives it a wider time window in which simultaneity can be achieved. I have to say, it is not guaranteed.
