Force Linux to schedule processes on CPU cores that share CPU cache
Modern AMD CPUs consist of multiple CCX. Each CCX has a separate L3 cache.
It's possible to set process affinity to limit a process to certain CPU cores.
Is there a way to force Linux to schedule two processes (parent process thread & child process) on two cores that share L3 cache, but still leave the scheduler free to choose which two cores?
2 Answers
Newer Linux kernels may do this for you: cluster-aware scheduling landed in Linux 5.16, so there's support for scheduling decisions being influenced by the fact that some cores share resources.
If you manually pick a CCX, you could give them each the same affinity mask that allows them to schedule on any of the cores in that CCX.
An affinity mask can have multiple bits set.
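As a minimal sketch of that in C (assuming, purely for illustration, that cores 0-7 form one CCX on your machine; check lscpu or sysfs for the real grouping):

```c
/* Minimal sketch: pin this process (and future children) to one
 * manually chosen CCX. "Cores 0-7 form one CCX" is an assumption
 * for illustration -- check lscpu / sysfs for your machine's layout. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int cpu = 0; cpu < 8; cpu++)   /* multiple bits set: cores 0..7 */
        CPU_SET(cpu, &mask);

    /* pid 0 means the calling process */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }
    /* a child fork()ed from here inherits the mask, so parent and
     * child can land on any core within this CCX but not outside it */
    return EXIT_SUCCESS;
}
```

Since the mask survives fork(), doing this in the parent before spawning the child covers both tasks with one call.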
I don't know of a way to let the kernel decide which CCX, but then schedule both tasks to cores within it. If the parent checks which core it's currently running on, it could set a mask to include all cores in the CCX containing it, assuming you have a way to detect how core #s are grouped, and a function to apply that.
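On Linux that grouping can be read from sysfs: each CPU's cache/index3/shared_cpu_list lists the cores sharing its L3 (index3 being the L3 on typical Zen topologies; worth verifying on your hardware). A hedged sketch of the "let the scheduler pick, then widen to that CCX" idea:

```c
/* Sketch: let the scheduler place us first, then widen our affinity
 * to every core sharing L3 with wherever we landed. There's a benign
 * race if we migrate between the two calls -- we'd still end up pinned
 * to one full CCX, just possibly not the one we're on at that instant. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int cpu = sched_getcpu();                /* core we are on right now */
    if (cpu < 0) { perror("sched_getcpu"); return EXIT_FAILURE; }

    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
             cpu);
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return EXIT_FAILURE; }

    /* shared_cpu_list looks like "0-7" or "0-3,8-11": parse each range */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    int lo, hi;
    while (fscanf(f, "%d", &lo) == 1) {
        hi = lo;
        int ch = fgetc(f);                   /* '-', ',' or end of line */
        if (ch == '-') {
            if (fscanf(f, "%d", &hi) != 1) break;
            fgetc(f);                        /* consume ',' or newline */
        }
        for (int c = lo; c <= hi; c++)
            CPU_SET(c, &mask);
    }
    fclose(f);

    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }
    printf("on cpu %d; affinity widened to its L3 siblings\n", cpu);
    /* fork() the child after this point so it inherits the mask */
    return EXIT_SUCCESS;
}
```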
You'd want to be careful that you don't end up leaving some CCXs totally unused if you start multiple processes that each do this, though. Maybe every second, do whatever top or htop do to check per-core utilization, and rebalance if one CCX is sitting idle? (i.e. change the affinity mask of both processes to the cores of a different CCX.)

Or maybe put this functionality outside the processes being scheduled, so there's one "master control program" that looks at (and possibly modifies) affinity masks for a set of tasks that it should control. (Not all tasks on the system; that would be a waste of work.) Or if it's looking at everything, it doesn't need to do so much checking of current load average, just count what's scheduled where. (And assume that tasks it doesn't know about can pick any free cores on any CCX, like daemons or the occasional compile job. Or at least compete fairly if all cores are busy with jobs it's managing.) A skeleton of the reassignment step follows this paragraph.
Obviously this is not helpful for most parent/child processes, only ones that do a lot of communication via shared memory (or maybe pipes, since kernel pipe buffers are effectively shared memory).
It is true that Zen CPUs have varying inter-core latency within / across CCXs, as well as just cache hit effects from sharing L3. https://www.anandtech.com/show/16529/amd-epyc-milan-review/4 did some microbenchmarking on Zen 3 vs. 2-socket Xeon Platinum vs. 2-socket ARM Ampere.
The underlying library functions for processes support setting CPU set masks, which allows you to define a set of cores on which a process is eligible to run. There's an equivalent for pthreads. See this man page and this command line tool.
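For threads specifically, glibc's pthread_setaffinity_np() takes the same cpu_set_t mask but applies it to one thread rather than the whole process. A minimal sketch (again treating cores 0-7 as one CCX, which is machine-specific):

```c
/* Sketch of the pthread equivalent: confine one thread to a core set.
 * "Cores 0-7 = one CCX" is a machine-specific assumption. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    /* this thread can only run on cores set in the mask below */
    return NULL;
}

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int c = 0; c < 8; c++)
        CPU_SET(c, &mask);

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    /* pthread_setaffinity_np returns an errno value, not -1/errno */
    int err = pthread_setaffinity_np(t, sizeof(mask), &mask);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np: error %d\n", err);

    pthread_join(t, NULL);
    return 0;
}
```

Build with -pthread; pthread_attr_setaffinity_np() can do the same thing at thread-creation time instead.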
This is quite an interesting piece on how Linux treats NUMA systems. It basically tries to keep code and memory together, so it is already predisposed to doing what you want, out of the box. Though I think it might get fooled if the interaction between two processes is via, for example, shared memory that one allocates and the other ends up merely "accessing" (i.e. when starting the second process, the kernel doesn't know it's going to access memory allocated by a separate process that was actually placed on a core a long way away, in NUMA terms).
I think CPU sets show some promise. At the bottom of that page there are examples of putting a shell into a specific CPU set. This might be a way to keep any subsequent processes started from that shell within the same CPU set, without you having to specifically set core affinities for them (I think they'll inherit that from the shell). You'd still be defining the CPU set in terms of which CPUs are in it, but you'd only do it once.
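Done programmatically, the same idea might look like this sketch: create a cpuset cgroup confined to one CCX, move the current process in, and exec a shell so every descendant inherits the restriction. It assumes cgroup v2 mounted at /sys/fs/cgroup with the cpuset controller enabled, and root or a delegated subtree; "ccx0" and the "0-7" range are placeholders.

```c
/* Sketch: confine a cgroup to one CCX and exec a shell inside it. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) { perror(path); return -1; }
    ssize_t n = write(fd, s, strlen(s));
    close(fd);
    return n == (ssize_t)strlen(s) ? 0 : -1;
}

int main(void)
{
    if (mkdir("/sys/fs/cgroup/ccx0", 0755) != 0 && errno != EEXIST) {
        perror("mkdir");
        return 1;
    }
    /* restrict the group to one CCX (placeholder range) */
    if (write_str("/sys/fs/cgroup/ccx0/cpuset.cpus", "0-7") != 0)
        return 1;
    /* move this process in; children stay in the same cgroup */
    char pid[32];
    snprintf(pid, sizeof(pid), "%d", (int)getpid());
    if (write_str("/sys/fs/cgroup/ccx0/cgroup.procs", pid) != 0)
        return 1;
    /* everything launched from this shell inherits the cpuset */
    execlp("sh", "sh", (char *)NULL);
    perror("execlp");
    return 1;
}
```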