Enforce srun to use exclusive cores on a single socket
I'm using sbatch and I have a node with 2 sockets, each having 18 cores, totalling 36 cores. I'm launching 4 scripts, each of which has two tasks that share a GPU:
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
Such a configuration, run 4 times, gives 4 x 2 x 4 = 32 allocated cores. How can I make sure every distinct job has exclusive CPUs allocated only within a single socket? In other words, a job must never be allocated e.g. CPUs 0, 1, 22, 33, since those are placed on two different sockets, and each job should have exactly 4 CPUs available when looking at cpu-bind.
Of course I could somehow play with CPU masks, but the problem is that the node configuration and the number of jobs vary, and I don't want to do it for every configuration.
I was looking at --cpu-bind=sockets but it seems it does not allocate exclusive processors:
cpu-bind=MASK - mycomp, task 0 0 [83008]: mask 0xff set
cpu-bind=MASK - mycomp, task 1 1 [83009]: mask 0xff set
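For reading the two masks above: 0xff has its eight lowest bits set, so both tasks are reported as bound to the same logical CPUs 0 to 7 rather than to 4 CPUs each. The following bash sketch decodes such a mask into a CPU list. The helper name mask_to_cpus is hypothetical and not part of Slurm, and it relies on bash's 64-bit arithmetic, so it only handles masks of up to 63 bits.

```shell
#!/usr/bin/env bash
# mask_to_cpus: hypothetical helper (not a Slurm tool).
# Prints the logical CPU ids whose bits are set in a hex affinity mask,
# e.g. the value shown after "mask" in --cpu-bind=verbose output.
mask_to_cpus() {
    local mask=$(( $1 ))   # bash arithmetic accepts 0x-prefixed hex
    local cpu=0
    local out=""
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            out+="${out:+ }${cpu}"   # space-separate the ids
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus 0xff   # the mask reported for both task 0 and task 1
```

Running the helper on the masks of two tasks and comparing the lists is a quick way to see whether their CPU sets overlap.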
Comments (1)
Slurm allows for more detailed processor binding. The sbatch script preamble should contain your desired configuration:

Processor affinity can be further refined with additional arguments to srun (similar options exist for mpirun):

--extra-node-info specifies the socket:core:thread distribution and has precedence over the above 3 directives
--cpu-bind=cores tells srun to bind the processes to cores
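
As an illustration of the approach described in this answer, a job script might look roughly like the sketch below. It is an assumption, not the answer's original code: the three socket/core/thread directives are guessed from the socket:core:thread wording, the numeric values are only sized for the question's 2 tasks x 4 CPUs, and ./my_program is a placeholder. In the Slurm documentation these options are described as node-selection constraints that also influence task binding when the task/affinity plugin is enabled, so whether a job really ends up confined to one socket depends on the cluster's configuration and should be checked with verbose binding output.

```shell
#!/usr/bin/env bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
# Assumed directives (not taken from the answer's original text):
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=8
#SBATCH --threads-per-core=1

# --extra-node-info takes sockets:cores:threads; "verbose" makes srun
# print the mask each task was bound to, so the result can be inspected.
# ./my_program is a placeholder for the real executable.
srun --extra-node-info=1:8:1 --cpu-bind=verbose,cores ./my_program
```

If the reported masks of different jobs still overlap or span both sockets, the scheduler settings on the cluster (for example the select and task plugins) are the next thing to look at.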