SLURM: how to limit the number of CPUs of specific nodes in one partition when the nodes belong to 2 partitions?
Actually, I found a question very similar to mine. The only difference is that the nodes in my small cluster have different numbers of CPUs. (The similar question is here.)
For example, the nodes in my cluster are:
- node1, 36 CPUs
- node2, 32 CPUs
- node3, 24 CPUs + 1 GPU
- node4, 16 CPUs + 1 GPU
I have 2 partitions: cpu (all nodes) and gpu (node3,4).
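Roughly, the node and partition definitions look like the sketch below (simplified; the exact NodeName/Gres lines are abbreviated):

# node definitions
NodeName=node1 CPUs=36
NodeName=node2 CPUs=32
NodeName=node3 CPUs=24 Gres=gpu:1
NodeName=node4 CPUs=16 Gres=gpu:1
# partitions
PartitionName=cpu Nodes=node1,node2,node3,node4
PartitionName=gpu Nodes=node3,node4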
How can I leave 4 CPUs on node3 and node4 for the gpu partition? In other words, how should I configure things so that the cpu partition includes all CPUs on node1 and node2, 20 CPUs on node3, and 12 CPUs on node4?
(The parameter MaxCPUsPerNode doesn't meet my needs.)
Thanks!
Using the consumable trackable resources plugin (https://slurm.schedmd.com/cons_res.html) instead of the default node allocation plugin, you can set DefCpuPerGPU to 4 (see details on setting this variable and enabling cons_tres in your slurm.conf in the documentation here: https://slurm.schedmd.com/cons_res.html#using_cons_tres).
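For example, a minimal slurm.conf sketch (node and partition names are taken from the question; the exact SelectTypeParameters and GRES settings are assumptions, so check the documentation linked above):

# use the cons_tres plugin instead of the default node allocation plugin
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
GresTypes=gpu
# jobs in the gpu partition that request a GPU get 4 CPUs per GPU by default
PartitionName=gpu Nodes=node3,node4 DefCpuPerGPU=4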
I found a solution to this that is a little bit janky but it does get the job done. I have a cluster which has nodes that have different numbers of CPUs. I need a partition that can use all the CPUs from most of the nodes but only a subset of CPUs from another node. As far as I can tell, this specific description is impossible to accomplish with Slurm as it stands.
However, if I create two partitions:
PartitionName=mostnodes Nodes=n1,n2,n3,n4
PartitionName=limitednode Nodes=n5 MaxCPUsPerNode=15
and then submit jobs with --partition=mostnodes,limitednode, the scheduler will schedule the job on whichever partition is first able to run it (this behaviour is described in the sbatch manpage for --partition). This isn't a perfect solution, but as far as I can tell it is the best one available right now.
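Applied to the cluster in the question, the same trick might look something like the sketch below (the partition names are made up, job.sh stands for your batch script, and the caps come from 24-4=20 and 16-4=12):

# slurm.conf: one capped partition per distinct CPU limit
PartitionName=cpu_full Nodes=node1,node2
PartitionName=cpu_node3 Nodes=node3 MaxCPUsPerNode=20
PartitionName=cpu_node4 Nodes=node4 MaxCPUsPerNode=12
PartitionName=gpu Nodes=node3,node4

# submit CPU-only jobs against all three partitions; Slurm runs each job
# in whichever partition can start it first
sbatch --partition=cpu_full,cpu_node3,cpu_node4 job.sh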