SLURM: How to limit the CPU count of specific nodes in one partition when the nodes belong to 2 partitions?

Posted on 2025-01-21 02:32:51


Actually, I found a question very similar to mine. The only difference is that the numbers of CPUs on the nodes in my small cluster are different. (The similar question is here.)

For example, the nodes in my cluster are:

  • node1, 36 CPUs
  • node2, 32 CPUs
  • node3, 24 CPUs + 1 GPU
  • node4, 16 CPUs + 1 GPU

I have 2 partitions: cpu (all four nodes) and gpu (node3 and node4).
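
For concreteness, this layout might look roughly like the following in slurm.conf (a sketch only; the exact NodeName options are illustrative, and the Gres entries assume a matching gres.conf):

    # Nodes as described above
    NodeName=node1 CPUs=36 State=UNKNOWN
    NodeName=node2 CPUs=32 State=UNKNOWN
    NodeName=node3 CPUs=24 Gres=gpu:1 State=UNKNOWN
    NodeName=node4 CPUs=16 Gres=gpu:1 State=UNKNOWN

    # Two overlapping partitions
    PartitionName=cpu Nodes=node1,node2,node3,node4 Default=YES State=UP
    PartitionName=gpu Nodes=node3,node4 State=UP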

How can I reserve 4 CPUs on node3 and node4 for the gpu partition? In other words, how can I configure Slurm so that the cpu partition includes all CPUs on node1 and node2, 20 CPUs on node3, and 12 CPUs on node4?

(The parameter MaxCPUsPerNode doesn't meet my needs, since it is set per partition and applies the same limit to every node in that partition.)

Thanks!


Comments (2)

家住魔仙堡 2025-01-28 02:32:51

Using the consumable trackable resources plugin, cons_tres (https://slurm.schedmd.com/cons_res.html), instead of the default node allocation plugin, you can set DefCpuPerGPU to 4 (see details on setting this variable and enabling cons_tres in your slurm.conf in the documentation here: https://slurm.schedmd.com/cons_res.html#using_cons_tres).
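
A minimal slurm.conf sketch of this suggestion (partition and node names carried over from the question; SelectTypeParameters=CR_Core is an assumption, so pick whatever matches your site):

    # Switch from the default node-allocation plugin to consumable trackable resources
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core

    # By default, allocate 4 CPUs per requested GPU for jobs in the gpu partition
    PartitionName=gpu Nodes=node3,node4 DefCpuPerGPU=4 State=UP

Note that DefCpuPerGPU sets a default CPU allocation per requested GPU; it does not by itself reserve those CPUs from jobs running in the cpu partition.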

素染倾城色 2025-01-28 02:32:51

I found a solution to this that is a little bit janky but it does get the job done. I have a cluster which has nodes that have different numbers of CPUs. I need a partition that can use all the CPUs from most of the nodes but only a subset of CPUs from another node. As far as I can tell, this specific description is impossible to accomplish with Slurm as it stands.

However, if I create two partitions:

  1. mostnodes, with Nodes=n1,n2,n3,n4
  2. limitednode, with Nodes=n5 MaxCPUsPerNode=15

and then submit jobs with --partition=mostnodes,limitednode, the scheduler will run each job on whichever partition is first able to start it. In the words of the manpage:

If the job can use more than one partition, specify their names in a comma separate list and the one offering earliest initiation will be used with no regard given to the partition name ordering (although higher priority partitions will be considered first). When the job is initiated, the name of the partition used will be placed first in the job record partition string.

This isn't a perfect solution, but as far as I can tell it is the best one available right now.
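
A sketch of this workaround in slurm.conf, plus the matching submission line (node names n1 through n5 are the hypothetical ones from this answer):

    # slurm.conf: two overlapping views of the same hardware
    PartitionName=mostnodes Nodes=n1,n2,n3,n4 State=UP
    PartitionName=limitednode Nodes=n5 MaxCPUsPerNode=15 State=UP

    # submit: list both partitions; each job runs in whichever can start it earliest
    sbatch --partition=mostnodes,limitednode job.sh

Applied to the cluster in the question, node3 and node4 would each need their own capped partition (MaxCPUsPerNode=20 and MaxCPUsPerNode=12 respectively), since MaxCPUsPerNode is a per-partition setting; jobs would then list all three CPU partitions in --partition. Keep in mind that each job is still placed entirely within a single partition.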
