Run 2 Slurm jobs only if both are allocated resources
One job is submitted to get hold of 4 GPUs. The second is submitted to get hold of the next 4 GPUs (on a different node). How can I ensure that both jobs run at the same time, so that they can eventually synchronise (PyTorch DDP)?
Having an extra script to check the available resources would do the trick; however, other jobs might get priority because they have already been waiting in the queue...
The particular partition I am using does not allow for a request of 2 nodes directly.
I am also aware of the --dependency flag, however this can only be used as a completion check of the first job.
The simple answer is to be more explicit with Slurm.
srun examples
Jobs will be allocated specific generic resources as needed to satisfy the request. If the job is suspended, those resources do not become available for use by other jobs.
Job steps can be allocated generic resources from those allocated to the job using the --gres option with the srun command as described above. By default, a job step will be allocated all of the generic resources allocated to the job. If desired, the job step may explicitly specify a different generic resource count than the job. This design choice was based upon a scenario where each job executes many job steps. If job steps were granted access to all generic resources by default, some job steps would need to explicitly specify zero generic resource counts, which we considered more confusing. The job step can be allocated specific generic resources and those resources will not be available to other job steps. A simple example is shown below.
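The example from the original post was not preserved; the sketch below follows the pattern of the Slurm generic-resource documentation (show_device.sh is a placeholder for whatever each job step should actually run):

```bash
#!/bin/bash
# Submitted with: sbatch --gres=gpu:4 -n4 -N1 gres_test.sh
# The job holds 4 GPUs; each srun job step is given an explicit share of them
# and runs in the background, so the steps execute concurrently.
srun --gres=gpu:2 -n2 --exclusive ./show_device.sh &
srun --gres=gpu:1 -n1 --exclusive ./show_device.sh &
srun --gres=gpu:1 -n1 --exclusive ./show_device.sh &
wait
```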
Flags explained
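Briefly, for the flags used in the sketch above: --gres=gpu:N requests N GPUs (for the whole job with sbatch, or for an individual job step with srun); -n sets the number of tasks; -N sets the number of nodes; and --exclusive at the srun level dedicates the requested resources to that job step, so parallel steps do not overlap on the same GPUs.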
Another example:
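The second example was also lost in the copy; here is a sketch closer to the use case in the question, a submission file that pins one job to exactly one node holding 4 GPUs (the job name, CPU count and training script are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=ddp_part1      # placeholder job name
#SBATCH --nodes=1                 # exactly one node
#SBATCH --ntasks-per-node=4       # one task per GPU
#SBATCH --gres=gpu:4              # the 4 GPUs this job should hold
#SBATCH --cpus-per-task=10        # placeholder, adapt to the node layout

srun python train.py              # placeholder DDP training script
```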
You can further automate this with a bash script:
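The original script was not preserved. One way to glue the two submissions together is to submit both jobs and only start the DDP rendezvous once Slurm reports both of them as RUNNING; a minimal sketch, assuming two placeholder submission files ddp_part1.slurm and ddp_part2.slurm:

```bash
#!/bin/bash
# Sketch only: submit the two 4-GPU jobs and wait until both are running.
JOBID1=$(sbatch --parsable ddp_part1.slurm)
JOBID2=$(sbatch --parsable ddp_part2.slurm)
echo "Submitted jobs ${JOBID1} and ${JOBID2}"

# Poll the scheduler until both jobs are in the RUNNING state.
while true; do
    STATE1=$(squeue -h -j "${JOBID1}" -o %T)
    STATE2=$(squeue -h -j "${JOBID2}" -o %T)
    [[ "${STATE1}" == "RUNNING" && "${STATE2}" == "RUNNING" ]] && break
    sleep 10
done
echo "Both jobs are running"
```

Note that this only observes the two jobs; it does not stop the scheduler from starting one before the other, so the training script itself should still wait at the DDP rendezvous until both parts are up.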
The complex but better answer...
The Multi-Process Service (MPS) is an implementation variant compatible with the CUDA programming interface. The MPS execution architecture is designed to let co-operative multi-process CUDA applications, generally MPI jobs, use Hyper-Q functionalities on the latest NVIDIA GPUs. Hyper-Q allows CUDA kernels to be processed simultaneously on the same GPU; this can improve performance when the GPU calculation capacity is underused by a single application process.
CUDA MPS is included by default in the different CUDA modules available to the users.
For a multi-GPU MPI batch job, the usage of CUDA MPS can be activated with the -C mps option. However, the node must be exclusively reserved via the --exclusive option. For an execution via the default gpu partition (nodes with 40 physical cores and 4 GPUs) using only one node:
mps_multi_gpu_mpi.slurm
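The original listing did not survive the copy; a sketch of what such a submission file typically looks like (the module names are assumptions, and the -C mps constraint is a site-specific feature that must exist on your cluster):

```bash
#!/bin/bash
#SBATCH --job-name=mps_multi_gpu_mpi          # job name
#SBATCH --nodes=1                             # one full node
#SBATCH --ntasks-per-node=40                  # one MPI task per physical core (assumption)
#SBATCH --gres=gpu:4                          # the 4 GPUs of the node
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread                  # physical cores only
#SBATCH --time=01:00:00
#SBATCH --output=gpu_cuda_mps_multi_mpi%j.out # output file named after the job id
#SBATCH -C mps                                # activate CUDA MPS (site-specific constraint)
#SBATCH --exclusive                           # exclusive node reservation, required with MPS

module purge                                  # start from a clean environment
module load cuda openmpi                      # assumption: load the modules used at compile time

srun ./executable_mps_multi_gpu_mpi           # let srun handle the task distribution
```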
Submit the script via the sbatch command:
sbatch mps_multi_gpu_mpi.slurm
Similarly, you can execute your job on an entire node of the gpu_p2 partition (nodes with 24 physical cores and 8 GPUs) by specifying:
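The exact directives were not preserved; on such a partition the request would typically change along these lines (the partition name is an assumption):

```bash
#SBATCH --partition=gpu_p2            # assumed partition name
#SBATCH --ntasks-per-node=24          # one MPI task per physical core
#SBATCH --gres=gpu:8                  # the 8 GPUs of the node
```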
Be careful, even if you use only part of the node, it has to be reserved in exclusive mode. In particular, this means that the entire node is invoiced.
I recommend that you compile and execute your codes in the same environment by loading the same modules. In this example, I assume that the executable_mps_multi_gpu_mpi executable file is found in the submission directory, i.e. the directory in which the sbatch command is entered. The calculation output file, gpu_cuda_mps_multi_mpi<numero_job>.out, is also found in the submission directory. It is created at the start of the job execution: editing or modifying it while the job is running can disrupt the execution.
The module purge is made necessary by the Slurm default behaviour: any modules which are loaded in your environment at the moment when you launch sbatch will be passed to the submitted job, making the execution of your job dependent on what you have done previously.

PROTIP: To avoid errors in the automatic task distribution, I recommend using srun to execute your code instead of mpirun. This guarantees a distribution which conforms to the specifications of the resources you requested in the submission file.

Misc.
Jobs have resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify the limits or specify another partition and / or QoS as shown in the documentation detailing the partitions and QoS.
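For example, a different partition and QoS can be requested directly in the submission file (the names below are placeholders for whatever your site defines):

```bash
#SBATCH --partition=gpu_p2    # placeholder partition name
#SBATCH --qos=qos_gpu-t4      # placeholder QoS name
```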
That was exhaustive; I hope that helps!