Slurm: srun inside sbatch is ignored / skipped. Can anyone explain why?
I'm still exploring how to work with the Slurm scheduler and this time I really got stuck. The following batch script somehow doesn't work:
#!/usr/bin/env bash
#SBATCH --job-name=parallel-plink
#SBATCH --mem=400GB
#SBATCH --ntasks=4
cd ~/RS1
for n in {1..4};
do
echo "Starting ${n}"
srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G plink --memory 100000 --bfile RS1 --distance triangle bin --parallel ${n} 4 --out dt-output &
done
Since most of the SBATCH options are inside the batch script, the invocation is just: 'sbatch script.sh'
The slurm-20466.out file contains only the output of the four echo commands:
cat slurm-20466.out
Starting 1
Starting 2
Starting 3
Starting 4
I double-checked the command without srun, and it runs without errors.
I must confess that I am also responsible for the Slurm scheduler configuration itself. Let me know if there is anything I could try changing or if more information is needed.
Comments (1)
You start your srun commands in the background so that they run in parallel, but you never wait for them to finish. So the loop runs through very quickly, echoes the "Starting ..." lines, starts the srun commands in the background, and then finishes. After that, your sbatch script is done and terminates successfully, which means that your job is done. With that, your allocation is revoked and your srun commands are terminated as well. You might be able to see with sacct that they started.
You need to instruct the batch script to wait for the work to be done before it terminates, by waiting for the background processes to finish. To do that, simply add a wait command at the end of your script.
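The effect of wait can be tried out without a Slurm cluster. The following is only a minimal sketch: fake_step is a hypothetical stand-in for the "srun ... plink ..." job step from the question, and run_steps is a name chosen purely for illustration. It launches four background steps and blocks until all of them have exited.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for one "srun ... plink ..." job step,
# so that the sketch runs on a machine without Slurm.
fake_step() {
    sleep 1
    echo "Finished $1"
}

# Launch four steps in the background, then block until all have exited.
run_steps() {
    for n in 1 2 3 4; do
        echo "Starting ${n}"
        fake_step "${n}" &
    done
    # Without this line the script would reach its end while the
    # background steps are still running.
    wait
    echo "All steps done"
}

run_steps
```

In the batch script from the question, the equivalent change is a single wait line after done, so that the job only ends once all four srun steps have finished.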