Inconsistent behaviour with Slurm and Singularity
I am completely new to using SLURM to submit jobs to an HPC and I am facing a peculiar problem that I am not able to resolve.
I have a job.slurm file that contains the following bash script:
#!/bin/bash
#SBATCH --job-name singularity-mpi
#SBATCH -N 1 # total number of nodes
#SBATCH --time=00:05:00 # Max execution time
#SBATCH --partition=partition-name
#SBATCH --output=/home/users/r/usrname/slurm-reports/slurm-%j.out
module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3
binaryPrecision=600 #Temporary number
# parse the -i (input) and -o (output) options passed to the job script
while getopts i:o: flag
do
case "${flag}" in
i) input=${OPTARG}
;;
o) output=${OPTARG}
;;
*) echo "Invalid option: -$flag" ;;
esac
done
# run pvm2sdp inside the Singularity image, with the host scratch directory bound into the container
mpirun --allow-run-as-root singularity exec --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ sdpb_2.5.1.sif pvm2sdp $binaryPrecision /usr/local/share/sdpb/$input /usr/local/share/sdpb/$output
The command pvm2sdp is just a specific kind of C++ executable that converts an XML file to a JSON file.
If I submit the .slurm file as
sbatch ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json
it works perfectly. However, if I instead submit it using srun as
srun ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json
I get the following error -
--------------------------------------------------------------------------
A call to mkdir was unable to create the desired directory:
Directory: /scratch
Error: Read-only file system
Please check to ensure you have adequate permissions to perform
the desired operation.
--------------------------------------------------------------------------
I have no clue why this is happening or how I can go about resolving it. I tried to mount /scratch as well, but that does not resolve the issue.
Any help would be greatly appreciated, since I need to use srun inside another .slurm file that contains multiple other MPI calls.
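For context, the kind of outer batch script I am ultimately trying to get working looks roughly like the sketch below. The second step is only a placeholder, the time limit is illustrative, and whether each MPI step should be launched with srun or mpirun depends on how Slurm and OpenMPI are integrated on the cluster.
#!/bin/bash
#SBATCH --job-name singularity-mpi-pipeline
#SBATCH -N 1 # total number of nodes
#SBATCH --time=00:30:00 # illustrative limit
#SBATCH --partition=partition-name
module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3
# step 1: convert the XML input to JSON inside the container
srun singularity exec --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ sdpb_2.5.1.sif pvm2sdp 600 /usr/local/share/sdpb/xmlfile.xml /usr/local/share/sdpb/jsonfile.json
# steps 2, 3, ...: the other MPI calls would follow here, each launched the same way
# (srun singularity exec --bind ... sdpb_2.5.1.sif <executable> <options>)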
Comments (1)
I generally use srun after salloc. Let's say I have to run a Python file on a GPU. I will use salloc to allocate a compute node. Then I use srun --pty bash to directly access the shell of the compute node. Now you can type any command as you would do on your PC. You can try nvidia-smi, or run a Python file with python code.py. In your case, you can simply load the modules manually and then run your mpirun command after srun --pty bash. You don't need the job script.
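A rough sketch of that interactive workflow, reusing the partition, modules, and container command from the question (the exact salloc options and the way MPI jobs are launched inside an allocation vary from cluster to cluster):
salloc -N 1 --time=00:05:00 --partition=partition-name # reserve one compute node
srun --pty bash # open a shell on the allocated node
module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3
mpirun --allow-run-as-root singularity exec --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ sdpb_2.5.1.sif pvm2sdp 600 /usr/local/share/sdpb/xmlfile.xml /usr/local/share/sdpb/jsonfile.json
exit # leave the compute-node shell (exit the salloc session afterwards to release the node)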
One more thing: sbatch and srun are customised for each HPC, so we can't say what exactly is stopping you from running those commands. At Swansea University, we are expected to use job scripts with sbatch only. Have a look at my university's HPC tutorial, and read this article to know the primary differences between the two.