Inconsistent behaviour with Slurm and Singularity

I am completely new to using Slurm to submit jobs to an HPC, and I am facing a peculiar problem that I am not able to resolve.

I have a job.slurm file that contains the following bash script:

#!/bin/bash
#SBATCH --job-name singularity-mpi
#SBATCH -N 1 # total number of nodes
#SBATCH --time=00:05:00 # Max execution time
#SBATCH --partition=partition-name
#SBATCH --output=/home/users/r/usrname/slurm-reports/slurm-%j.out

module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3

binaryPrecision=600 #Temporary number

while getopts i:o: flag
do
        case "${flag}" in
                i) input=${OPTARG}
                        ;;
                o) output=${OPTARG}
                        ;;
                *) echo "Invalid option: -$flag" ;;
        esac
done

mpirun --allow-run-as-root singularity exec \
        --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ \
        sdpb_2.5.1.sif \
        pvm2sdp $binaryPrecision /usr/local/share/sdpb/$input /usr/local/share/sdpb/$output

The command pvm2sdp is just a specific kind of C++ executable that converts an XML file to a JSON file.

If I submit the .slurm file as

sbatch ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

it works perfectly. However, if I instead submit it using srun as

srun ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

I get the following error:

--------------------------------------------------------------------------
A call to mkdir was unable to create the desired directory:

  Directory: /scratch
  Error:     Read-only file system

Please check to ensure you have adequate permissions to perform
the desired operation.
--------------------------------------------------------------------------

I have no clue why this is happening or how to go about resolving it. I also tried mounting /scratch, but that did not resolve the issue.
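
For concreteness, mounting /scratch here means adding an extra Singularity bind roughly along the following lines; the host-side /scratch path is an assumption, and the extra --bind is the only change to the original command:

# Sketch only: additionally bind the host scratch area into the container.
# The /scratch source path is assumed; the rest is unchanged from job.slurm.
mpirun --allow-run-as-root singularity exec \
        --bind /scratch:/scratch \
        --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ \
        sdpb_2.5.1.sif \
        pvm2sdp $binaryPrecision /usr/local/share/sdpb/$input /usr/local/share/sdpb/$output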

Any help would be greatly appreciated, since I need to use srun inside another .slurm file that contains multiple other MPI calls.
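
For context, the wrapper job script I have in mind would look roughly like the sketch below; the job name, time limit, and the later steps are placeholders, and only the first srun line corresponds to the conversion above:

#!/bin/bash
#SBATCH --job-name=sdpb-pipeline   # placeholder name
#SBATCH -N 1
#SBATCH --time=00:30:00            # placeholder limit
#SBATCH --partition=partition-name

# Step 1: the XML -> JSON conversion handled by job.slurm above.
srun ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml \
                 -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

# Steps 2..N: further MPI calls on the generated JSON file (omitted here).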

情深已缘浅 2025-02-20 01:50:47

I generally use srun after salloc. Let's say I have to run a Python file on a GPU. I will use salloc to allocate a compute node.

salloc --nodes=1 --account=sc1901 --partition=accel_ai_mig --gres=gpu:2

Then I use this command to directly access the shell of the compute node.

srun --pty bash

Now you can type any command just as you would on your PC. You can try nvidia-smi, or run a Python file with python code.py.

In your case, you can simply load modules manually and then run your mpirun command after srun --pty bash. You don't need the job script.
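
Putting that together for your case, a minimal sketch of the whole interactive session might look like this (the partition and time options are placeholders; the module list, file names, and mpirun command are taken from your job script and sbatch call):

# Placeholder allocation options; use whatever your partition expects.
salloc --nodes=1 --partition=partition-name --time=00:30:00

# Open an interactive shell on the allocated compute node.
srun --pty bash

# Inside that shell, load the modules and run the command from job.slurm by hand.
module load GCC/9.3.0 Singularity/3.7.3-Go-1.14 CUDA/11.0.2 OpenMPI/4.0.3
mpirun --allow-run-as-root singularity exec \
        --bind /home/users/r/usrname/scratch/points_and_lines/:/usr/local/share/sdpb/ \
        sdpb_2.5.1.sif \
        pvm2sdp 600 /usr/local/share/sdpb/xmlfile.xml /usr/local/share/sdpb/jsonfile.json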

One more thing: sbatch and srun are customised for each HPC, so we can't say exactly what is stopping you from running those commands.

At Swansea University, we are expected to use job scripts with sbatch only. Have a look at my university's HPC tutorial.

Read this article to learn the primary differences between the two.
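
As a very rough illustration of the difference that matters here (this is generic Slurm behaviour, not something specific to your cluster):

# sbatch parses the #SBATCH directives inside job.slurm, queues the job for
# later execution, and writes its output to the file given by --output.
sbatch ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml \
                   -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json

# srun, run outside an existing allocation, ignores the #SBATCH directives,
# takes its resource options from its own command line and defaults, runs the
# script as a job step, and streams the output back to your terminal.
srun ./job.slurm -i /home/users/r/usrname/scratch/points_and_lines/xmlfile.xml \
                 -o /home/users/r/usrname/scratch/points_and_lines/jsonfile.json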
