Repeating a task 100 times in parallel on Slurm

Posted on 2025-01-11 16:05:10

I am new to cluster computing and I want to repeat an empirical experiment 100 times in Python. For each experiment, I need to generate a set of data and solve an optimization problem, and then I want to obtain the average of the results. To save time, I hope to do it in parallel. For example, if I can use 20 cores, each core only needs to run 5 repetitions.

Here's an example of a test.slurm script that I use for running the test.py script on a single core:

#!/bin/bash
#SBATCH --job-name=test        
#SBATCH --nodes=1               
#SBATCH --ntasks=1              
#SBATCH --cpus-per-task=1      
#SBATCH --mem=4G                 
#SBATCH --time=72:00:00          
#SBATCH --mail-type=begin       
#SBATCH --mail-type=end         
#SBATCH --mail-user=address@email

module purge
module load anaconda3/2018.12
source activate py36

python test.py

If I want to run it on multiple cores, how should I modify the Slurm file accordingly?

夢归不見 2025-01-18 16:05:10

To run the test on multiple cores, you can use the srun -n option, where -n specifies the number of processes to launch.

srun -n 20 python test.py

srun is the task launcher in Slurm.
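Note that srun starts 20 identical copies of test.py, so each copy has to work out which share of the 100 repetitions is its own and must not reuse another copy's random data. A minimal sketch of how test.py could do that, assuming it reads the SLURM_PROCID and SLURM_NTASKS environment variables that srun sets for each task. run_experiment is a hypothetical stand-in for the data generation and optimization step, and the result_<rank>.txt file name is an assumption, not something from the question:

```python
import os
import random

TOTAL_REPEATS = 100  # total number of experiments, taken from the question


def run_experiment(seed):
    """Hypothetical placeholder: generate data and solve the optimization problem."""
    rng = random.Random(seed)
    return rng.gauss(0.0, 1.0)


def my_repeats(rank, ntasks, total=TOTAL_REPEATS):
    """Return the experiment indices assigned to this task (round-robin split)."""
    return list(range(rank, total, ntasks))


def main():
    # srun sets these per task; the defaults let the script also run outside Slurm.
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    ntasks = int(os.environ.get("SLURM_NTASKS", "1"))

    # The experiment index doubles as the seed, so no two repetitions share random data.
    results = [run_experiment(seed=i) for i in my_repeats(rank, ntasks)]

    # Each task writes its own file (assumed name) to avoid concurrent writes to one file.
    with open(f"result_{rank}.txt", "w") as fh:
        for value in results:
            fh.write(f"{value}\n")


if __name__ == "__main__":
    main()
```

With 20 tasks, rank 0 handles experiments 0, 20, 40, 60 and 80, rank 1 handles 1, 21, 41, 61 and 81, and so on, which gives the 5 repetitions per core described in the question.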

Alternatively, set ntasks in the Slurm file (cpus-per-task stays at 1) and launch the script through srun, which starts one copy per task.
The Slurm file will look like this:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=72:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=address@email

module purge
module load anaconda3/2018.12
source activate py36

# srun starts one copy of test.py per task; a bare "python test.py" would run only once
srun python test.py
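
Once all tasks have finished, the averaged value the question asks for still has to be computed from the per-task outputs. A minimal sketch of a separate collection script, assuming each task wrote its values one per line to a file named result_<rank>.txt (a hypothetical naming scheme, not something Slurm produces by itself):

```python
import glob


def load_values(pattern="result_*.txt"):
    """Read one float per non-empty line from every file matching the pattern."""
    values = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            values.extend(float(line) for line in fh if line.strip())
    return values


def mean(values):
    """Plain arithmetic mean; raises ValueError when there is nothing to average."""
    if not values:
        raise ValueError("no results found")
    return sum(values) / len(values)


if __name__ == "__main__":
    values = load_values()
    if values:
        print(f"{len(values)} experiments, average = {mean(values)}")
    else:
        print("no result files found")
```

Pooling all values before averaging gives the same answer as averaging the 20 per-task means here, because every task contributes the same number of repetitions.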
