Can I call sbatch recursively?

Posted on 2025-02-10 06:17:17

I want to run a program that creates a checkpoint file, and then run several variant configurations that all start from that checkpoint.

For example, if I run:

sbatch -n 1 -t 12:00:00 --mem=16g program.sh

And program.sh looks like this:

#!/bin/sh

./set_checkpoint

sbatch -n 1 -t 12:00:00 --mem=16g cpt_restore_config1.sh
sbatch -n 1 -t 12:00:00 --mem=16g cpt_restore_config2.sh
sbatch -n 1 -t 12:00:00 --mem=16g cpt_restore_config3.sh
sbatch -n 1 -t 12:00:00 --mem=16g cpt_restore_config4.sh

Does this achieve the desired effect?

﹉夏雨初晴づ 2025-02-17 06:17:17

In general, this is not needed. You can allocate all the resources you need in the main job script and use srun to assign part of them to each task. Here is a basic example.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00:00

module load some_module
srun -n 4 -c 2 ./my_program arg1 arg2
srun -n 4 -c 2 ./my_other_program arg1 arg2

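Note that we allocated 8 tasks with 2 CPUs each (16 CPUs in total) and gave 4 of those tasks to each srun step. Here, the two srun steps run sequentially. To run them in parallel, you can use this trick: start each step in the background with & and then wait for all of them.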

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00:00

srun -n 4 -c 2 ./my_program arg1 arg2 &
srun -n 4 -c 2 ./my_other_program arg1 arg2 &

wait

Keep in mind that this may not work in every case. I would suggest using a logger and redirecting STDOUT and STDERR to a file.
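As a rough sketch of how the background-step pattern and the redirection advice could be combined for the checkpoint workflow in the question: the script below runs ./set_checkpoint first, then starts the four restore scripts as parallel one-task steps, each with its own output and error file. The run_logged helper, the logs/ directory, and the #SBATCH values are assumptions made for illustration, not part of the original answer, and whether the steps really run concurrently can depend on the Slurm version and site configuration.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=16g

# Run a command in the background, sending its stdout and stderr to
# separate files under logs/. The helper name and layout are illustrative.
run_logged() {
    local name=$1
    shift
    mkdir -p logs
    "$@" > "logs/${name}.out" 2> "logs/${name}.err" &
}

# SLURM_JOB_ID is set by Slurm inside a batch job; this guard keeps the
# script from launching steps when it is run or sourced outside Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Create the checkpoint first and stop if that fails.
    ./set_checkpoint || exit 1

    # Launch the four restore variants as parallel one-task job steps.
    for i in 1 2 3 4; do
        run_logged "config${i}" srun -n 1 "./cpt_restore_config${i}.sh"
    done

    # Block until every background step has finished.
    wait
fi
```

Submitted once with sbatch, this keeps everything inside a single allocation, so no nested sbatch calls are involved.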

Alternatively, if your tasks all run a single script with different sets of parameters, I suggest using argument parsing. In Python, I generally use Hydra's Joblib Launcher plugin, which gives you parallelism out of the box.
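The answer has Python in mind, but the same idea can be sketched in shell to match the job scripts above. The following assumes a hypothetical cpt_restore.sh that replaces the four cpt_restore_configN.sh scripts and takes the configuration name as an option; the option letters and the default checkpoint file name are made up for illustration.

```shell
#!/bin/bash
# Hypothetical cpt_restore.sh: one script parameterized by config name.

# Parse "-c CONFIG" (required) and "-k CHECKPOINT" (optional) into the
# variables $config and $checkpoint.
parse_args() {
    local opt
    OPTIND=1                      # reset so the function can be called again
    config=""
    checkpoint="checkpoint.dat"   # assumed default file name
    while getopts "c:k:" opt; do
        case "$opt" in
            c) config=$OPTARG ;;
            k) checkpoint=$OPTARG ;;
            *) echo "usage: cpt_restore.sh -c CONFIG [-k CHECKPOINT]" >&2
               return 2 ;;
        esac
    done
    if [ -z "$config" ]; then
        echo "missing -c CONFIG" >&2
        return 2
    fi
}

# Only act when the script is given arguments.
if [ "$#" -gt 0 ]; then
    parse_args "$@" || exit 2
    # Placeholder: replace this line with the real restore command.
    echo "restoring from ${checkpoint} with config ${config}"
fi
```

A job script could then launch each variant with something like srun -n 1 ./cpt_restore.sh -c config1 &, changing only the value passed to -c.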
