Does a PBS batch system distribute multiple serial jobs across nodes?

Posted 2024-10-27 07:28:28

If I need to run many serial programs "in parallel" (the problem is simple but time-consuming: I need to run the same program on many different data sets), the solution is simple as long as I use only one node. I start each serial program in the background by putting an ampersand after each command, e.g. in the job script:

./program1 &
./program2 &
./program3 &
./program4

which will naturally run each serial program on a different processor. This works well on a login server or standalone workstation, and of course for a batch job asking for only one node.
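One caveat with this pattern: the script exits as soon as the last (foreground) command finishes, and inside a batch job any background programs still running at that point are typically killed along with the job. A minimal sketch of a safer variant, assuming bash; `run_all` is a hypothetical helper, not something PBS provides:

```shell
#!/bin/bash
# Sketch: start several serial commands concurrently on one node and
# wait for all of them before the job script exits.
# run_all is a hypothetical helper name.
run_all() {
    local pids=() failed=0 cmd pid
    for cmd in "$@"; do
        $cmd &              # unquoted on purpose so "prog arg" strings split into words
        pids+=("$!")        # remember the PID of each background process
    done
    for pid in "${pids[@]}"; do
        # wait returns the exit status of that process; count the failures
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"        # 0 only if every command succeeded
}

# Example call with the program names from the question:
# run_all ./program1 ./program2 ./program3 ./program4
```

The return value is the number of commands that failed, which makes a crashed instance visible in the job's exit status.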

But what if I need to run 110 different instances of the same program to read 110 different data sets? If I submit to multiple nodes (say 14) with a script that launches 110 ./program# commands, will the batch system run each one on a different processor across the different nodes, or will it try to run them all on the same 8-core node?

I have tried to use a simple MPI code to read the different data sets, but various errors result: about 100 of the 110 processes succeed and the others crash. I have also considered job arrays, but I'm not sure whether my system supports them.

I have tested the serial program extensively on individual data sets - there are no runtime errors, and I do not exceed the available memory on each node.

Comments (2)

呆° 2024-11-03 07:28:28

No, PBS won't automatically distribute the jobs among nodes for you. But this is a common thing to want to do, and you have a few options.

  • Easiest, and in some ways most advantageous for you, is to bunch the tasks into 1-node sized chunks and submit those bundles as individual jobs. This will get your jobs started faster; a 1-node job will normally get scheduled faster than a (say) 14-node job, just because there are more one-node sized holes in the schedule than 14-node sized ones. This works particularly well if all the tasks take roughly the same amount of time, because then doing the division is pretty simple.

  • If you do want to do it all in one job (say, to simplify the bookkeeping), you may or may not have access to the pbsdsh command; there's a good discussion of it here. This lets you run a single script on all the processors in your job. You then write a script which queries $PBS_VNODENUM to find out which of the nnodes*ppn tasks it is, and runs the appropriate task.

  • If not pbsdsh, GNU parallel is another tool which can enormously simplify these tasks. It's like xargs, if you're familiar with that, but will run commands in parallel, including on multiple nodes. So you'd submit your (say) 14-node job and have the first node run a GNU parallel script. The nice thing is that this will do the scheduling for you even if the tasks are not all of the same length. The advice we give to users on our system for using GNU parallel for these sorts of things is here. Note that if GNU parallel isn't installed on your system, and for some reason your sysadmins won't install it, you can set it up in your home directory; it's not a complicated build.
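For the pbsdsh option, the per-slot wrapper might look roughly like the sketch below. It assumes a 14-node, 8-core allocation (112 slots, so each of the 110 tasks gets its own slot). The script name, `task_for_slot`, `./program`, and the file names are all placeholders. Which environment variables are visible inside pbsdsh-launched processes can vary between installations, so the task count and working directory are passed as arguments here rather than read from the environment.

```shell
#!/bin/bash
# task_wrapper.sh (hypothetical name). Intended launch from the job script:
#   pbsdsh /full/path/to/task_wrapper.sh 110 "$PBS_O_WORKDIR"

# Print the task index for a slot, or nothing if the slot has no task.
# With 112 slots and 110 tasks, slots 110 and 111 stay idle.
task_for_slot() {
    local slot=$1 ntasks=$2
    if [ "$slot" -lt "$ntasks" ]; then
        echo "$slot"
    fi
}

# Dispatch only when PBS_VNODENUM is set, i.e. when started by pbsdsh.
if [ -n "${PBS_VNODENUM:-}" ]; then
    ntasks=${1:-110}        # number of tasks, first argument
    workdir=${2:-$PWD}      # directory holding the program and data
    cd "$workdir" || exit 1
    task=$(task_for_slot "$PBS_VNODENUM" "$ntasks")
    if [ -n "$task" ]; then
        # Program and file names are placeholders for illustration.
        ./program "dataset_${task}.dat" > "task_${task}.log" 2>&1
    fi
fi
```

If there were more tasks than slots, the same function could be extended to hand each slot several indices, but tasks of uneven length are where GNU parallel tends to be the more convenient choice.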

酒绊 2024-11-03 07:28:28

You should consider job arrays.

Briefly, you insert #PBS -t 0-109 in your shell script (the range can be any integer range you want; 0-109 matches the 110 data sets you mentioned) and Torque will:

  • run 110 instances of your script, allocating each with the resources you specify (in the script with #PBS tags or as arguments when you submit).
  • assign a unique integer from 0 to 109 to the environment variable PBS_ARRAYID for each job.

Assuming you have access to environment variables within the code, you can just tell each job to run on data set number PBS_ARRAYID.
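A minimal sketch of such an array job script, assuming Torque and bash; the resource request, `./program`, and the zero-padded file naming scheme are placeholders to adapt. (PBS Professional uses `-J` and `PBS_ARRAY_INDEX` instead of `-t` and `PBS_ARRAYID`, so check which flavour your site runs.)

```shell
#!/bin/bash
#PBS -t 0-109
#PBS -l nodes=1:ppn=1

# Map an array index to a data file name.
# The naming scheme (dataset_000.dat ... dataset_109.dat) is a placeholder.
dataset_for_index() {
    printf 'dataset_%03d.dat\n' "$1"
}

# Run only inside an array job, where Torque sets PBS_ARRAYID.
if [ -n "${PBS_ARRAYID:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1     # directory the job was submitted from
    ./program "$(dataset_for_index "$PBS_ARRAYID")"
fi
```

Each of the 110 instances is scheduled as its own one-core job, so they can start as soon as any core is free instead of waiting for 14 whole nodes.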
