How do I start processes simultaneously with the Platform LSF blaunch command?
I'm having a hard time figuring out why I can't launch commands in parallel using the LSF blaunch command:
for num in `seq 3`; do
blaunch -u JobHost ./cmd_${num}.sh &
done
Error message:
Oct 29 13:08:55 2011 18887 3 7.04 lsb_launch(): Failed while executing tasks.
Oct 29 13:08:55 2011 18885 3 7.04 lsb_launch(): Failed while executing tasks.
Oct 29 13:08:55 2011 18884 3 7.04 lsb_launch(): Failed while executing tasks.
Removing the ampersand (&) allows the commands to execute sequentially, but I am after parallel execution.
Comments (3)
When executed within the context of bsub, a single invocation of blaunch -u <hostfile> <cmd> will take <cmd> and run it on all the hosts specified in <hostfile> in parallel, as long as those hosts are within the job's allocation.

What you're trying to do is use 3 separate invocations of blaunch to run 3 separate commands. I can't find it in the documentation, but some testing on a recent version of LSF shows that each individually executed task in such a job has a unique task ID, stored in an environment variable called LSF_PM_TASKID. You can verify this in your version of LSF by printing that variable from a command run through blaunch.

Now, what does this have to do with your question? You want to run ./cmd_$i.sh for i=1,2,3 in parallel through blaunch. To do this you can write a single script, which I'll call cmd.sh, that runs the cmd_X.sh script matching its own task ID. You can then replace your for loop with a single invocation of blaunch that is given the host file 'JobHost' and cmd.sh.

This will run one instance of cmd.sh on each host listed in the file 'JobHost' in parallel. Each of these instances will run the shell script cmd_X.sh, where X is the value of $LSF_PM_TASKID for that particular task.

If there are exactly 3 hostnames in 'JobHost' then you will get 3 instances of cmd.sh, which will in turn lead to one instance each of cmd_1.sh, cmd_2.sh, and cmd_3.sh.
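The cmd.sh wrapper described above might look like the following sketch. It assumes, as the answer reports from testing rather than from the documentation, that LSF sets LSF_PM_TASKID to 1, 2, 3 for the three tasks. The helper name task_script and the dry-run branch are illustrative additions, not part of the original answer.

```shell
#!/bin/sh
# cmd.sh - hypothetical dispatch wrapper, meant to be started by blaunch.
# Assumption: each blaunch task sees its own LSF_PM_TASKID (1, 2, 3, ...).

# Map a task ID to the per-task script name, e.g. 2 -> ./cmd_2.sh
task_script() {
    printf './cmd_%s.sh' "$1"
}

if [ -n "${LSF_PM_TASKID:-}" ]; then
    # Running as a blaunch task: replace this shell with the per-task script.
    exec "$(task_script "$LSF_PM_TASKID")"
else
    # Not started by blaunch: only show what each of three tasks would run.
    for id in 1 2 3; do
        echo "task $id would run $(task_script "$id")"
    done
fi
```

Inside a job submitted through bsub, the for loop from the question would then presumably shrink to a single line such as blaunch -u JobHost ./cmd.sh. Running the script by hand outside LSF only prints the ID-to-script mapping, which is a cheap way to check the naming before submitting.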
Have you tried nohup? Running the backgrounded blaunch commands under nohup might work.

blaunch is not to be used outside of the job execution environment provided by bsub. I don't know how to handle running different commands for each process, but try running blaunch from inside a job submitted with bsub.
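One way to act on the point about the bsub execution environment is to guard the blaunch call, as in this sketch. It assumes that LSF exports LSB_JOBID inside a running job. The script name launch.sh, the helper in_lsf_job, and the reuse of the 'JobHost' file and a cmd.sh wrapper are hypothetical choices made for illustration.

```shell
#!/bin/sh
# launch.sh - hypothetical guard around blaunch.
# Assumption: LSB_JOBID is set only inside an LSF job started via bsub.

# Succeeds when the script appears to be running inside an LSF job.
in_lsf_job() {
    [ -n "${LSB_JOBID:-}" ]
}

if in_lsf_job; then
    # Inside the job's allocation: start the wrapper on every listed host.
    blaunch -u JobHost ./cmd.sh
else
    # Outside LSF, blaunch would fail, so explain what to do instead.
    echo "not inside an LSF job; submit this script with bsub first" >&2
fi
```

Such a script would be submitted with something like bsub -n 3 ./launch.sh, so that blaunch only ever runs within the job's allocation.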