在集群上测试 MPI

发布于 2024-08-19 09:21:04 字数 7218 浏览 19 评论 0原文

我正在集群上学习 OpenMPI。这是我的第一个例子。我预计输出会显示来自不同节点的响应,但它们都来自同一节点node062。我只是想知道为什么以及如何从不同节点实际获取报告以显示 MPI 实际上正在将进程分发到不同节点?谢谢和问候!

ex1.c

/* test of MPI */  
#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

int main(int argc, char **argv)  
{  
char idstr[2232]; char buff[22128];  
char processor_name[MPI_MAX_PROCESSOR_NAME];  
int numprocs; int myid; int i; int namelen;  
MPI_Status stat;  

MPI_Init(&argc,&argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  
MPI_Get_processor_name(processor_name, &namelen);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{   
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d at node %s ", myid, processor_name);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

}  

ex1.pbs

#!/bin/sh  
#  
#This is an example script example.sh  
#  
#These commands set up the Grid Environment for your job:  
#PBS -N ex1  
#PBS -l nodes=10:ppn=1,walltime=1:10:00  
#PBS -q dque    

# export OMP_NUM_THREADS=4  

 mpirun -np 10 /home/tim/courses/MPI/examples/ex1  

编译并运行:

[tim@user1 examples]$ mpicc ./ex1.c -o ex1   
[tim@user1 examples]$ qsub ex1.pbs  
35540.mgt  
[tim@user1 examples]$ nano ex1.o35540  
----------------------------------------  
Begin PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
Job ID:         35540.mgt  
Username:       tim  
Group:          Brown  
Nodes:          node062 node063 node169 node170 node171 node172 node174 node175  
node176 node177  
End PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
----------------------------------------  
WE have 10 processors  
Hello 1 Processor 1 at node node062 reporting for duty  
Hello 2 Processor 2 at node node062 reporting for duty        
Hello 3 Processor 3 at node node062 reporting for duty        
Hello 4 Processor 4 at node node062 reporting for duty        
Hello 5 Processor 5 at node node062 reporting for duty        
Hello 6 Processor 6 at node node062 reporting for duty        
Hello 7 Processor 7 at node node062 reporting for duty        
Hello 8 Processor 8 at node node062 reporting for duty        
Hello 9 Processor 9 at node node062 reporting for duty  

----------------------------------------  
Begin PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
Job ID:         35540.mgt  
Username:       tim  
Group:          Brown  
Job Name:       ex1  
Session:        15533  
Limits:         neednodes=10:ppn=1,nodes=10:ppn=1,walltime=01:10:00  
Resources:      cput=00:00:00,mem=420kb,vmem=8216kb,walltime=00:00:03  
Queue:          dque  
Account:  
Nodes:  node062 node063 node169 node170 node171 node172 node174 node175 node176  
node177  
Killing leftovers...  

End PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
----------------------------------------

更新:

我想在单个 PBS 脚本中运行多个后台作业,以便这些作业可以同时运行。例如,在上面的示例中,我添加了另一个调用来运行 ex1 并将两次运行更改为 ex1.pbs 中的后台

#!/bin/sh  
#  
#This is an example script example.sh  
#  
#These commands set up the Grid Environment for your job:  
#PBS -N ex1  
#PBS -l nodes=10:ppn=1,walltime=1:10:00  
#PBS -q dque 

echo "The first job starts!"  
mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &  
echo "The first job ends!"  
echo "The second job starts!"  
mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &  
echo "The second job ends!" 

(1) 在使用先前编译的可执行文件 ex1 qsub 该脚本后,结果很好。

The first job starts!  
The first job ends!  
The second job starts!  
The second job ends!  
WE have 5 processors  
WE have 5 processors  
Hello 1 Processor 1 at node node063 reporting for duty        
Hello 2 Processor 2 at node node169 reporting for duty        
Hello 3 Processor 3 at node node170 reporting for duty        
Hello 1 Processor 1 at node node063 reporting for duty        
Hello 4 Processor 4 at node node171 reporting for duty        
Hello 2 Processor 2 at node node169 reporting for duty        
Hello 3 Processor 3 at node node170 reporting for duty        
Hello 4 Processor 4 at node node171 reporting for duty  

(2) 但是,我认为 ex1 的运行时间太快,并且可能两个后台作业没有太多运行时间重叠,而当我将相同的方式应用于我的实际项目时,情况并非如此。所以我在ex1.c中添加了sleep(30)来延长ex1的运行时间,这样在后台运行ex1的两个作业几乎总是同时运行。

/* test of MPI */  
#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  
#include <unistd.h>

int main(int argc, char **argv)  
{  
char idstr[2232]; char buff[22128];  
char processor_name[MPI_MAX_PROCESSOR_NAME];  
int numprocs; int myid; int i; int namelen;  
MPI_Status stat;  

MPI_Init(&argc,&argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  
MPI_Get_processor_name(processor_name, &namelen);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{   
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d at node %s ", myid, processor_name);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  

sleep(30); // new added to extend the running time
MPI_Finalize();  

}  

但重新编译并再次qsub后,结果似乎不太对劲。有进程被中止。 在ex1.o35571:

The first job starts!  
The first job ends!  
The second job starts!  
The second job ends!  
WE have 5 processors  
WE have 5 processors  
Hello 1 Processor 1 at node node063 reporting for duty  
Hello 2 Processor 2 at node node169 reporting for duty  
Hello 3 Processor 3 at node node170 reporting for duty  
Hello 4 Processor 4 at node node171 reporting for duty  
Hello 1 Processor 1 at node node063 reporting for duty  
Hello 2 Processor 2 at node node169 reporting for duty  
Hello 3 Processor 3 at node node170 reporting for duty  
Hello 4 Processor 4 at node node171 reporting for duty  
4 additional processes aborted (not shown)  
4 additional processes aborted (not shown)  

在ex1.e35571:

mpirun: killing job...  
mpirun noticed that job rank 0 with PID 25376 on node node062 exited on signal 15 (Terminated).  
mpirun: killing job...  
mpirun noticed that job rank 0 with PID 25377 on node node062 exited on signal 15 (Terminated).  

我想知道为什么有进程中止?如何在 PBS 脚本中正确执行 qsub 后台作业?

I am learning OpenMPI on a cluster. Here is my first example. I expect the output would show response from different nodes, but they all respond from the same node node062. I just wonder why and how I can actually get report from different nodes to show MPI actually is distributing processes to different nodes? Thanks and regards!

ex1.c

/* test of MPI */  
#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

int main(int argc, char **argv)  
{  
char idstr[2232]; char buff[22128];  
char processor_name[MPI_MAX_PROCESSOR_NAME];  
int numprocs; int myid; int i; int namelen;  
MPI_Status stat;  

MPI_Init(&argc,&argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  
MPI_Get_processor_name(processor_name, &namelen);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{   
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d at node %s ", myid, processor_name);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

}  

ex1.pbs

#!/bin/sh  
#  
#This is an example script example.sh  
#  
#These commands set up the Grid Environment for your job:  
#PBS -N ex1  
#PBS -l nodes=10:ppn=1,walltime=1:10:00  
#PBS -q dque    

# export OMP_NUM_THREADS=4  

 mpirun -np 10 /home/tim/courses/MPI/examples/ex1  

compile and run:

[tim@user1 examples]$ mpicc ./ex1.c -o ex1   
[tim@user1 examples]$ qsub ex1.pbs  
35540.mgt  
[tim@user1 examples]$ nano ex1.o35540  
----------------------------------------  
Begin PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
Job ID:         35540.mgt  
Username:       tim  
Group:          Brown  
Nodes:          node062 node063 node169 node170 node171 node172 node174 node175  
node176 node177  
End PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
----------------------------------------  
WE have 10 processors  
Hello 1 Processor 1 at node node062 reporting for duty  
Hello 2 Processor 2 at node node062 reporting for duty        
Hello 3 Processor 3 at node node062 reporting for duty        
Hello 4 Processor 4 at node node062 reporting for duty        
Hello 5 Processor 5 at node node062 reporting for duty        
Hello 6 Processor 6 at node node062 reporting for duty        
Hello 7 Processor 7 at node node062 reporting for duty        
Hello 8 Processor 8 at node node062 reporting for duty        
Hello 9 Processor 9 at node node062 reporting for duty  

----------------------------------------  
Begin PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
Job ID:         35540.mgt  
Username:       tim  
Group:          Brown  
Job Name:       ex1  
Session:        15533  
Limits:         neednodes=10:ppn=1,nodes=10:ppn=1,walltime=01:10:00  
Resources:      cput=00:00:00,mem=420kb,vmem=8216kb,walltime=00:00:03  
Queue:          dque  
Account:  
Nodes:  node062 node063 node169 node170 node171 node172 node174 node175 node176  
node177  
Killing leftovers...  

End PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
----------------------------------------

UPDATE:

I would like to run several background jobs in a single PBS script, so that the jobs can run at the same time. e.g. in the above example, I added another call to run ex1 and change both runs to be background in ex1.pbs

#!/bin/sh  
#  
#This is an example script example.sh  
#  
#These commands set up the Grid Environment for your job:  
#PBS -N ex1  
#PBS -l nodes=10:ppn=1,walltime=1:10:00  
#PBS -q dque 

echo "The first job starts!"  
mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &  
echo "The first job ends!"  
echo "The second job starts!"  
mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &  
echo "The second job ends!" 

(1) The result is fine after qsub this script with previous compiled executable ex1.

The first job starts!  
The first job ends!  
The second job starts!  
The second job ends!  
WE have 5 processors  
WE have 5 processors  
Hello 1 Processor 1 at node node063 reporting for duty        
Hello 2 Processor 2 at node node169 reporting for duty        
Hello 3 Processor 3 at node node170 reporting for duty        
Hello 1 Processor 1 at node node063 reporting for duty        
Hello 4 Processor 4 at node node171 reporting for duty        
Hello 2 Processor 2 at node node169 reporting for duty        
Hello 3 Processor 3 at node node170 reporting for duty        
Hello 4 Processor 4 at node node171 reporting for duty  

(2) However, I think the running time of ex1 is too quick and probably the two background jobs do not have much running time overlapping, which is not the case when I apply the same way to my real project. So I added sleep(30) to ex1.c to extend the running time of ex1 so that two jobs running ex1 in background will be running simultaneously almost all the time.

/* test of MPI */  
#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  
#include <unistd.h>

int main(int argc, char **argv)  
{  
char idstr[2232]; char buff[22128];  
char processor_name[MPI_MAX_PROCESSOR_NAME];  
int numprocs; int myid; int i; int namelen;  
MPI_Status stat;  

MPI_Init(&argc,&argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  
MPI_Get_processor_name(processor_name, &namelen);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{   
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d at node %s ", myid, processor_name);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  

sleep(30); // new added to extend the running time
MPI_Finalize();  

}  

But after recompilation and qsub again, the results seems not all right. There are processes aborted.
in ex1.o35571:

The first job starts!  
The first job ends!  
The second job starts!  
The second job ends!  
WE have 5 processors  
WE have 5 processors  
Hello 1 Processor 1 at node node063 reporting for duty  
Hello 2 Processor 2 at node node169 reporting for duty  
Hello 3 Processor 3 at node node170 reporting for duty  
Hello 4 Processor 4 at node node171 reporting for duty  
Hello 1 Processor 1 at node node063 reporting for duty  
Hello 2 Processor 2 at node node169 reporting for duty  
Hello 3 Processor 3 at node node170 reporting for duty  
Hello 4 Processor 4 at node node171 reporting for duty  
4 additional processes aborted (not shown)  
4 additional processes aborted (not shown)  

in ex1.e35571:

mpirun: killing job...  
mpirun noticed that job rank 0 with PID 25376 on node node062 exited on signal 15 (Terminated).  
mpirun: killing job...  
mpirun noticed that job rank 0 with PID 25377 on node node062 exited on signal 15 (Terminated).  

I wonder why there are processes aborted? How can I qsub background jobs correctly in a PBS script?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

浮华 2024-08-26 09:21:04

几件事:
你需要告诉 mpi 在哪里启动进程,
假设您使用的是 mpich,请查看 mpiexec 帮助部分并查找机器文件或等效描述。除非提供机器文件,否则它将在一台主机上运行

PBS 自动创建节点文件。其名称存储在 PBS 命令文件中可用的 PBS_NODEFILE 环境变量中。尝试以下操作:

mpiexec -machinefile $PBS_NODEFILE ...

如果您使用 mpich2,则可以使用 mpdboot 两次引导 mpi 运行时。我不记得命令的详细信息,您必须阅读手册页。请记住创建秘密文件,否则 mpdboot 将失败。

我再次阅读了您的帖子,您将使用 open mpi,您仍然必须向 mpiexec 命令提供机器文件,但您不必弄乱 mpdboot

couple things:
you need to tell mpi where to launch processes,
assuming you are using mpich, look at mpiexec help section and find machine file or equivalent description. Unless machine file is provided, it will run on one host

PBS automatically creates nodes file. Its name is stored in PBS_NODEFILE environment variable available in PBS command file. Try the following:

mpiexec -machinefile $PBS_NODEFILE ...

if you are using mpich2, you have two boot your mpi runtime using mpdboot. I do not remember the details of command, you will have to read man page. Remember to create secret file otherwise mpdboot will fail.

I read your post again, you will use open mpi, you still have to supply machines file to mpiexec command, but you do not have to mess with mpdboot

季末如歌 2024-08-26 09:21:04

默认情况下,PBS(我假设是扭力)以独占模式分配节点,因此每个节点只有一项作业。如果您有多个处理器,很可能每个 CPU 一个进程,情况会有点不同。 PBS 可以更改为以分时模式分配 nod,请查看 qmgr 的手册页。长话短说,很可能节点文件中不会有重叠节点,因为节点文件是在资源可用时创建的,而不是在资源可用时创建的。提交。

PBS 的目的是资源控制,最常见的是时间、节点分配(自动)。

PBS 文件中的命令按顺序执行。您可以将流程放在后台,但这可能会违背资源分配的目的,但我不知道您的确切工作流程。我使用 PBS 脚本中的后台进程在主程序并行运行之前使用 & 复制数据。 PBS脚本实际上只是一个shell脚本。

您可以假设 PBS 对您的脚本的内部运作一无所知。您当然可以在 via 脚本中运行多个进程/线程。如果您这样做,则由您和您的操作系统以平衡的方式分配核心/处理器。如果您使用多线程程序,最可能的方法是为节点运行一个 mpi 进程,然后生成 OpenMP 线程。

如果您需要澄清,请告诉我

By default PBS (I am assuming torque) allocates nodes in exclusive mode, so that only one job per node. It is a bit different if you have multiple processors, most likely one process per CPU. PBS can be changed to allocate nod in time-sharing mode, look at man page of qmgr.long story short, most likely you will not have overlapping nodes in node file, since node file is created when resources are available rather than at time of submission.

the purpose of PBS is resource control, most commonly time, node allocation (automatic).

commands in PBS file are executed sequentially. You can put processes in background, but that might be defeating purpose of resource allocation, but I do not know your exact workflow. I used the background processes in PBS scripts to copy data before main program runs in parallel, using &. PBS script is actually just a shell script.

you can assume that PBS does not know anything about inner workings off your script. You can certainly run multiple processes/threads in the via script.if you do so, that is up to you and your operating system to allocate core/processors in balanced fashion. If you are on multithreaded program, most likely approach is to run one mpi process for node and then spawn OpenMP threads.

Let me know if you need clarifications

审判长 2024-08-26 09:21:04

您的代码中有一个与 mpich 无关的错误,您在两个循环中重用了 i 。

for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  

第二个 for 循环会把事情搞砸。

There is a bug in your code unrelated to mpich, you've reused i in your two loops.

for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  

The second for loop will mess things up.

瑾夏年华 2024-08-26 09:21:04

作为诊断,请尝试在调用 MPI_GET_PROCESSOR_NAME 之后立即插入这些语句。

printf("Hello, world.  I am %d of %d on %s\n", myid, numprocs, name);
fflush(stdout); 

如果所有进程都返回相同的节点 id,那么对我来说,这表明您不太了解作业管理系统和集群上发生了什么 - 也许 PBS 正在(尽管您显然另有说明)将所有 10 个一个节点上的进程(一个节点中有 10 个核心吗?)。

如果这产生不同的结果,这对我来说表明你的代码有问题,尽管对我来说看起来没问题。

As a diagnostic, try inserting these statements immediately after your call to MPI_GET_PROCESSOR_NAME.

printf("Hello, world.  I am %d of %d on %s\n", myid, numprocs, name);
fflush(stdout); 

If all processes return the same node id to that, it would suggest to me that you don't quite understand what is going on on the job management system and cluster -- perhaps PBS is (despite you apparently telling it otherwise) putting all 10 processes on one node (do you have 10 cores in a node ?).

If this produces different results, that suggests to me something wrong with your code, though it looks OK to me.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文