如何通过 MPI 加速这个问题

发布于 2024-08-19 08:17:24 字数 3257 浏览 8 评论 0原文

(1).我想知道如何使用 MPI 加快下面代码循环中的耗时计算?

 int main(int argc, char ** argv)   
 {   
 // some operations           
 f(size);           
 // some operations         
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations          
 int i;           
 double * array =  new double [size];           
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }           
 // some operations using all elements in array           
 delete [] array;  
 }

如代码所示,我想在与MPI并行的部分之前和之后进行一些操作,但我不知道如何指定并行部分的开始和结束位置。

(2)我当前的代码正在使用OpenMP来加速计算。

 void f(int size)   
 {   
 // some operations           
 int i;           
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait          
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }          
 } 
 // some operations using all elements in array           
 }

我想知道如果我改用MPI,是否可以同时为OpenMP和MPI编写代码?如果可以的话,代码怎么写,怎么编译运行代码?

(3)我们的集群有三个版本的MPI:mvapich-1.0.1、mvapich2-1.0.3、openmpi-1.2.6。 他们的用法一样吗?特别是就我而言。 哪一款最适合我使用?

谢谢和问候!


更新:

我想更多地解释一下我的问题,即如何指定并行部分的开始和结束。在下面的玩具代码中,我想限制函数 f() 内的并行部分:

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}  

但是,运行输出不是预期的。并行部分之前和之后的 printf 部分已被每个进程执行,而不仅仅是主进程:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End running!  
Leaving function f().  
End running!  

所以在我看来并行部分不限于 MPI_Init() 和 MPI_Finalize() 之间。

除了这个问题,我还希望有人能回答我的其他问题。谢谢!

(1). I am wondering how I can speed up the time-consuming computation in the loop of my code below using MPI?

 int main(int argc, char ** argv)   
 {   
 // some operations           
 f(size);           
 // some operations         
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations          
 int i;           
 double * array =  new double [size];           
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }           
 // some operations using all elements in array           
 delete [] array;  
 }

As shown in the code, I want to do some operations before and after the part to be paralleled with MPI, but I don't know how to specify where the parallel part begins and ends.

(2) My current code is using OpenMP to speed up the comutation.

 void f(int size)   
 {   
 // some operations           
 int i;           
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait          
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }          
 } 
 // some operations using all elements in array           
 }

I wonder if I change to use MPI, is it possible to have the code written both for OpenMP and MPI? If it is possible, how to write the code and how to compile and run the code?

(3) Our cluster has three versions of MPI: mvapich-1.0.1, mvapich2-1.0.3, openmpi-1.2.6.
Are their usage same? Especially in my case.
Which one is best for me to use?

Thanks and regards!


UPDATE:

I like to explain a bit more about my question about how to specify the start and end of the parallel part. In the following toy code, I want to limit the parallel part within function f():

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}  

However, the running output is not expected. The printf parts before and after the parallel part have been executed by every process, not just the main process:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End running!  
Leaving function f().  
End running!  

So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().

Besides this one, I am still hoping someone could answer my other questions. Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

谁对谁错谁最难过 2024-08-26 08:17:24

快速编辑(因为我要么不知道如何留下评论,要么我还不允许留下评论)——3lectrologos 关于 MPI 程序的并行部分是不正确的。您不能在 MPI_Init 之前和 MPI_Finalize 之后执行串行工作并期望它实际上是串行的 - 它仍将由所有 MPI 线程执行。

我认为问题的一部分在于 MPI 程序的“并行部分”是整个程序。 MPI 将在您指定的每个节点上大约同时开始执行相同的程序(您的主函数)。 MPI_Init 调用只是为程序设置某些内容,以便它可以正确使用 MPI 调用。

我认为你想做的正确的“模板”(伪代码)是:

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);  
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork();  // Every MPI thread will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();  
    return 0;  
}  

Quick edit (because I either can't figure out how to leave comments, or I'm not allowed to leave comments yet) -- 3lectrologos is incorrect about the parallel part of MPI programs. You cannot do serial work before MPI_Init and after MPI_Finalize and expect it to actually be serial -- it will still be executed by all MPI threads.

I think part of the issue is that the "parallel part" of an MPI program is the entire program. MPI will start executing the same program (your main function) on each node you specify at approximately the same time. The MPI_Init call just sets certain things up for the program so it can use the MPI calls correctly.

The correct "template" (in pseudo-code) for what I think you want to do would be:

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);  
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork();  // Every MPI thread will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();  
    return 0;  
}  
叫思念不要吵 2024-08-26 08:17:24

MPI_Init(带有&argc 和&argv 参数。这是MPI 实现的要求)必须是MAIN 中第一个执行的语句。 Finalize 必须是最后执行的语句。

main() 将在 MPI 环境中的每个节点上启动。节点数、node_id 和主节点地址等参数可以通过 argc 和 argv 传递。

它的框架是:

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int numprocs; int myid; 

int main(int argc, char **argv)  
{  

MPI_Init(&argc, &argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  /* main process. user interaction is ONLY HERE */

    printf("%s\n", "Start running!");  

    MPI_Send ... requests with job
    /*may be call f in main too*/
    MPU_Reqv ... results..
    printf("%s\n", "End running!");  
}
else
{

  /* Slaves. Do sit here and wait a job from main process */
  MPI_Recv(.input..);  
  /* dispatch input by parsing it 
    (if there can be different types of work)
    or just do the work */    
  f(..)
  MPI_Send(.results..);  
}

MPI_Finalize();  

return 0;  
}  

The MPI_Init (with args of &argc and &argv. It is the requirement of MPI implementations) must be really the first executed statement of MAIN. And Finalize must be the very last executed statement.

main() will be started on every node in MPI environment. Parameters like number of nodes, node_id, and master node address may be passed via argc and argv.

It is framework:

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int numprocs; int myid; 

int main(int argc, char **argv)  
{  

MPI_Init(&argc, &argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  /* main process. user interaction is ONLY HERE */

    printf("%s\n", "Start running!");  

    MPI_Send ... requests with job
    /*may be call f in main too*/
    MPU_Reqv ... results..
    printf("%s\n", "End running!");  
}
else
{

  /* Slaves. Do sit here and wait a job from main process */
  MPI_Recv(.input..);  
  /* dispatch input by parsing it 
    (if there can be different types of work)
    or just do the work */    
  f(..)
  MPI_Send(.results..);  
}

MPI_Finalize();  

return 0;  
}  
掩饰不了的爱 2024-08-26 08:17:24

如果数组中的所有值都是独立的,那么它应该是可简单并行的。将数组拆分为大小大致相等的块,将每个块分配给一个节点,然后将结果编译回一起。

If all the values in the array are independent, then it should be trivially parallelizable. Split the array into chunks of roughly equal size, give each chunk to a node, and then compile the results back together.

东北女汉子 2024-08-26 08:17:24

从 OpenMP 迁移到集群的最简单方法是来自英特尔的“Cluster OpenMP”。

对于 MPI,您需要完全重写工作分配。

The easiest migration to cluster form OpenMP can be "Cluster OpenMP" from intel.

For MPI you need to completely rewrite dispatching of work.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文