MPI - why does every process need a receive buffer for MPI_Reduce?



Imagine you have N MPI processes, each of which generates a large vector of floats. You want to sum these vectors in the mathematical sense (i.e. create a result vector whose i-th entry is the sum of the i-th entries of the N vectors).

You iterate through the vector and use MPI_Reduce in combination with MPI_SUM to build the result vector on process 0. To save memory, you create the result vector only on process 0. This does not seem to work: the result vector apparently needs to exist on every process, which seems extremely wasteful, so I imagine there is a workaround?

Here is my code example:


#include <mpi.h>
#include <vector>
#include <iostream>

int main(int argc, char* argv[]){

    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    int rank;
    MPI_Comm_rank(comm, &rank);

    std::vector<double> myvec(3);
    std::vector<double> myvec_sum;

    // fill vector
    for(int i=0; i<myvec.size(); ++i){
        myvec[i] = i*rank;
    }

    if(rank == 0) myvec_sum.resize(3);      // only on rank 0 to save memory

    for(int i=0; i<myvec_sum.size(); ++i){
        MPI_Reduce(&myvec[i], &myvec_sum[i], 1, MPI_DOUBLE, MPI_SUM, 0, comm);  // will deadlock
                                                                                // (code will only work if the "if(rank==0)" above is removed)
    }

    // print results
    if(rank==0){
        std::cout << "Results:" << "\n";
        for(int i=0; i<myvec_sum.size(); ++i){
            std::cout << "i=" << i << " :  " << myvec_sum[i] << std::endl;
        }
    }

    MPI_Finalize();

    return 0;
}

Edit:
Replacing

if(rank == 0) myvec_sum.resize(3);      // only on rank 0 to save memory

for(int i=0; i<myvec_sum.size(); ++i){
    MPI_Reduce(&myvec[i], &myvec_sum[i], 1, MPI_DOUBLE, MPI_SUM, 0, comm);
}

with

if(rank == 0) myvec_sum.resize(3);
MPI_Reduce(&myvec[0], &myvec_sum[0], myvec.size(), MPI_DOUBLE, MPI_SUM, 0, comm);

worked without problems. I still don't quite understand why my element-wise approach fails when not all of the processes have a correctly sized receive vector...
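
For completeness, here is a minimal sketch of an element-wise variant that also appears to keep the result buffer on rank 0 only (assuming, as above, that the send vector has the same length on every rank, and that the recvbuf argument of MPI_Reduce is only significant at the root, so the other ranks can pass nullptr):

#include <mpi.h>
#include <vector>
#include <iostream>
#include <cstddef>

int main(int argc, char* argv[]){
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    int rank;
    MPI_Comm_rank(comm, &rank);

    std::vector<double> myvec(3);
    for(std::size_t i = 0; i < myvec.size(); ++i){
        myvec[i] = static_cast<double>(i) * rank;
    }

    // Result buffer is allocated on the root only; the other ranks leave it empty.
    std::vector<double> myvec_sum;
    if(rank == 0) myvec_sum.resize(myvec.size());

    // Loop bound is myvec.size(), which is the same on every rank, so all ranks
    // issue the same number of MPI_Reduce calls. recvbuf is only used at the root.
    for(std::size_t i = 0; i < myvec.size(); ++i){
        double* recvbuf = (rank == 0) ? &myvec_sum[i] : nullptr;
        MPI_Reduce(&myvec[i], recvbuf, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
    }

    // print results
    if(rank == 0){
        std::cout << "Results:" << "\n";
        for(std::size_t i = 0; i < myvec_sum.size(); ++i){
            std::cout << "i=" << i << " :  " << myvec_sum[i] << std::endl;
        }
    }

    MPI_Finalize();
    return 0;
}

In this version every rank issues the same number of MPI_Reduce calls, which seems to be the relevant difference from my original element-wise loop.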
