MPI segmentation fault when using a very large array

Posted on 2024-12-16 22:11:37

I'm trying to write an MPI program in C++ to sum the values of a very large array.
The code below works well with array sizes up to 1 million, but when I try to run it with 10 million elements or more, I get a segmentation fault. Can someone help me? Thanks.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    double t0, t1, time; // variables for timing
    int nprocs, myrank;
    int root=0;
    long temp, sumtot, i, resto, svStartPos, dim, intNum;

    // Size of the array holding the values to sum
    const long A_MAX=10000000;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    long vett[A_MAX];
    long parsum[B_MAX];

    long c=-1;
    int displs[nprocs];
    int sendcounts[nprocs];

    //printf("B_MAX: %ld\n", B_MAX);

    // We send (int)(A_MAX/nprocs) elements with a scatter; resto is the
    // number of leftover elements, which are sent with the scatterv
    resto= A_MAX % nprocs;
    //printf("Resto: %d\n", resto);

    // Position from which the Scatterv starts
    svStartPos = A_MAX - resto;
    //printf("svStartPos: %d\n", svStartPos);

    // number of elements per process, ignoring the remainder
    dim= (A_MAX-resto)/nprocs;
    //printf("dim: %d\n", dim);

    // Process 0 initializes the full array whose sum we want
    // to compute
    if (myrank==0){
        for (i=0; i<A_MAX; i++)
            vett[i]=1;
    }

    // Each process initializes the local array whose partial sum it will
    // compute; that partial sum is then used in the reduce operation
    for (i=0; i<B_MAX; i++)
        parsum[i]=-1;

    // Each process initializes the sendcounts and displs arrays needed for
    // the scatterv operation
    for (i=0; i<nprocs; i++){
        if (i<A_MAX-svStartPos){
            // If the process rank is between 0 and resto ...
            sendcounts[i]=1;            // ...1 element of vett is sent...
            displs[i]= svStartPos+i;    // ...from position svStartPos+i
        }
        else {
            // if the process rank is > resto ...
            sendcounts[i]=0;            // ...no element is sent
            displs[i]= A_MAX;
        }
    }

    root = 0;    // The master process

    sumtot = 0;  // Total sum of the elements of vett
    temp = 0;    // running value of the partial sum

    MPI_Barrier(MPI_COMM_WORLD);

    if (A_MAX>=nprocs){
        MPI_Scatter(&vett[dim*myrank], dim, MPI_LONG, &parsum, dim, MPI_LONG, 0, MPI_COMM_WORLD);
        printf("Processore: %d - Scatter\n", myrank);
    }

    // The scatterv is performed only by the processes with rank
    // 0<myrank<resto
    if (sendcounts[myrank]==1){
        MPI_Scatterv(&vett,sendcounts,displs,MPI_LONG,&c,1,MPI_LONG,0,MPI_COMM_WORLD);
        parsum[B_MAX-1]=c;
        printf("Processore: %d - effettuo la Scatterv\n", myrank);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if(myrank==0){
        t0 = MPI_Wtime(); // start timing
    }

    for(i=0; i<B_MAX; i++){
        if (parsum[i]!=-1)
            temp = temp + parsum[i]; // sum of the elements
    }
    printf("Processore: %d - Somma parziale: %ld\n", myrank, temp);

    MPI_Barrier(MPI_COMM_WORLD);

    // each process's sum is sent to the root, which adds up
    // the partial results
    MPI_Reduce(&temp,&sumtot,1,MPI_LONG,MPI_SUM,root,MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    if(myrank==0){
        t1 = MPI_Wtime(); // stop timing

        // compute and print the elapsed time
        time = 1.e6 * (t1-t0);
        printf("NumProcessori: %d  Somma: %ld  Tempo: %f\n", nprocs, sumtot, time);

        // check the sum: if it is correct, sumtot equals 0.
        sumtot = sumtot - A_MAX;
        printf("Verifica: %ld\n", sumtot);
    }

    MPI_Finalize();

    return 0;
}

对不⑦ 2024-12-23 22:11:37

The first real error I found was this line:

MPI_Scatterv(&vett,sendcounts,displs,MPI_LONG,&c,1,MPI_LONG,0,MPI_COMM_WORLD);

It passes the address of a ::std::vector<int> to a function that expects a void* for that argument. Any pointer type (such as ::std::vector<int>*) converts implicitly to void*, so there is no compile error at this point. However, MPI_Scatterv expects its first argument to be the address of the send buffer, and MPI expects that buffer to be a plain array.

I guess that you recently changed your code from the commented-out sections, where vett is an array, and tried to get your call to work by adding the address-of operator in your MPI_Scatterv call. The original array probably caused segfaults at some point because it was stack-allocated and you ran out of stack space with those monsters (the default stack size on Linux systems is on the order of megabytes, IIRC, which would fit that assumption exactly; check it with ulimit -s).

The change to ::std::vector<int> caused the actual data to be placed on the heap instead, which has a much larger maximum size (on 64-bit systems you can expect to run out of physical memory long before you hit that limit). You actually already implemented a solution to your particular problem a few lines earlier:

MPI_Scatter(&vett[dim*myrank], dim, MPI_LONG, &parsum, dim, MPI_LONG, 0, MPI_COMM_WORLD);

Here, you access an element and then take its address (note that [] binds tighter than &). This is OK as long as you do not modify the underlying vector (for example by resizing it) while the pointer is in use. If you apply the same approach to the other call, you can solve this problem quite easily:

MPI_Scatterv(&vett[0],sendcounts,displs,MPI_LONG,&c,1,MPI_LONG,0,MPI_COMM_WORLD);

In any case, except for the two vector objects, your code looks like it was written for the old C standard rather than C++. For example, you might look into the new operator family instead of malloc.h, you can declare variables where you first define them (even inside for loop headers!), and you can make your life easier by using the std::cout stream instead of printf...