How can I send columns of a matrix stored as an STL vector with Boost MPI in C++?
Using Boost MPI, I want to send multiple columns of a matrix stored in STL vector form,
vector < vector < double > > A ( 10, vector <double> (10));
without copying the content into some buffer (because computation time is crucial here).
I found out how this can be done with plain MPI. Here is example code that sends the 4th, 5th and 6th columns of a 10 by 10 matrix from one process (rank==0) to another (rank==1). (Although I do not know why I have to add the '2' to the third argument of MPI_Type_vector. Does anyone know why?)
#include <vector>
#include <mpi.h>
using std::vector;

int main (int argc, char *argv[])
{
  int rank, size;
  MPI_Init (&argc, &argv);                /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */

  // fill matrices
  vector< vector<float> > A(10, vector<float>(10));
  vector< vector<float> > A_copy(10, vector<float>(10));
  for (int i=0; i!=10; i++)
  {
    for (int j=0; j!=10; j++)
    {
      A[i][j] = j + i*10;
      A_copy[i][j] = 0.0;
    }
  }

  int dest = 1;
  int tag  = 1;

  // define new type = three columns
  MPI_Datatype newtype;
  MPI_Type_vector(10,         /* # blocks = elements per column */
                  3,          /* 3 columns only                 */
                  10+2,       /* skip 10 elements               */
                  MPI_FLOAT,  /* elements are float             */
                  &newtype);  /* MPI derived datatype           */
  MPI_Type_commit(&newtype);

  if (rank==0)
  {
    MPI_Send(&A[0][4], 1, newtype, dest, tag, MPI_COMM_WORLD);
  }
  if (rank==1)
  {
    MPI_Status status;
    MPI_Recv(&A_copy[0][4], 1, newtype, 0, tag, MPI_COMM_WORLD, &status);
  }

  MPI_Finalize();
  return 0;
}
On the Boost webpage, they claim that MPI_Type_vector is "used automatically in Boost.MPI" (http://www.boost.org/doc/libs/1_47_0/doc/html/mpi/tutorial.html#mpi.c_mapping).
But I cannot find an example of how to do this in detail. I only know how to send the whole matrix, or each element one after another, with Boost.
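For reference, a minimal sketch of the whole-matrix variant I mean (names are illustrative; Boost.MPI serializes the nested vector element by element, which is exactly the copying I want to avoid):

#include <vector>
#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    std::vector< std::vector<float> > A(10, std::vector<float>(10));

    if (world.rank() == 0)
        world.send(1, 0, A);   // serializes and sends all 100 elements
    else if (world.rank() == 1)
        world.recv(0, 0, A);   // receives and deserializes into A
    return 0;
}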
Thank you in advance,
Tobias
2 Answers
I solved the problem by writing my own class 'columns' and serializing it. Here is an example.
Explanation: The 'columns'-class contains a pointer to the matrix and two numbers indicating where the columns start and end.
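A minimal sketch of such a class might look like this (all snippets below are illustrative reconstructions; names such as matrix, first, last, cols and world are assumptions):

#include <vector>

// 'columns': a light-weight view on the column range [first, last)
// of a row-major matrix stored as vector< vector<float> >
struct columns
{
    columns() : matrix(0), first(0), last(0) {}
    columns(std::vector< std::vector<float> >* m, int first_col, int last_col)
        : matrix(m), first(first_col), last(last_col) {}

    std::vector< std::vector<float> >* matrix;  // pointer to the full matrix
    int first;                                  // first column of the range
    int last;                                   // one past the last column
};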
With the following code one tells boost-serialization how to serialize this 'columns'-class:
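For example, with the non-intrusive form of Boost.Serialization (a sketch):

#include <boost/serialization/serialization.hpp>

namespace boost { namespace serialization {

// archive only the elements in the column range [first, last), row by row;
// on loading, the values are written back through the matrix pointer
template <class Archive>
void serialize(Archive& ar, columns& c, const unsigned int /*version*/)
{
    for (std::size_t i = 0; i != c.matrix->size(); ++i)
        for (int j = c.first; j != c.last; ++j)
            ar & (*c.matrix)[i][j];
}

}} // namespace boost::serialization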
Then one fills the matrix 'input':
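For instance (sketch, using the same fill pattern as in the question):

std::vector< std::vector<float> > input(10, std::vector<float>(10));
for (int i = 0; i != 10; ++i)
    for (int j = 0; j != 10; ++j)
        input[i][j] = j + i * 10;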
and initializes a columns-class object (which now contains a pointer to the matrix 'input'):
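e.g. a view on the 4th, 5th and 6th columns (half-open range, illustrative values):

columns cols(&input, 4, 7);   // columns 4, 5 and 6 of 'input'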
and sends it to another (sub)process by
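something along the lines of (a sketch; it assumes a boost::mpi::environment has been created, with #include <boost/mpi.hpp>):

boost::mpi::communicator world;
if (world.rank() == 0)
    world.send(1 /* dest */, 1 /* tag */, cols);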
In the end it is received by
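for instance (sketch; the receiving rank builds its own columns object pointing at a destination matrix of the right shape, so the values are written into it during deserialization):

if (world.rank() == 1)
{
    std::vector< std::vector<float> > output(10, std::vector<float>(10));
    columns cols_recv(&output, 4, 7);
    world.recv(0 /* source */, 1 /* tag */, cols_recv);
}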
If you are going to do lots of column operations on A, maybe you should store A transpose rather than A. This will put the columns in contiguous memory locations. This means you could send a column using MPI_Send without doing any copy operations. Additionally, column operations will be faster.
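For example (a sketch with illustrative names: A is stored transposed as At, so At[j] holds column j of A in one contiguous block):

#include <vector>
#include <mpi.h>
using std::vector;

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // At[j][i] == A[i][j]: column j of A is the contiguous vector At[j],
    // so it can be sent without a derived datatype and without copying
    vector< vector<float> > At(10, vector<float>(10));
    vector< vector<float> > At_copy(10, vector<float>(10));

    int tag = 1;
    if (rank == 0)
        MPI_Send(&At[4][0], 10, MPI_FLOAT, 1, tag, MPI_COMM_WORLD);   // column 4 of A
    if (rank == 1)
    {
        MPI_Status status;
        MPI_Recv(&At_copy[4][0], 10, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}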