Parallelizing C++ code using MPI_SEND and MPI_RECV
I have a parallel code, but I don't understand whether it actually works in parallel.
I have two vectors, A and B, whose elements are matrices defined with a dedicated class.
Since the matrices in the vectors are not of a primitive type, I can't scatter these vectors to the other ranks with MPI_Scatter, so I have to use MPI_Send and MPI_Recv. Also, rank 0 has only a coordinating role: it sends the other ranks the blocks they should work on and collects the results at the end, but it does not participate in the computation.
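The dense_matrix class is not shown here; the MPI calls below only make sense if each matrix stores its n*n entries in one contiguous buffer of double. This is a minimal sketch of the interface I assume the class provides (the names, the storage layout and the naive operator* are my assumptions, not the actual implementation used in the exercise):

// Hypothetical sketch of the dense_matrix interface the MPI calls rely on:
// entries stored in one contiguous buffer of double, so data() can be passed
// directly to MPI_Send/MPI_Recv with count n*n and type MPI_DOUBLE.
#include <vector>

class dense_matrix
{
public:
    dense_matrix() = default;
    dense_matrix(unsigned rows, unsigned cols)
        : n_rows(rows), n_cols(cols), buf(rows * cols, 0.0) {}

    unsigned rows() const { return n_rows; }
    unsigned cols() const { return n_cols; }

    // contiguous storage handed to MPI as n*n MPI_DOUBLE
    double*       data()       { return buf.data(); }
    const double* data() const { return buf.data(); }

    double& operator()(unsigned i, unsigned j)       { return buf[i * n_cols + j]; }
    double  operator()(unsigned i, unsigned j) const { return buf[i * n_cols + j]; }

    // naive product, assumed to exist for local_C[j] = local_A * local_B
    friend dense_matrix operator*(const dense_matrix& a, const dense_matrix& b)
    {
        dense_matrix c(a.n_rows, b.n_cols);
        for (unsigned i = 0; i < a.n_rows; ++i)
            for (unsigned k = 0; k < a.n_cols; ++k)
                for (unsigned jj = 0; jj < b.n_cols; ++jj)
                    c(i, jj) += a(i, k) * b(k, jj);
        return c;
    }

private:
    unsigned n_rows = 0, n_cols = 0;
    std::vector<double> buf;
};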
The solution to the exercise is the following:
// rank 0 sends the blocks to the other ranks (which compute the local
// block products), then receives the partial results and prints the
// global vector
if (rank == 0)
{
    // send data
    for (unsigned j = 0; j < N_blocks; ++j)
    {
        int dest = j / local_N_blocks + 1;
        // send number of rows
        unsigned n = A[j].rows();
        MPI_Send(&n, 1, MPI_UNSIGNED, dest, 1, MPI_COMM_WORLD);
        // send blocks
        MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
        MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);
    }
    // global vector
    std::vector<dense_matrix> C(N_blocks);
    for (unsigned j = 0; j < N_blocks; ++j)
    {
        int root = j / local_N_blocks + 1;
        // receive number of rows
        unsigned n;
        MPI_Recv(&n, 1, MPI_UNSIGNED, root, 4, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        // initialize block
        dense_matrix received(n,n);
        // receive block
        MPI_Recv(received.data(), n*n, MPI_DOUBLE, root, 5,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // store block in the global vector
        C[j] = received;
    }
    // print result
    print_matrix(C);
}
// all the other ranks receive the blocks and compute the local block
// products, then send the results to rank 0
else
{
    // local vector
    std::vector<dense_matrix> local_C(local_N_blocks);
    // receive data and compute products
    for (unsigned j = 0; j < local_N_blocks; ++j)
    {
        // receive number of rows
        unsigned n;
        MPI_Recv(&n, 1, MPI_UNSIGNED, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // initialize blocks
        dense_matrix local_A(n,n);
        dense_matrix local_B(n,n);
        // receive blocks
        MPI_Recv(local_A.data(), n*n, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(local_B.data(), n*n, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // compute product
        local_C[j] = local_A * local_B;
    }
    // send local results
    for (unsigned j = 0; j < local_N_blocks; ++j)
    {
        // send number of rows
        unsigned n = local_C[j].rows();
        MPI_Send(&n, 1, MPI_UNSIGNED, 0, 4, MPI_COMM_WORLD);
        // send block
        MPI_Send(local_C[j].data(), n*n, MPI_DOUBLE, 0, 5, MPI_COMM_WORLD);
    }
}
In my opinion, if local_N_blocks = N_blocks / (size - 1); is different from 1, the variable dest does not change its value at every loop iteration. So, after the first iteration of the "sending loop", the second time rank 0 reaches

MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);

it has to wait until the operation local_C[j] = local_A * local_B for the previous j has completed, so the code does not seem well parallelized to me.
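To make the concern concrete, here is a small standalone sketch of the dest = j / local_N_blocks + 1 mapping; the values of N_blocks and size are made up for illustration and are not from the exercise:

// Standalone illustration of the block-to-rank mapping used in the send loop.
// N_blocks and size are example values chosen only for this illustration.
#include <iostream>

int main()
{
    const unsigned N_blocks = 8;                            // total number of blocks
    const int size = 5;                                     // 1 coordinator + 4 workers
    const unsigned local_N_blocks = N_blocks / (size - 1);  // 2 blocks per worker

    for (unsigned j = 0; j < N_blocks; ++j)
    {
        int dest = j / local_N_blocks + 1;
        std::cout << "block " << j << " -> rank " << dest << '\n';
    }
    // prints: blocks 0,1 -> rank 1; 2,3 -> rank 2; 4,5 -> rank 3; 6,7 -> rank 4,
    // i.e. dest keeps the same value for local_N_blocks consecutive iterations.
    return 0;
}

So whenever local_N_blocks > 1, rank 0 issues several consecutive MPI_Send calls towards the same worker before moving on to the next one, which is what my doubt is about.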
What do you think?