Is MPI shared memory a good solution for my problem?

Posted 2025-01-26 06:02:51 · 604 words · 3 views · 0 comments

I have several processes, each of which calculates certain sub-matrices of one global matrix. The problem is that the sub-matrices overlap, and in general they do not have to form a contiguous block within the global matrix. Each task may also have more than one sub-matrix.

Finally, to obtain my final matrix, I need to perform an element-wise summation of these sub-matrices, taking into account each sub-matrix's position within the global matrix.
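To make the summation concrete, here is a minimal serial sketch in plain C (no MPI involved) of adding one non-contiguous sub-matrix into a row-major global array. The `SubMatrix` struct and the `scatter_add` function are hypothetical names introduced only for illustration, and the sketch assumes each sub-matrix is described by a list of global row indices and a list of global column indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one sub-matrix: its entries live at the
 * intersections of the listed global rows and global columns. */
typedef struct {
    size_t nrows;          /* number of local rows */
    size_t ncols;          /* number of local columns */
    const size_t *rows;    /* global row index of each local row */
    const size_t *cols;    /* global column index of each local column */
    const double *values;  /* local entries, row-major, nrows * ncols */
} SubMatrix;

/* Add every entry of `sub` to its position in the row-major global
 * matrix of size g_rows x g_cols. Overlapping sub-matrices simply
 * add into the same global entries. */
void scatter_add(double *global, size_t g_rows, size_t g_cols,
                 const SubMatrix *sub)
{
    for (size_t i = 0; i < sub->nrows; ++i) {
        size_t gi = sub->rows[i];
        assert(gi < g_rows);
        for (size_t j = 0; j < sub->ncols; ++j) {
            size_t gj = sub->cols[j];
            assert(gj < g_cols);
            global[gi * g_cols + gj] += sub->values[i * sub->ncols + j];
        }
    }
}
```

Calling `scatter_add` once per sub-matrix on a zero-initialized array gives the element-wise sum; entries covered by several sub-matrices receive several contributions.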

So far I am doing the following:

  • each process has its own copy of the global array (matrix)
  • each process then calculates a sub-matrix of that global matrix and adds its elements to the right positions in its local copy of the global array
  • with MPI_Allreduce I obtain the final global matrix, synchronized across all tasks (this is the element-wise summation that gives my final result)

This works reasonably well as long as my global matrix is small. However, it quickly becomes a memory bottleneck, because every process has to allocate a full local copy of the global matrix, and that becomes more and more expensive as the matrix grows.

One constraint is that I have to solve this with MPI only.
Another constraint is that I need to perform operations on that global matrix afterwards, where different tasks access (this time read-only) different parts of it. These blocks are not the same as the sub-matrix blocks from before.

I stumbled upon MPI-3 shared-memory arrays. However, I am not sure whether this is the best solution for my problem, since several processes would have to add small, overlapping local arrays into the shared array at the same time. On the other hand, for my subsequent operations, each process could simply read from that global matrix again.

I am relatively inexperienced with this kind of problem, and I would be grateful for any suggestions.

Thanks!
