Running on multiple cores using MPI

Posted 2024-11-09 03:58:21


I use the following command to submit MPI jobs: mpirun -np <no. of processors> filename

My understanding is that the above command lets me submit to 4 independent processors that communicate via MPI. However, in our setup, each processor has 4 cores that go unutilized.
The questions I had are the following:

  1. Is it possible, from the mpirun command line, to submit a job so that it runs on multiple cores on the same node, or across several nodes? If so, how?

  2. Does the above require any special comments/setup within the code? I do understand from reading some literature that the communication time between cores can differ from that between processors, so it does require some thinking about how the problem is distributed... but apart from that issue, what else does one need to account for?

  3. Finally, is there a limit on how much data can be transferred? Is there a limit on how much data the bus can send/receive? Is there a limit on the cache?

Thanks!


Comments (1)

回首观望 2024-11-16 03:58:21


So 1 is a question about launching processes, and 2+3 are questions about, basically, performance tuning. Performance tuning can involve substantial work on the underlying code, but you won't need to modify a line of code to do any of this.

What I understand from your first question is that you want to control the distribution of the MPI processes that get launched. Doing this is necessarily outside the standard, because it's OS- and platform-dependent, so each MPI implementation will have a different way to do it. Recent versions of OpenMPI and MPICH2 allow you to specify where the processes end up, so you can specify, say, two processes per socket, etc.
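
For instance (a sketch; flag names vary by implementation and version: the Open MPI options below are from the 1.7 series onward, the MPICH ones assume the Hydra launcher, and the program and host names are placeholders):

    # Open MPI: place two processes per socket and pin each to a core
    mpirun -np 8 --map-by socket --bind-to core ./my_mpi_program

    # Open MPI with a hostfile that advertises 4 slots (cores) per node;
    # the hostfile contains lines like:  node01 slots=4
    mpirun -np 8 --hostfile hostfile ./my_mpi_program

    # MPICH (Hydra): 4 processes per node, each bound to a core
    mpiexec -n 8 -ppn 4 -bind-to core ./my_mpi_program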

You do not need to modify the code for this to work, but there are performance issues depending on core distributions. It's hard to say much about this in general, because it depends on your communication patterns, but yes, the "closer" the processors are, the faster the communications will be, by and large.
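
A quick way to see where the ranks actually landed is a minimal placement probe. The following is a sketch in C (the file name is arbitrary); compile it with mpicc and launch it with whatever placement flags you are testing:

    /* placement_check.c: report which node each MPI rank ended up on */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &namelen);

        printf("rank %d of %d running on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }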

There's no specified limit to the total volume of data that goes back and forth between MPI tasks, but yes, there are bandwidth limits (and there are per-message limits: a single message's size is described by an int count, for instance). The cache size is whatever it is.
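
If you want a rough number for how placement affects bandwidth, a two-rank ping-pong loop is the usual probe. Here is a sketch (the message size and iteration count are arbitrary):

    /* pingpong.c: rough point-to-point bandwidth between ranks 0 and 1 */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        const int n = 1 << 20;    /* 2^20 doubles = 8 MB per message */
        const int iters = 100;
        int rank;
        double *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = calloc(n, sizeof(double));

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0) {
            double bytes = 2.0 * iters * n * sizeof(double);
            printf("average bandwidth: %.1f MB/s\n",
                   bytes / (t1 - t0) / 1.0e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and run with mpirun -np 2 plus the binding flags above; the gap between the intra-socket and inter-node numbers shows how much your placement matters.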
