Which is easier to learn and debug, OpenMP or MPI?
I have a number-crunching C/C++ application. It is basically a main loop over different data sets. We have access to a 100-node cluster with OpenMP and MPI available. I would like to speed up the application, but I am an absolute newbie to both MPI and OpenMP. I just wonder which one is the easiest to learn and to debug, even if the performance is not the best.
I also wonder which is the most suitable for my main-loop application.
Thanks
If your program is just one big loop, using OpenMP can be as simple as putting a single `#pragma omp parallel for` line directly above the loop.
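A minimal sketch of that pattern, assuming the data sets are independent and can be identified by an index; `process_dataset` and `process_all` are hypothetical names standing in for your own code. With GCC or Clang you enable OpenMP by compiling with `-fopenmp`; without that flag the pragma is ignored and the loop simply runs serially, which is handy for debugging.

```c
/* Hypothetical stand-in for the real per-data-set number crunching. */
static double process_dataset(int id)
{
    double acc = 0.0;
    for (int k = 1; k <= 1000; ++k)
        acc += (double)id / k;
    return acc;
}

/* Process every data set. The iterations are independent, so OpenMP can
   hand them to different threads; each iteration writes only its own
   results[i], so no locking is needed. Without -fopenmp the pragma is
   ignored and the loop runs serially. */
void process_all(double *results, int n_datasets)
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n_datasets; ++i)
        results[i] = process_dataset(i);
}
```

`schedule(dynamic)` is an optional choice here: it hands out iterations as threads become free, which can help when data sets take very different amounts of time to process.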
OpenMP is only useful for shared-memory programming, which, unless your cluster is running something like Kerrighed, means that the OpenMP version of your program will run on at most one node at a time.
MPI is based on message passing and is slightly more complicated to get started with. The advantage, though, is that your program can run on several nodes at once, passing messages between them as and when needed.
Given that you said "for different data sets", it sounds like your problem might actually fall into the "embarrassingly parallel" category: provided you have more than 100 data sets, you could simply set up the scheduler to run one data set per node until they are all completed, with no need to modify your code and an almost 100x speedup over using a single node.
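If the program does not already take the data set to process on its command line, a small addition along these lines is usually enough to let the scheduler start one copy per data set. This is a sketch; `parse_dataset_index` is a hypothetical helper, and it assumes the data sets are numbered from 0.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the data-set index passed on the command
   line (for example by the cluster scheduler, one job per data set).
   Returns the index, or -1 if the argument is not a whole number in
   the range [0, n_datasets). */
int parse_dataset_index(const char *arg, int n_datasets)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1; /* overflow, empty string, or trailing junk */
    if (v < 0 || v >= n_datasets)
        return -1; /* out of range */
    return (int)v;
}
```

In `main`, you would call it as `parse_dataset_index(argv[1], n_datasets)` when `argc > 1`, exit with an error if it returns -1, and otherwise process just that one data set.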
For example, if your cluster uses Condor as the scheduler, you could submit one job per data item to the "vanilla" universe, varying only the "Arguments =" line of the job description. (There are other, possibly more sensible, ways to do this with Condor, and there are similar mechanisms for Torque, SGE, etc.)
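A minimal HTCondor submit description along those lines might look like the following. The executable name `crunch` and the output file names are hypothetical, and it assumes the program takes the data-set index as its first argument. Rather than writing 100 separate job descriptions, `queue 100` queues 100 jobs and `$(Process)` expands to 0, 1, ..., 99, so each job receives a different argument.

```
# Hypothetical submit description: one job per data set, 100 jobs in total.
universe   = vanilla
executable = crunch
arguments  = $(Process)
output     = crunch.$(Process).out
error      = crunch.$(Process).err
log        = crunch.log
queue 100
```

It would be submitted with `condor_submit` followed by the name of this file.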
OpenMP is essentially for SMP machines, so if you want to scale to hundreds of nodes you will need MPI anyhow. You can, however, use both: MPI to distribute work across nodes and OpenMP to handle parallelism across the cores or CPUs within each node. I would say OpenMP is a lot easier than messing with pthreads, but because it is coarser grained, the speedup you get from OpenMP will usually be lower than that of a hand-optimized pthreads implementation.