C++ programming for clusters and HPC

Posted 2024-08-27 07:57:54

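I need to write a scientific application in C++ that does a lot of computation and uses a lot of memory. Part of it already works, but because of its high resource requirements I am thinking of moving to OpenMPI.

Before doing that, I have a simple question. If I understand the principle of OpenMPI correctly, it is the developer's job to split the work across the nodes, calling SEND and RECEIVE according to which nodes are available at the time.

Do you know of any library, operating system, or anything else with this capability that would let my code stay as it is now? Basically, something that connects all the computers and lets them share their memory and CPUs as if they were one machine?

I am a bit confused by the huge volume of material on this topic. Should I be looking at cloud computing? Or at distributed shared memory?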

听风吹 2024-09-03 07:57:54

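There is currently no C++ library or utility that will automatically parallelize your code across a cluster of machines. There are many other ways to achieve distributed computing, but what you really want is to rework your application to use message passing or distributed shared memory.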

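Your best bets would be to:

Convert your implementation into a task-based solution. There are many ways to do this, but it will definitely have to be done by hand.
Clearly identify where the tasks can be broken up and how those tasks need to communicate with each other.
Use a higher-level library built on top of OpenMPI/MPICH; Boost.MPI comes to mind.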
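As a rough illustration of the "break the tasks up" step, here is a minimal sketch of a block decomposition. The helper name `block_range` is hypothetical (it is not part of MPI or Boost.MPI); it only works out which slice of `n` work items a given process would own, assuming the items are independent and roughly equal in cost.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper, not an MPI or Boost.MPI function.
// Returns the half-open index range [begin, end) owned by `rank` when
// `n` work items are split as evenly as possible over `size` processes.
std::pair<std::size_t, std::size_t> block_range(std::size_t n,
                                                std::size_t rank,
                                                std::size_t size) {
    assert(size > 0 && rank < size);
    const std::size_t base = n / size;   // every rank gets at least this many
    const std::size_t extra = n % size;  // the first `extra` ranks get one more
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

In an MPI program, `rank` and `size` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`. Each process would then loop over its own range only and exchange whatever boundary data or partial results the algorithm needs.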

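Implementing a parallel distributed solution is one thing; making it work efficiently is another. Reading up on the different topologies and parallel computing patterns will make implementing a solution a little less painful than starting from scratch.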

姜生凉生 2024-09-03 07:57:54

Well, you haven't actually stated exactly what hardware you are targeting. If it's a shared-memory machine then OpenMP is an option. Most parallel programmers would regard parallelisation with OpenMP as an easier option than using MPI in any of its incarnations. I'd also suggest that it is easier to retrofit OpenMP to an existing code than MPI. The best, in the sense of best-performing, MPI programs are those designed from the ground up to be parallelised with message-passing.
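To give a feel for what retrofitting OpenMP can look like, here is a minimal sketch. The function `sum_of_squares` is a made-up stand-in for a hot loop in an existing serial code, and it assumes the loop iterations are independent apart from the running total.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example loop. The pragma is the only change relative to the
// serial version; a compiler without OpenMP enabled simply ignores it.
double sum_of_squares(const std::vector<double>& x) {
    double total = 0.0;
    const long n = static_cast<long>(x.size());
    // Each thread accumulates a private partial sum; the partial sums are
    // combined into `total` when the loop ends.
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        total += x[k] * x[k];
    }
    return total;
}
```

With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without the flag the same source builds and runs serially, which is part of why this kind of retrofit is low-risk. Note that a parallel floating-point reduction may add the terms in a different order from the serial loop, so results can differ in the last few bits.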

In addition, the best sequential algorithm might not always be the most efficient algorithm, once it has been parallelised. Sometimes a simple, but sequentially-sub-optimal algorithm is a better choice.
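One textbook illustration of this (my example, not necessarily the one the answer had in mind) is Jacobi versus Gauss-Seidel iteration. Gauss-Seidel usually needs fewer sweeps when run sequentially, but each update depends on values computed earlier in the same sweep. Jacobi reads only the previous sweep's values, so every point can be updated independently. A minimal sketch of one Jacobi sweep for a 1-D Laplace problem, with the hypothetical name `jacobi_sweep`:

```cpp
#include <cstddef>
#include <vector>

// One Jacobi sweep: each interior point becomes the average of its two
// neighbours' OLD values, so no iteration depends on another.
std::vector<double> jacobi_sweep(const std::vector<double>& u) {
    std::vector<double> next(u);  // boundary values are carried over unchanged
    const long n = static_cast<long>(u.size());
    #pragma omp parallel for
    for (long i = 1; i < n - 1; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        next[k] = 0.5 * (u[k - 1] + u[k + 1]);
    }
    return next;
}
```

Starting from `{4, 0, 0, 0}`, one sweep gives `{4, 2, 0, 0}`. A Gauss-Seidel sweep would instead use the freshly updated 2 when computing the third entry, which is exactly the dependency that makes it harder to parallelise.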

You may have access to a shared-memory computer:

all multicore CPUs are effectively shared-memory computers;
on a lot of clusters the nodes often have two or four CPUs; if each CPU has 4 cores then a four-CPU node gives you a 16-core shared-memory machine on your cluster;
if you have access to an MPP supercomputer you will probably find that each of its nodes is a shared-memory computer.

If you are stuck with message-passing then I'd strongly advise you to stick with C++ and OpenMPI (or whatever MPI is already installed on your system), and you should definitely look at Boost.MPI too. I advise this strongly because, once you step outside the mainstream of high-performance scientific computing, you may find yourself in an army of one, programming with an idiosyncratic collection of just-fit-for-research libraries and other tools. C++, OpenMPI and Boost are sufficiently well used that you can regard them as being of 'weapons-grade' or whatever your preferred analogy might be. There's little enough traffic on SO, for example, on MPI and OpenMP; check out the stats on the other technologies before you bet the farm on them.

If you have no experience with MPI then you might want to look at a book called "Parallel Scientific Computing in C++ and MPI" by Karniadakis and Kirby. "Using MPI" by Gropp et al. is OK as a reference, but it's not a beginner's text on programming for message-passing.
