C++ programming for clusters and HPC

Posted on 2024-08-27 07:57:54

I need to write a scientific application in C++ that does a lot of computation and uses a lot of memory. Part of it is already working, but because of its high resource requirements I am thinking of moving to OpenMPI.

Before doing that, I am curious about one thing: if I understand the principle of OpenMPI correctly, it is the developer who has to split the work across the nodes, calling SEND and RECEIVE according to which nodes are available at the time.

Do you know of any library, OS, or anything else that provides this capability while letting my code remain as it is now? Basically, something that connects all the computers and lets them share their memory and CPUs as if they were one machine?

I am a bit confused by the huge volume of material available on the topic.
Should I look at cloud computing, or at distributed shared memory?

Comments (4)

听风吹 2024-09-03 07:57:54

Currently there is no C++ library or utility that will automatically parallelize your code across a cluster of machines. There are many ways to do distributed computing, but in practice you will want to restructure your application to use message passing or distributed shared memory.

Your best bets would be to:

  1. Convert your implementation into a task-based solution. There are many ways to do this, but it will almost certainly have to be done by hand.
  2. Clearly identify where you can break the tasks up and how those tasks communicate with each other.
  3. Use a higher-level library that builds on OpenMPI/MPICH; Boost.MPI comes to mind.

Implementing a parallel distributed solution is one thing; making it work efficiently is another. Read up on the different topologies and parallel computing patterns so that implementing your solution is a little less painful than starting from scratch.
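To make steps 1 and 2 a bit more concrete, here is a minimal sketch in plain standard C++ of what "task-based" can look like for a simple sum. The names `Range`, `block_partition`, `partial_sum` and `task_based_sum` are made up for this illustration and come from no library. The tasks run one after another in a loop here; the assumption is that in a real MPI or Boost.MPI program each rank would compute only its own range and the final loop would become a reduction across ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Half-open index range [begin, end) owned by one worker (hypothetical type).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n items into `workers` contiguous blocks whose sizes differ by at
// most one. The first (n % workers) blocks receive one extra item.
std::vector<Range> block_partition(std::size_t n, std::size_t workers) {
    assert(workers > 0);
    std::vector<Range> ranges;
    ranges.reserve(workers);
    const std::size_t base = n / workers;
    const std::size_t extra = n % workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}

// One task: sum the slice of `data` described by `r`. It reads only its own
// slice, so it needs no data from any other task.
double partial_sum(const std::vector<double>& data, Range r) {
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(r.begin);
    const auto last = data.begin() + static_cast<std::ptrdiff_t>(r.end);
    return std::accumulate(first, last, 0.0);
}

// Run every task and combine the partial results. The only point where tasks
// "communicate" is this final accumulation of one number per task.
double task_based_sum(const std::vector<double>& data, std::size_t workers) {
    double total = 0.0;
    for (const Range& r : block_partition(data.size(), workers)) {
        total += partial_sum(data, r);
    }
    return total;
}
```

Splitting the work this way shows how little the tasks need to exchange (a single `double` each), which is the kind of thing step 2 asks you to identify before choosing a communication library.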

姜生凉生 2024-09-03 07:57:54

Well, you haven't stated exactly what hardware you are targeting. If it's a shared-memory machine, then OpenMP is an option. Most parallel programmers would regard parallelisation with OpenMP as easier than using MPI in any of its incarnations. I'd also suggest that it is easier to retrofit OpenMP to an existing code than MPI. The best MPI programs, in the sense of best-performing, are those designed from the ground up to be parallelised with message passing.

In addition, the best sequential algorithm might not always be the most efficient algorithm, once it has been parallelised. Sometimes a simple, but sequentially-sub-optimal algorithm is a better choice.

You may have access to a shared-memory computer:

  • all multicore CPUs are effectively shared-memory computers;
  • on a lot of clusters the nodes have two or four CPUs each; if each CPU has 4 cores, then a four-CPU node gives you a 16-core shared-memory machine on your cluster;
  • if you have access to an MPP supercomputer you will probably find that each of its nodes is a shared-memory computer.
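On any of these shared-memory machines, retrofitting OpenMP onto existing code can be as small as one pragma on a loop. The sketch below is only an illustration; the function name `integrate_pi` and the choice of integral are assumptions made for the example. It estimates pi with the midpoint rule and uses the standard `parallel for` directive with a `reduction` clause. With GCC or Clang, OpenMP is enabled with `-fopenmp`; without that flag the pragma is ignored and the loop runs sequentially, producing the same estimate up to floating-point rounding.

```cpp
#include <cassert>

// Midpoint-rule estimate of the integral of 4 / (1 + x^2) over [0, 1],
// which equals pi. `steps` is the number of sub-intervals.
double integrate_pi(long steps) {
    assert(steps > 0);
    const double width = 1.0 / static_cast<double>(steps);
    double sum = 0.0;

    // The only change to the sequential loop is this directive. The
    // reduction clause gives each thread a private copy of `sum` and adds
    // the copies together when the loop ends.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < steps; ++i) {
        const double x = (static_cast<double>(i) + 0.5) * width;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * width;
}
```

Because the loop body has no dependency between iterations other than the accumulated `sum`, no restructuring is needed, which is the sense in which OpenMP is easier to retrofit than MPI.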

If you are stuck with message passing, then I'd strongly advise you to stick with C++ and OpenMPI (or whatever MPI is already installed on your system), and you should definitely look at Boost.MPI too. I advise this strongly because, once you step outside the mainstream of high-performance scientific computing, you may find yourself in an army of one, programming with an idiosyncratic collection of just-fit-for-research libraries and other tools. C++, OpenMPI and Boost are sufficiently well used that you can regard them as being 'weapons-grade', or whatever your preferred analogy might be. There's little enough traffic on SO, for example, on MPI and OpenMP; check out the stats on the other technologies before you bet the farm on them.

If you have no experience with MPI, then you might want to look at a book called Parallel Scientific Computing in C++ and MPI by Karniadakis and Kirby. Using MPI by Gropp et al. is OK as a reference, but it's not a beginner's text on programming for message passing.

所有深爱都是秘密 2024-09-03 07:57:54

If message passing is holding you back, try distributed objects. There are a lot of distributed object frameworks available: CORBA, DCOM and ICE, to name a few. If you choose to distribute your objects, they will have global visibility through the interfaces (both data and methods) that you define. Any object on any node can access these distributed objects.

I have been searching for software that allows distributing memory, but haven't come across any. I guess it's because all these distributed object frameworks are available, so people don't need to distribute memory as such.

吃不饱 2024-09-03 07:57:54

I had a good experience using Top-C in graduate school.

From the home page: "TOP-C especially distinguishes itself as a package to easily parallelize existing sequential applications."

http://www.ccs.neu.edu/home/gene/topc.html

Edit: I should add that it's much simpler to parallelize a program if it uses "trivial parallelism", e.g. when nodes don't need to share memory. MapReduce is built on this concept. If you can minimize the amount of shared state your nodes use, you'll see orders-of-magnitude better improvements from parallel processing.
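As a rough illustration of that idea (not TOP-C's API, and not a real MapReduce framework), here is a word count written in the map/reduce shape using only the standard library. The names `Counts`, `count_words`, `merge_counts` and `map_reduce` are invented for this sketch. Each map call looks only at its own chunk, so the chunks could in principle be handed to different nodes; in this sketch they simply run in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Counts = std::map<std::string, std::size_t>;

// Map step: count the words in one chunk. It touches nothing outside its
// argument, so no state is shared between chunks.
Counts count_words(const std::vector<std::string>& chunk) {
    Counts counts;
    for (const std::string& word : chunk) {
        ++counts[word];
    }
    return counts;
}

// Reduce step: fold one partial result into the running total.
void merge_counts(Counts& total, const Counts& partial) {
    for (const auto& entry : partial) {
        total[entry.first] += entry.second;
    }
}

// Run the map step on every chunk, then reduce the partial results.
Counts map_reduce(const std::vector<std::vector<std::string>>& chunks) {
    std::vector<Counts> partials;
    partials.reserve(chunks.size());
    for (const auto& chunk : chunks) {  // iterations are independent
        partials.push_back(count_words(chunk));
    }
    Counts total;
    for (const Counts& partial : partials) {
        merge_counts(total, partial);
    }
    return total;
}
```

The only data that has to move between workers is the small `Counts` map each chunk produces, which is why this style of problem scales so readily.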
