How to send and receive unknown data types with MPI

Posted on 2024-12-10 12:29:16

We have developed an algorithm library in C++ that lets users implement their own datatypes for sharing data between individual algorithms (which are also implemented by the user).
This works fine, but we want to provide parallelization at the library level. The individual algorithms should be executed in parallel on different nodes of distributed-memory machines.

We decided to use MPI for parallelization, as it can be used on both distributed- and shared-memory machines without code changes.
Unfortunately, we are now struggling with how to distribute the user-implemented datatypes between the nodes. We have the following problems:

  • We do not know how big the data will be; it might even change from run to run.
  • We do not know what data is inside the data structure.
  • The amount of data can be very large, up to 1 GB (this should be no problem for MPI).
  • The user should not see any difference when implementing the datatypes or algorithms for parallel execution (for the algorithms this is actually not a problem).

Is it possible to use MPI to share this data between the nodes, or are there other approaches that might be better suited to this kind of problem?
We would like a solution that works at least on shared-memory machines; ideally, it would work without code changes on both shared- and distributed-memory machines.

Comments (1)

GRAY°灰色天空 2024-12-17 12:29:16

Yes, you can do this with MPI, but no, MPI can't do it for you by itself.

Whether you're sending this data to another node or writing it to disk, at some point you need to explicitly describe the data structure's layout in memory so that it can be serialized. If you pass MPI (or any other communications library) a pointer, it doesn't know what lies on the other side of that pointer, so it has no way of traversing the data structure to copy its contents.
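To make the point about pointers concrete, here is a minimal sketch of manual marshaling in plain C++, with no MPI involved yet. The type `UserData` and the helpers `put_bytes`, `get_bytes`, `serialize`, and `deserialize` are hypothetical names invented for illustration; they are not part of MPI or of the library in the question. The sketch also assumes both ends use the same byte order and the same `double` representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical user-defined type. Both members own heap memory, so copying
// sizeof(UserData) raw bytes would copy internal pointers, not the contents.
struct UserData {
    std::string label;
    std::vector<double> values;
};

// Append n raw bytes to the end of the buffer.
inline void put_bytes(std::vector<char>& buf, const void* src, std::size_t n) {
    const char* p = static_cast<const char*>(src);
    buf.insert(buf.end(), p, p + n);
}

// Copy n bytes out of the buffer starting at pos, then advance pos.
inline void get_bytes(const std::vector<char>& buf, std::size_t& pos,
                      void* dst, std::size_t n) {
    assert(pos + n <= buf.size());  // refuse to read past the end
    if (n == 0) return;
    std::memcpy(dst, buf.data() + pos, n);
    pos += n;
}

// Flatten the object into one contiguous, length-prefixed byte buffer.
std::vector<char> serialize(const UserData& d) {
    std::vector<char> buf;
    const std::uint64_t label_len = d.label.size();
    const std::uint64_t count = d.values.size();
    put_bytes(buf, &label_len, sizeof label_len);
    put_bytes(buf, d.label.data(), d.label.size());
    put_bytes(buf, &count, sizeof count);
    put_bytes(buf, d.values.data(), d.values.size() * sizeof(double));
    return buf;
}

// Rebuild the object; the new members allocate their own memory, which is
// how the "pointers" end up valid on the receiving side.
UserData deserialize(const std::vector<char>& buf) {
    UserData d;
    std::size_t pos = 0;
    std::uint64_t label_len = 0;
    std::uint64_t count = 0;
    get_bytes(buf, pos, &label_len, sizeof label_len);
    d.label.resize(static_cast<std::size_t>(label_len));
    get_bytes(buf, pos, &d.label[0], d.label.size());
    get_bytes(buf, pos, &count, sizeof count);
    d.values.resize(static_cast<std::size_t>(count));
    get_bytes(buf, pos, d.values.data(), d.values.size() * sizeof(double));
    return d;
}
```

The resulting `std::vector<char>` is ordinary contiguous memory, so any transport (MPI, a socket, a file) can copy it without knowing what the bytes mean. In a library like the one described, each user type would have to supply such a pair of functions itself, or rely on a serialization framework to generate them.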

You can marshal the data into plain old data (manually, or with routines like MPI_Pack), or you can create an MPI datatype that describes the in-memory layout of that particular instance, and MPI will then copy the data over. In addition, you'll need to redirect any pointers within the data structure so that they are valid on the receiving side. Boost.Serialization may be able to help you with all of this.
