Best way to read and write large files with collective MPI-IO



I would like to read and write large data sets in Fortran using MPI-IO. My preferred approach would be to use an MPI type defined with MPI_Type_create_subarray with a single dimension to describe each process's view of the file. My Fortran code thus looks like this:

  ! A contiguous type to describe the vector per element.
  ! MPI_TYPE_CONTIGUOUS(COUNT, OLDTYPE, NEWTYPE, IERROR)
  call MPI_Type_contiguous(nComponents, rk_mpi, &
    &                      me%vectype, iError)
  call MPI_Type_commit( me%vectype, iError )

  ! A subarray to describe the view of this process on the file.
  ! MPI_TYPE_CREATE_SUBARRAY(ndims, array_of_sizes, array_of_subsizes,
  !                          array_of_starts, order, oldtype, newtype, ierror)
  call MPI_Type_create_subarray( 1, [ globElems ], [ locElems ], &
    &                           [ elemOff ], MPI_ORDER_FORTRAN, &
    &                           me%vectype, me%ftype, iError)

However, array_of_sizes and array_of_starts, which describe global quantities, are just "normal" default-kind integers in the MPI interface, so this approach is limited to roughly 2 billion elements.
Is there another interface that uses MPI_OFFSET_KIND for these global values?
The only workaround I see so far is to use the displacement argument of MPI_File_set_view instead of defining the view with a subarray MPI type. However, this "feels" wrong. Would you expect a performance impact from either approach for collective IO? Does anybody know whether this interface will change in MPI-3?
Maybe I should use some other MPI type?

What is the recommended way to write large data files to disk efficiently in parallel with collective IO?
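
For reference, a minimal sketch of the displacement-based workaround mentioned above (not part of the original question); fh and buffer are assumed placeholder names for the file handle and the local data, and the byte offset is computed in MPI_OFFSET_KIND, so it avoids the 32-bit limit:

  ! Set the per-process view purely through the displacement argument of
  ! MPI_File_set_view, which is an INTEGER(KIND=MPI_OFFSET_KIND).
  integer(kind=MPI_OFFSET_KIND) :: disp
  integer :: typesize

  ! Byte offset of this process's first element in the file.
  call MPI_Type_size(me%vectype, typesize, iError)
  disp = int(elemOff, MPI_OFFSET_KIND) * int(typesize, MPI_OFFSET_KIND)

  ! Both etype and filetype are the per-element vector type; no subarray needed.
  call MPI_File_set_view(fh, disp, me%vectype, me%vectype, 'native', &
    &                    MPI_INFO_NULL, iError)

  ! Collective write of the locElems local elements.
  call MPI_File_write_all(fh, buffer, locElems, me%vectype, &
    &                     MPI_STATUS_IGNORE, iError)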


Comments (1)

西瓜 2025-01-13 13:07:12


Help is coming.

In MPI-3, there will be datatype manipulation routines that use MPI_Count instead of an int. For backwards compatibility (groan) the existing routines won't change, but you should be able to make your type.

But for now...
For subarray in particular, though, this isn't usually thought of as a huge issue at the moment - even for a 2d array, indices of 2 billion give you an array size of 4×10^18, which is admittedly pretty large (but exactly the sort of numbers targeted for exascale-type computing). In higher dimensions, it's even larger.

In 1d, though, a list of numbers 2 billion long is only ~8 GB, which isn't big data by any stretch, and I think that's the situation you find yourself in. My suggestion would be to leave it in the form you have now for as long as you can. Is there a common factor in the local elements? You could work around this by bundling up the types in units of (say) 10 vectypes if that works - for your code it shouldn't matter, but it would reduce the numbers in locElems and globElems by that same factor. Otherwise, yes, you could always use the displacement field in the file set view.
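
A minimal sketch of the bundling idea, assuming globElems, locElems and elemOff are all divisible by the chosen bundle factor (chunktype is a hypothetical intermediate type):

  ! Bundle 10 element vectors into one contiguous chunk so the counts passed
  ! to MPI_Type_create_subarray shrink by the same factor.
  integer, parameter :: bundle = 10
  integer :: chunktype

  call MPI_Type_contiguous(bundle, me%vectype, chunktype, iError)
  call MPI_Type_commit(chunktype, iError)

  ! Same view as before, but described in units of chunks instead of elements.
  call MPI_Type_create_subarray( 1, [ globElems/bundle ], [ locElems/bundle ], &
    &                           [ elemOff/bundle ], MPI_ORDER_FORTRAN, &
    &                           chunktype, me%ftype, iError)
  call MPI_Type_commit( me%ftype, iError )

The file view described by me%ftype is unchanged; only the integer counts handed to MPI_Type_create_subarray are divided by the bundle factor.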
