MPI large data all-to-all transfer
My MPI application has several processes that generate large amounts of data. Say we have N+1 processes (one for master control, the others are workers). Each worker process generates a large amount of data, which is currently just written to ordinary files named file1, file2, ..., fileN. The sizes of these files may differ considerably. Now I need to send every fileM to the process with rank M for the next job, so it is essentially an all-to-all data transfer.
My question is: how should I use the MPI API to send these files efficiently? I used to transfer them through a Windows shared folder, but I don't think that is a good idea.
I have thought about the MPI_File routines and MPI_Alltoall, but these functions do not seem well suited to my case. Plain MPI_Send and MPI_Recv seem hard to use because every process needs to transfer a large amount of data, and I don't want to use a distributed file system for now.
Comments (1)
It's not possible to answer your question precisely without a lot more data, data that only you have right now. So here are some generalities; you'll have to think about them and see whether and how they apply to your situation.