MPI programming: collecting large data from multiple workers

Posted on 2024-10-09 20:15:28

I have an application composed of a single master and many workers. The requirement is simple: workers finish some jobs and send their data to the master, and the master stores each worker's data in a separate file. I can simply use MPI_Send on the worker side to send data to the master, but the master does not know the order in which the data will arrive. Some workers are fast and some are slow. For example, with 5 workers the sending order may be 1, 3, 4, 5, 2 or 2, 5, 4, 1, 3. If I just write a loop like for(i=1 to 5) on the master side with MPI_Recv, the master and some of the faster workers have to wait a long time. I know MPI_Gather can implement this, but I am not sure whether MPI_Gather works in parallel or is just a series of sequential MPI_Recv calls. Another issue is that my data is extremely large: more than 1GB needs to be sent to the master. If I divide the data into chunks, it may become more complex. I do not think MPI_Gather can work. I also thought about raw socket programming, but I do not think that is good practice. Could you give me some suggestions?

Comments (2)

苏辞 2024-10-16 20:15:28

If I understand your question correctly, you want to receive the data back at the master, but since each task takes a different amount of time to finish, you don't want to loop over all the processors in order. That way the receive for process 5 (if it has finished) isn't stuck waiting behind the receive from process 3 (which is still running).

If you want to receive out of order, you can call MPI_Recv with the MPI_ANY_SOURCE constant as the rank of the sending processor. You can then inspect the returned status to determine which processor sent the message, so that you can send it more work. Rather than looping over all processors, just have a single receive statement in your work loop.

澉约 2024-10-16 20:15:28

Could the workers write out the files instead of sending the data back to the master? When a worker finishes, it could send an "I'm done" message to the master. The master, in turn, could send the next chunk of work to that worker. When there is no work left to hand out, have the master send a "no more work" message to the worker, which could then call MPI_Finalize.
