MPI distribution layer
I used MPI to write a distribution layer. Let's say we have n data sources and k data consumers. In my approach, each of the n MPI processes reads data and then distributes it to one (or many) of the k data consumers (other MPI processes) according to some given logic.
It seems very generic, so my question is: has something like this already been done?
It seems simple, but it can get very complicated. Say the distribution checks which data consumers are ready to work (dynamic work distribution), or distributes data according to a given algorithm based on the data itself. There are plenty of possibilities, and I, like everyone else, do not want to reinvent the wheel.
Comments (1)
As far as I know, there is no generic implementation for it, other than the MPI API itself. You should use the correct functions according to the problem's constraints.
If what you're trying to build is a simple n-producers-and-k-consumers synchronized job/data queue, then of course there are already many implementations out there (just google it and you should get a few).
However, the way you present it seems very general - sometimes you want the data to be sent to only one consumer, sometimes to all of them, etc. In that case, you should figure out what you want and when, and use either point-to-point communication functions or collective communication functions accordingly (and of course everyone has to know what to expect - you can't have a consumer waiting for data from a single source while the producer wishes to broadcast the data).
All that aside, here is one implementation that comes to mind that seems to answer all of your requirements:
Make a synchronized queue, with producers pushing data in at one end and consumers taking it from the other (decide on the queue's behavior as you need - is the queue size limited, does adding an element to a full queue block or fail, does removing an element from an empty queue block or fail, etc.).
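To make those choices concrete, here is a minimal sketch using Python's standard `queue.Queue`. It runs in a single process with threads, so it only illustrates the queue behavior, not MPI itself; the names `jobs`, `try_push`, `try_pop` and the size limit of 2 are assumptions made up for the example.

```python
import queue
import threading

# Bounded, thread-safe FIFO: at most 2 items may wait at once (arbitrary limit).
jobs = queue.Queue(maxsize=2)

def try_push(item):
    """Non-blocking push: report failure instead of waiting when the queue is full."""
    try:
        jobs.put(item, block=False)
        return True
    except queue.Full:
        return False

def try_pop():
    """Non-blocking pop: return None instead of waiting when the queue is empty."""
    try:
        return jobs.get(block=False)
    except queue.Empty:
        return None

results = []

def consumer():
    # Blocking pop: waits until a producer delivers something.
    while True:
        item = jobs.get()
        if item is None:  # sentinel chosen for this sketch: no more work
            break
        results.append(item * 2)

worker = threading.Thread(target=consumer)
worker.start()
for value in range(5):
    jobs.put(value)  # blocking push: waits while the queue is full
jobs.put(None)
worker.join()
```

The blocking calls give you back-pressure for free (a fast producer is slowed down by a full queue), while the non-blocking variants let the caller do something else when the queue is full or empty.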
Assuming the data contains some flag that tells the consumers if this data is for everyone or just for one of them, the consumers peek and either remove the element, or leave it there and just note that they already did it (either by keeping its id locally, or by changing a flag in the data itself).
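A rough sketch of that peek-and-mark idea, again as plain in-process Python rather than MPI. The class `FlaggedQueue`, the `for_all` flag and the `seen_by` set stored next to each item are hypothetical names for this illustration; here the "already did it" note is kept in the data itself.

```python
import threading
from collections import deque

class FlaggedQueue:
    """FIFO whose head is either taken by one consumer or seen once by every consumer."""

    def __init__(self, n_consumers):
        self._items = deque()            # entries: (payload, for_all, seen_by)
        self._lock = threading.Lock()
        self._n_consumers = n_consumers

    def push(self, payload, for_all=False):
        with self._lock:
            self._items.append((payload, for_all, set()))

    def take(self, consumer_id):
        """Return the payload this consumer should process next, or None."""
        with self._lock:
            if not self._items:
                return None
            payload, for_all, seen_by = self._items[0]  # peek at the head
            if not for_all:
                self._items.popleft()    # single-consumer item: remove it
                return payload
            if consumer_id in seen_by:
                return None              # already handled; wait for the others
            seen_by.add(consumer_id)     # note that this consumer has done it
            if len(seen_by) == self._n_consumers:
                self._items.popleft()    # everyone has seen it: remove it
            return payload

q = FlaggedQueue(n_consumers=2)
q.push("config", for_all=True)
q.push("job-1")

first = q.take(0)    # consumer 0 processes the item meant for everyone
blocked = q.take(0)  # consumer 0 must wait: consumer 1 has not seen it yet
other = q.take(1)    # consumer 1 processes it, which removes it from the queue
job = q.take(0)      # now the single-consumer item is reachable
```

Note that in this sketch an item meant for everyone holds up everything behind it until the slowest consumer has processed it.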
If you don't want a single piece of collective data to block the queue until everyone has dealt with it, you can use 2 queues, one for each type of data, and have the consumers take data from one of the queues at a time (by choosing a different queue each time, randomly choosing a queue, prioritizing one of the queues, or following some agreed order that can be deduced from the data, e.g. lowest id first).
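The queue-selection policies could look roughly like this in plain Python (threads and `queue.Queue` standing in for the real transport). `take_from`, `AlternatingPicker` and the queue names are invented for the sketch, and how the collective queue tracks which consumers have seen an item is left out to keep it short.

```python
import itertools
import queue

def take_from(queues):
    """Return (name, item) from the first non-empty queue in `queues`, or None.

    Called with a fixed order, this is the "prioritize one queue" policy.
    """
    for name, q in queues:
        try:
            return name, q.get(block=False)
        except queue.Empty:
            continue
    return None

class AlternatingPicker:
    """Starts each attempt at a different queue, so neither type starves the other."""

    def __init__(self, queues):
        self._queues = list(queues)
        self._start = itertools.cycle(range(len(self._queues)))

    def take(self):
        start = next(self._start)
        order = self._queues[start:] + self._queues[:start]
        return take_from(order)

single_q, shared_q = queue.Queue(), queue.Queue()
for job in ("job-1", "job-2"):
    single_q.put(job)
shared_q.put("config")

picker = AlternatingPicker([("single", single_q), ("shared", shared_q)])
picked = [picker.take() for _ in range(4)]
# picked == [("single", "job-1"), ("shared", "config"), ("single", "job-2"), None]
```

A random policy would shuffle the order before calling `take_from`; which one fits best depends on how urgent the collective data is compared with the regular jobs.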
Sorry for the long answer, and I hope this helps :)