分配算法
我需要将 N 个实体(每个实体都有可能的父级和可能的子级)分配给 M 个计算节点,同时满足以下优化条件:
- 实体的子级希望分配给同一计算节点(以最大化兄弟节点之间的数据局部性)
- 的分布实体应该尽可能均匀(即单个节点不会负担过重)。
我正在寻找一些关于启发式方法的建议来解决这个问题。
我已阅读http://en.wikipedia.org/wiki/Assignment%5Fproblem。
谢谢。
I need to assign N entities (each with possible parents and possible children) to M computation nodes while satisfying the following optimization conditions:
- Children of an entity want to be assigned to the same computation node (to maximize data locality among siblings)
- The distribution of entities should be as even as possible (i.e. no overtaxing of a single node).
I'm looking for some suggestions on heuristic methods to solve this problem.
I've read http://en.wikipedia.org/wiki/Assignment%5Fproblem.
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我不确定 1 是否是硬性要求。 如果是这样,作为第一步,您应该将实体分组为连接的组件。 如果不是,您应该指定 1 和 2 之间的权衡是什么,例如作为成本函数。
如果将每个节点限制为 N/M,则将组件放置在计算节点上就是装箱问题实体。 一个很好的近似是以下算法:
I'm not sure whether 1 is a hard requirement. If so, as a first step, you should group your entities into connected components. If not, you should specify what the tradeoff between 1 and 2 is, e.g. as a cost function.
Placing the components on computation nodes is the binpacking problem, if you limit each node to N/M entities. A good approximation is the following algorithm:
好吧,显然您必须预测每个进程平均有多少个子进程(或简单的总负载)。 与您可以使用经典的分配算法相比,它们通常非常简单。
最重要的问题当然是决定要最小化什么。 通常我们希望最大程度地减少延迟(我们得到多少“落后于计划”),但并非总是如此......
编辑:如果您提前知道所有子级/父级,并且进程的子级必须位于同一台机器中,您可以考虑一个进程及其所有子进程一开始都是同一个进程。 然后你可以使用一个非常简单的算法来最小化你想要最小化的任何东西。
编辑#2:阅读作业调度以获得更好的想法
Well, obviously you have to predict how many children (or simply total load) each process will have on average. Than you can use classic assignment algorithms, they're usually quite simple.
The most important issue is of course to decide what you want to minimize. Usually we want to minimize lateness (how much "behind schedule" we get), but not always...
Edit: If you know all children/parents in advance, and children of a process must be in the same machine, you can consider a process and all its children to be the same process to begin with. Then you can use a very simple algorithm to minimize whatever you want to minimize.
Edit #2: Read about Job Scheduling to get a better idea