Even work distribution algorithm
A quick question about work balancing.
A program processes files in parallel. Let's say the size of a file is an approximate measure of how long it will take to process. All files are known in advance.
We have N nodes that can work on the files. How do we distribute the files so that each node ends up with a workload as close to the average as possible?
The idea is pretty simple and I have a couple of approaches in mind, but this really looks like a classical problem with a best solution already in existence.
I just don't know what it's called.
Does anyone know?
Thanks!
EDIT:
OK, sorry, I omitted a lot of information. I am working on an MPI implementation: a standard master-slave system. One master node examines the target directory, picks up the files that need to be processed, and then assigns them to slave MPI tasks so they can do their part in parallel.
The number of slave nodes is less than 32.
The number of target files is less than 10000.
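For reference, a minimal sketch of the static master-slave dispatch described above, assuming the master has already computed an assignment of file indices to workers (for example with the LPT rule suggested in the answers below). This is not the asker's code; process_file() and the message tags are placeholders.

```cpp
// Rank 0 sends each worker the list of file indices it should process;
// workers receive their list and process it locally.
#include <mpi.h>
#include <cstdio>
#include <vector>

void process_file(int file_index) {
    // Placeholder for the real per-file work.
    std::printf("processing file %d\n", file_index);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // assignment[w] = file indices for worker rank w,
        // filled in by the directory scan + balancing step (omitted here).
        std::vector<std::vector<int>> assignment(size);
        for (int w = 1; w < size; ++w) {
            int count = static_cast<int>(assignment[w].size());
            MPI_Send(&count, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
            if (count > 0)
                MPI_Send(assignment[w].data(), count, MPI_INT, w, 1, MPI_COMM_WORLD);
        }
    } else {
        int count = 0;
        MPI_Recv(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::vector<int> files(count);
        if (count > 0)
            MPI_Recv(files.data(), count, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int f : files) process_file(f);
    }

    MPI_Finalize();
    return 0;
}
```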
You are asking about the classic multiprocessor scheduling problem. The Wikipedia article is a good start for a basic overview of the algorithms (http://en.wikipedia.org/wiki/Multiprocessor_scheduling).
Here's a thought. Sort the (filename, size) pairs in descending order. Then, starting with the biggest file, assign each file to the node that currently has the lowest cumulative weight of files (break ties however you like). The "one for me, one for you" approach.
It takes O(M log M) to sort M file records and O(M*N) to distribute them (somebody double-check this), but I think the algorithm gives good (optimal?) results.
Edit: after checking out the link provided by the other poster, it turns out this is the LPT approach, which runs in polynomial time but isn't optimal in terms of getting each node as close to the average as possible.
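A minimal sketch of the greedy rule described above (LPT), assuming file sizes are known up front; all names are illustrative. Keeping the per-node loads in a min-heap makes the distribution step O(M log N) instead of the naive O(M*N), and LPT is known to come within a factor of 4/3 of the optimal makespan.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct FileInfo {
    std::string name;
    std::uint64_t size;  // file size as a proxy for processing time
};

// Returns assignment[node] = indices into `files` assigned to that node.
std::vector<std::vector<int>> lpt_assign(const std::vector<FileInfo>& files, int num_nodes) {
    // Sort file indices by size, largest first.
    std::vector<int> order(files.size());
    for (int i = 0; i < static_cast<int>(files.size()); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return files[a].size > files[b].size; });

    // Min-heap of (cumulative load, node id): top() is the least-loaded node.
    using Load = std::pair<std::uint64_t, int>;
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
    for (int n = 0; n < num_nodes; ++n) heap.push({0, n});

    std::vector<std::vector<int>> assignment(num_nodes);
    for (int idx : order) {
        auto [load, node] = heap.top();  // least-loaded node so far
        heap.pop();
        assignment[node].push_back(idx);
        heap.push({load + files[idx].size, node});
    }
    return assignment;
}
```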
I've been experimenting with parallelising reduction functions using recursive divide & conquer algorithms and have settled on having the number of jobs submitted to nodes satisfy the inequality
l >= n / P^2
where l is the number of jobs, n the size of the original workload, and P the number of 'nodes', workers, processors, whatever you want to call them.
In most situations on computers and mobile devices, n will be many orders of magnitude greater than P. You want to make sure that the number of individual jobs isn't so large that you spend all your time dividing them up and sending them to workers.
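To make the rule of thumb concrete, here is a tiny calculation using the numbers from the question (how to apply the bound here is my reading, not the poster's code): with P = 32 workers and n = 10000 files, the bound works out to roughly 10 jobs, i.e. chunks of about 1000 files each; in practice you would still want at least one job per worker.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    double n = 10000.0;  // size of the original workload (number of files)
    double P = 32.0;     // number of workers
    double min_jobs = n / (P * P);                      // l >= n / P^2
    double chunk = std::ceil(n / std::ceil(min_jobs));  // files per job if l is taken at the bound
    std::printf("l >= %.1f jobs, chunks of at most %.0f files\n", min_jobs, chunk);
    return 0;
}
```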
If all work-unit lengths are known (or at least estimable) in advance, this basically becomes the bin packing problem (https://en.wikipedia.org/wiki/Bin_packing_problem). It is solvable heuristically with "first fit" and "best fit" algorithms (see the link).
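A sketch of the first-fit-decreasing heuristic this answer points at. Bin packing needs a fixed bin capacity, so choosing one (here passed in by the caller, e.g. the average load per node plus some slack) is an assumption on top of the answer rather than part of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns bins[i] = indices of the items placed into bin i.
std::vector<std::vector<int>> first_fit_decreasing(const std::vector<std::uint64_t>& sizes,
                                                   std::uint64_t capacity) {
    // Consider items from largest to smallest.
    std::vector<int> order(sizes.size());
    for (int i = 0; i < static_cast<int>(sizes.size()); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return sizes[a] > sizes[b]; });

    std::vector<std::vector<int>> bins;  // item indices per bin
    std::vector<std::uint64_t> used;     // current load of each bin
    for (int idx : order) {
        bool placed = false;
        for (std::size_t b = 0; b < bins.size(); ++b) {
            if (used[b] + sizes[idx] <= capacity) {  // first bin it fits into
                bins[b].push_back(idx);
                used[b] += sizes[idx];
                placed = true;
                break;
            }
        }
        if (!placed) {                               // no bin fits: open a new one
            bins.push_back({idx});
            used.push_back(sizes[idx]);
        }
    }
    return bins;
}
```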