Weighted load-balancing resource scheduling algorithm
A software application that I'm working on needs to be able to assign tasks to a group of users based on how many tasks they presently have, where the users with the fewest tasks are the most likely to get the next task. However, the current task load should be treated as a weighting, rather than an absolute order definition. In other words, I need to implement a weighted load-balancing algorithm.
Let's say there are five users, with the following number of tasks:
A: 4
B: 5
C: 0
D: 7
E: 9
I want to prioritize the users for the next task in the order CABDE, where C is most likely to get the assignment and E, the least likely. There are two important things to note here:
- The number of users can vary from 2 to dozens.
- The number of tasks assigned to each user can vary from 0 to hundreds.
For now, we can treat all tasks as equal, though I wouldn't mind including task difficulty as a variable that I can use in the future - but this is purely icing on the cake.
The ideas I've come up with so far aren't very good in some situations. They might weight users too closely together if there are a large number of users, or they might fall flat if a user has no current tasks, or....
I've tried poking around the web, but haven't had much luck. Can anyone give me a quick summary of an algorithm that would work well? I don't need an actual implementation--I'll do that part--just a good description. Alternatively, is there a good web site that's freely accessible?
Also, while I certainly appreciate quality, this need not be statistically perfect. So if you can think of a good but not great technique, I'm interested!
Comments (1)
As you point out, this is a load-balancing problem. It's not really a scheduling problem, since you're not trying to minimise anything (total time, number of concurrent workers, etc.). There are no special constraints (job duration, time clashes, skill sets to match, etc.), so really your problem boils down to selecting an appropriate weighting function.
You say there are some situations you want to avoid, like user weightings that are too close together. Can you provide more details? For example, what's wrong with making the chance of assignment depend inversely on the current workload, normalised by the workload of the other workers? You can visualise this as a sequence of blocks of different lengths (the tasks), being packed into a set of bins (the workers), where you're trying to keep the total height of the bins as even as possible.
With more information, we could make specific recommendations of functions that could work for you.
Edit: example load-balancing functions
Based on your comments, here are some examples of simple functions that can give you different balancing behaviour. A basic question is whether you want deterministic or probabilistic behaviour. I'll give a couple of examples of each.
To use the example in the question - there are 4 + 5 + 0 + 7 + 9 = 25 jobs currently assigned. You want to pick who gets job 26.
1) Simple task farm. For each job, always pick the worker with the fewest jobs currently pending. Fast workers get more to do, but everyone finishes at about the same time.
2) Guarantee fair workload. If workers work at different speeds, and you don't want some doing more than others, then track the number of completed + pending jobs for each worker. Assign the next job to keep this number evenly spread (fast workers get free breaks).
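The two deterministic strategies above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names `pick_least_pending` and `pick_fair_total` and the `completed` counts are made up for the example, and ties are simply broken in favour of whichever worker comes first.

```python
def pick_least_pending(pending):
    """Strategy 1 (task farm): worker with the fewest pending jobs."""
    # min() returns the first minimal key, so ties go to the earliest worker.
    return min(pending, key=pending.get)


def pick_fair_total(pending, completed):
    """Strategy 2 (fair workload): worker with the fewest completed + pending jobs."""
    return min(pending, key=lambda w: pending[w] + completed.get(w, 0))


# Pending counts from the question.
pending = {"A": 4, "B": 5, "C": 0, "D": 7, "E": 9}
# Hypothetical completed-job counts, invented for this illustration only.
completed = {"A": 2, "B": 0, "C": 10, "D": 1, "E": 0}

print(pick_least_pending(pending))          # C
print(pick_fair_total(pending, completed))  # B
```

With these made-up numbers, C is idle right now but has already done the most work overall, so the fair-workload rule passes over C and gives the job to B.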
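3) Basic linear normalisation. Pick a maximum number of jobs each worker can have. Each worker's workload is normalised to that number. For example, if the maximum number of jobs/worker is 15, then the five workers have a total capacity of 75 jobs, so 50 more jobs can be added before you reach capacity. So for each worker, the probability of being assigned the next job is that worker's spare capacity divided by 50:
P(A) = (15 - 4)/50 = 0.22
P(B) = (15 - 5)/50 = 0.20
P(C) = (15 - 0)/50 = 0.30
P(D) = (15 - 7)/50 = 0.16
P(E) = (15 - 9)/50 = 0.12
If you don't want to use a specific maximum threshold, you could use the worker with the highest current number of pending jobs as the limit. In this case, that's worker E with 9 jobs, which leaves 5 * 9 - 25 = 20 free slots, so the probabilities would be
P(A) = (9 - 4)/20 = 0.25
P(B) = (9 - 5)/20 = 0.20
P(C) = (9 - 0)/20 = 0.45
P(D) = (9 - 7)/20 = 0.10
P(E) = (9 - 9)/20 = 0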
Note that in this case, the normalisation ensures worker E can't be assigned any jobs - he's already at the limit. Also, just because C doesn't have anything to do doesn't mean he is guaranteed to be given a new job (it's just more likely).
You can easily implement the choice function by generating a random number r between 0 and 1 and comparing it to the cumulative sums of these probabilities. So if r < 0.25, A gets the job; if 0.25 <= r < 0.45, B gets the job; and so on.
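As a rough sketch of the linear normalisation and the random choice just described, something like the following would work. The names `linear_probabilities` and `choose` are hypothetical, and the uniform fallback for the case where every worker is at the limit is an assumption of this sketch, not something the scheme above specifies.

```python
import random
from itertools import accumulate


def linear_probabilities(pending, limit=None):
    """Weight each worker by spare capacity (limit - pending), then normalise."""
    if limit is None:
        # No fixed threshold: use the busiest worker's load as the limit.
        limit = max(pending.values())
    spare = {w: max(limit - n, 0) for w, n in pending.items()}
    total = sum(spare.values())
    if total == 0:
        # Assumption: if everyone is at the limit, fall back to a uniform pick.
        return {w: 1 / len(pending) for w in pending}
    return {w: s / total for w, s in spare.items()}


def choose(probabilities, rng=random):
    """Pick a worker by comparing a random r in [0, 1) to cumulative boundaries."""
    # Skip zero-probability workers so they can never be selected.
    candidates = [w for w, p in probabilities.items() if p > 0]
    boundaries = accumulate(probabilities[w] for w in candidates)
    r = rng.random()
    for worker, upper in zip(candidates, boundaries):
        if r < upper:
            return worker
    return candidates[-1]  # guards against rounding in the final boundary


pending = {"A": 4, "B": 5, "C": 0, "D": 7, "E": 9}
probs = linear_probabilities(pending)
print(probs)  # {'A': 0.25, 'B': 0.2, 'C': 0.45, 'D': 0.1, 'E': 0.0}
print(choose(probs))  # a random worker; never E in this example
```

Using the busiest worker as the limit reproduces the 0.25 and 0.45 boundaries for A and B mentioned above. Passing `limit=15` instead gives every worker, including E, a non-zero chance.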
4) Non-linear normalisation. Using a log function (instead of the linear subtraction) to weight your numbers is an easy way to get a non-linear normalisation. You can use this to skew the probabilities, e.g. to make it much more likely that workers without many jobs are given more.
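The exact log formula is not specified above, so the following is just one possible interpretation: replace the linear weight `limit - n` with `log(limit + 1) - log(n + 1)`. The `+ 1` offset (so that a worker with zero jobs does not hit `log(0)`) and the name `log_probabilities` are assumptions of this sketch.

```python
import math


def log_probabilities(pending, limit=None):
    """Weight workers by log(limit + 1) - log(n + 1), then normalise."""
    if limit is None:
        limit = max(pending.values())
    # The +1 offset keeps log() defined for a worker with zero pending jobs.
    weights = {
        w: max(math.log(limit + 1) - math.log(n + 1), 0.0)
        for w, n in pending.items()
    }
    total = sum(weights.values())
    if total == 0:
        # Assumption: all workers equally loaded, so pick uniformly.
        return {w: 1 / len(pending) for w in pending}
    return {w: x / total for w, x in weights.items()}


pending = {"A": 4, "B": 5, "C": 0, "D": 7, "E": 9}
log_probs = log_probabilities(pending)
for worker, p in log_probs.items():
    print(worker, round(p, 3))
```

Compared with the linear version, this weighting pushes more of the probability towards C, the idle worker, while keeping the same CABDE ordering.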
The point is, the number of ways of doing this is practically unlimited. What weighting function you use depends on the specific behaviour you're trying to enable. Hopefully that's given you some ideas which you can use as a starting point.