如何使用 LINQ to Objects 安排作业而不重叠?
这是另一个资源分配问题。我的目标是运行一个查询,将任何时隙的最高优先级作业分配给两个 CPU 核心之一(只是一个示例,所以我们假设没有中断或多任务处理)。注意:这类似于我之前关于分区的文章,但重点关注重叠时间和分配多个项目,而不仅仅是最优先的项目。
这是我们的目标:
public class Job
{
public int Id;
public int Priority;
public DateTime Begin;
public DateTime End;
}
真实的数据集非常大,但对于这个例子,假设有 1000 个作业分配给两个 CPU 核心。它们都加载到内存中,我需要对它们运行单个 LINQ to Objects 查询。目前,这需要近 8 秒和 140 万次比较。
我利用了这篇文章中引用的逻辑确定两个项目是否重叠,但与该帖子不同的是,我不需要简单地找到重叠的项目,而是需要安排任何重叠集的顶部项目,然后安排下一个项目。
在讨论代码之前,让我指出当前无效算法的步骤:
- 确定剩余的作业(通过删除任何已分配的作业)
- 通过自连接所有剩余作业并按优先级选择顶部重叠作业,将作业分配给当前核心。
- 通过执行查询连接新作业
- 对每个剩余核心从停止 1 开始重复
问题:
- 如何才能最有效地做到这一点?
- 我可以避免步骤2中的子查询吗?也许通过分组,但我不确定如何根据 .Overlaps() 扩展方法进行分组。
- 这些工作已经被订购。第 2 步是否可以利用这一事实,只与短范围内的项目进行比较,而不是与每个作业进行比较?
- 有没有比循环更有效的分配给核心的方法?这样可以避免执行第3步中的查询吗?同样,如果我可以对重叠作业集进行分组,那么分配核心只需为每个重叠作业集选择 N 个即可。
完整示例代码:
public class Job
{
public static long Iterations;
public int Id;
public int Priority;
public DateTime Begin;
public DateTime End;
public bool Overlaps(Job other)
{
Iterations++;
return this.End > other.Begin && this.Begin < other.End;
}
}
public class Assignment
{
public Job Job;
public int Core;
}
class Program
{
static void Main(string[] args)
{
const int Jobs = 1000;
const int Cores = 2;
const int ConcurrentJobs = Cores + 1;
const int Priorities = Cores + 3;
DateTime startTime = new DateTime(2011, 3, 1, 0, 0, 0, 0);
Console.WriteLine(string.Format("{0} Jobs x {1} Cores", Jobs, Cores));
var timer = Stopwatch.StartNew();
Console.WriteLine("Populating data");
var jobs = new List<Job>();
for (int jobId = 0; jobId < Jobs; jobId++)
{
var jobStart = startTime.AddHours(jobId / ConcurrentJobs).AddMinutes(jobId % ConcurrentJobs);
jobs.Add(new Job() { Id = jobId, Priority = jobId % Priorities, Begin = jobStart, End = jobStart.AddHours(0.5) });
}
Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
timer.Restart();
Console.WriteLine("Assigning Jobs to Cores");
IEnumerable<Assignment> assignments = null;
for (int core = 0; core < Cores; core++)
{
// avoid modified closures by creating local variables
int localCore = core;
var localAssignments = assignments;
// Step 1: Determine the remaining jobs
var remainingJobs = localAssignments == null ?
jobs :
from j in jobs where !(from a in localAssignments select a.Job).Contains(j) select j;
// Step 2: Assign the top priority job in any time-slot to the core
var assignmentsForCore = from s1 in remainingJobs
where
(from s2 in remainingJobs
where s1.Overlaps(s2)
orderby s2.Priority
select s2).First().Equals(s1)
select new Assignment { Job = s1, Core = localCore };
// Step 3: Accumulate the results (unfortunately requires a .ToList() to avoid massive over-joins)
assignments = assignments == null ? assignmentsForCore.ToList() : assignments.Concat(assignmentsForCore.ToList());
}
// This is where I'd like to Execute the query one single time across all cores, but have to do intermediate steps to avoid massive-over-joins
assignments = assignments.ToList();
Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
Console.WriteLine("\nJobs:");
foreach (var job in jobs.Take(20))
{
Console.WriteLine(string.Format("{0}-{1} Id {2} P{3}", job.Begin, job.End, job.Id, job.Priority));
}
Console.WriteLine("\nAssignments:");
foreach (var assignment in assignments.OrderBy(a => a.Job.Begin).Take(10))
{
Console.WriteLine(string.Format("{0}-{1} Id {2} P{3} C{4}", assignment.Job.Begin, assignment.Job.End, assignment.Job.Id, assignment.Job.Priority, assignment.Core));
}
Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));
Console.WriteLine("Any key to continue");
Console.ReadKey();
}
}
示例输出:
1000 个作业 x 2 个核心
填充数据
0.00ms 内完成
将作业分配给核心
在 7,998.00 毫秒内完成
职位:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1
3/1/2011 12:02:00 AM-3/1/2011 12:32:00 AM Id 2 P2
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3
3/1/2011 1:01:00 AM-3/1/2011 1:31:00 AM Id 4 P4
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2
3/1/2011 2:02:00 AM-3/1/2011 2:32:00 AM Id 8 P3
3/1/2011 3:00:00 AM-3/1/2011 3:30:00 AM Id 9 P4
2011 年 3 月 1 日 3:01:00 AM-2011 年 3 月 1 日 3:31:00 AM Id 10 P0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3
3/1/2011 4:02:00 AM-3/1/2011 4:32:00 AM Id 14 P4
2011/3/1 5:00:00 AM-2011/3/1 5:30:00 AM Id 15 P0
3/1/2011 5:01:00 AM-3/1/2011 5:31:00 AM Id 16 P1
3/1/2011 5:02:00 AM-3/1/2011 5:32:00 AM Id 17 P2
3/1/2011 6:00:00 AM-3/1/2011 6:30:00 AM Id 18 P3
3/1/2011 6:01:00 AM-3/1/2011 6:31:00 AM Id 19 P4
作业:
2011 年 3 月 1 日 12:00:00 AM-2011 年 3 月 1 日 12:30:00 AM Id 0 P0 C0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1 C1
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3 C1
2011 年 3 月 1 日 1:02:00 AM-2011 年 3 月 1 日 1:32:00 AM Id 5 P0 C0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1 C0
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2 C1
2011 年 3 月 1 日 3:01:00 AM-2011 年 3 月 1 日 3:31:00 AM Id 10 P0 C0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1 C1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2 C0
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3 C1
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0 C0
总比较数:1,443,556.00
按任意键继续
This is another resource-allocation problem. My goal is to run a query to assign the top-priority job for any time-slot to one of two CPU cores (just an example, so let's assume no interrupts or multi-tasking). Note: this is similar to my earlier post about partitioning, but focuses on overlapping times and assigning multiple items, not just the top-priority item.
Here is our object:
public class Job
{
public int Id;
public int Priority;
public DateTime Begin;
public DateTime End;
}
The real dataset is very large, but for this example, let’s say there are 1000 jobs to be assigned to two CPU cores. They are all loaded into memory, and I need to run a single LINQ to Objects query against them. This is currently taking almost 8 seconds and 1.4 million comparisons.
I have leveraged the logic cited in this post to determine whether two items are overlapping, but unlike that post, I don't simply need to find overlapping items, but to schedule the top item of any overlapping set, and then schedule the next one.
Before I get to the code, let me point out the steps of the current inneficient algorithm:
- Determine the remaining jobs (by removing any already assigned)
- Assign jobs to the current core by self-joining all remaining jobs and selecting the top overlapping job by priority.
- Concatenate the new jobs by executing the query
- Repeat starting at Stop 1 for each remaining core
Questions:
- How can this be done most efficiently?
- Can I avoid the sub-query in step 2? Perhaps by grouping, but I'm not sure how I can group based on the .Overlaps() extension method.
- The jobs are already ordered. Can step 2 leverage that fact that and only compare against items within a short range instead of every single job?
- Is there a more efficient to assign to cores rather than looping? This could avoid executing the query in Step 3? Again, if I could group sets of overlaped jobs, then assigning cores is simply a matter of selecting N per overlaped set.
Full Sample Code:
public class Job
{
public static long Iterations;
public int Id;
public int Priority;
public DateTime Begin;
public DateTime End;
public bool Overlaps(Job other)
{
Iterations++;
return this.End > other.Begin && this.Begin < other.End;
}
}
public class Assignment
{
public Job Job;
public int Core;
}
class Program
{
static void Main(string[] args)
{
const int Jobs = 1000;
const int Cores = 2;
const int ConcurrentJobs = Cores + 1;
const int Priorities = Cores + 3;
DateTime startTime = new DateTime(2011, 3, 1, 0, 0, 0, 0);
Console.WriteLine(string.Format("{0} Jobs x {1} Cores", Jobs, Cores));
var timer = Stopwatch.StartNew();
Console.WriteLine("Populating data");
var jobs = new List<Job>();
for (int jobId = 0; jobId < Jobs; jobId++)
{
var jobStart = startTime.AddHours(jobId / ConcurrentJobs).AddMinutes(jobId % ConcurrentJobs);
jobs.Add(new Job() { Id = jobId, Priority = jobId % Priorities, Begin = jobStart, End = jobStart.AddHours(0.5) });
}
Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
timer.Restart();
Console.WriteLine("Assigning Jobs to Cores");
IEnumerable<Assignment> assignments = null;
for (int core = 0; core < Cores; core++)
{
// avoid modified closures by creating local variables
int localCore = core;
var localAssignments = assignments;
// Step 1: Determine the remaining jobs
var remainingJobs = localAssignments == null ?
jobs :
from j in jobs where !(from a in localAssignments select a.Job).Contains(j) select j;
// Step 2: Assign the top priority job in any time-slot to the core
var assignmentsForCore = from s1 in remainingJobs
where
(from s2 in remainingJobs
where s1.Overlaps(s2)
orderby s2.Priority
select s2).First().Equals(s1)
select new Assignment { Job = s1, Core = localCore };
// Step 3: Accumulate the results (unfortunately requires a .ToList() to avoid massive over-joins)
assignments = assignments == null ? assignmentsForCore.ToList() : assignments.Concat(assignmentsForCore.ToList());
}
// This is where I'd like to Execute the query one single time across all cores, but have to do intermediate steps to avoid massive-over-joins
assignments = assignments.ToList();
Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
Console.WriteLine("\nJobs:");
foreach (var job in jobs.Take(20))
{
Console.WriteLine(string.Format("{0}-{1} Id {2} P{3}", job.Begin, job.End, job.Id, job.Priority));
}
Console.WriteLine("\nAssignments:");
foreach (var assignment in assignments.OrderBy(a => a.Job.Begin).Take(10))
{
Console.WriteLine(string.Format("{0}-{1} Id {2} P{3} C{4}", assignment.Job.Begin, assignment.Job.End, assignment.Job.Id, assignment.Job.Priority, assignment.Core));
}
Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));
Console.WriteLine("Any key to continue");
Console.ReadKey();
}
}
Sample Output:
1000 Jobs x 2 Cores
Populating data
Completed in 0.00ms
Assigning Jobs to Cores
Completed in 7,998.00ms
Jobs:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1
3/1/2011 12:02:00 AM-3/1/2011 12:32:00 AM Id 2 P2
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3
3/1/2011 1:01:00 AM-3/1/2011 1:31:00 AM Id 4 P4
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2
3/1/2011 2:02:00 AM-3/1/2011 2:32:00 AM Id 8 P3
3/1/2011 3:00:00 AM-3/1/2011 3:30:00 AM Id 9 P4
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3
3/1/2011 4:02:00 AM-3/1/2011 4:32:00 AM Id 14 P4
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0
3/1/2011 5:01:00 AM-3/1/2011 5:31:00 AM Id 16 P1
3/1/2011 5:02:00 AM-3/1/2011 5:32:00 AM Id 17 P2
3/1/2011 6:00:00 AM-3/1/2011 6:30:00 AM Id 18 P3
3/1/2011 6:01:00 AM-3/1/2011 6:31:00 AM Id 19 P4
Assignments:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0 C0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1 C1
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3 C1
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0 C0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1 C0
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2 C1
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0 C0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1 C1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2 C0
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3 C1
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0 C0
Total Comparisons: 1,443,556.00
Any key to continue
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
是否有理由使用 linq to 对象集合来完成此任务?我认为我会创建一个活动列表,将所有作业放入队列中,并在活动列表低于 10 时将下一个作业从队列中弹出,并将其放入活动列表中。跟踪哪个核心正在执行哪个任务并将队列中的下一个任务分配给最不繁忙的核心很容易。将完成的事件连接到作业或仅监视活动列表,您就会知道何时适合将另一个作业从队列中弹出并放入活动列表中。
Is there a reason for using linq to object collections for this task? I think that I would create an active list, put all of the jobs in a queue and pop the next one out of the queue whenever the active list dipped below 10 and stick it into the active list. It's easy enough to track which core is executing which task and assign the next task in the queue the the least busy core. Wire up a finished event to the job or just monitor the active list and you'll know when it's appropriate to pop another job off the queue and into the active list.
我宁愿在一个循环中完成它。我的结果与你的不同。你安排了所有工作的 2/3。我的都安排好了。稍后我会添加解释。现在就出发去赴约了。
由于重大错误而被删除。重做一下..:P
I would rather do it in a single loop. My produces a different result from yours. Yours scheduled 2/3 of all the jobs. Mine scheduled all. I will add explanations later. Going off for an appointment now.
Removed due to major bug. Reworking on it.. :P