找到最繁忙时段的算法?
我有一些这样的数据:
1: 2 - 10
2: 3 - 15
3: 4 - 9
4: 8 - 14
5: 7 - 13
6: 5 - 10
7: 11 - 15
我将尝试用一种表示方式使其更清楚:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 |--------------------------------------X---------|
2 |--------------------------------X--------------------------------------------|
3 |--------------------------X---|
4 |-X-------------------------------------|
5 |--------X------------------------------|
6 |--------------------X----------|
7 |---------------------------|
因此,在示例情况中,如果使用第二种方案,则 8-9
是关键期,因为所有点都处于活动状态。在 python 中解决这个问题的快速且好的方法是什么?我正在考虑使用动态编程,但是还有其他建议的方法吗?
到目前为止我的方法是:
我更多地从实时角度思考。因此,每当我得到一个新点时,我都会这样做:假设我已经得到了 2-10
并且得到了 3-15
然后我选择开始的最大值和最小值结束,因此本例为 3-10
并将此间隔的计数增加到 2。然后第三个点出现在 4-9
中,选择最大值为 4 和最小值是 9 并将值 3-10
更新为 4-9
并将计数更新为 3。现在,当 8-14
出现时,我选择该区间的开始时间大于 4-9
,该区间的结束时间小于 4-9
。在本例中,情况并非如此,因此我将创建一个新存储桶 8-14
并将计数设置为 1。这不是整个算法,但应该给出我所要执行的操作的高级概念。我在这里做。我看看是否可以画出伪代码。
I have some data like this:
1: 2 - 10
2: 3 - 15
3: 4 - 9
4: 8 - 14
5: 7 - 13
6: 5 - 10
7: 11 - 15
I will attempt a representation to make it clearer:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 |--------------------------------------X---------|
2 |--------------------------------X--------------------------------------------|
3 |--------------------------X---|
4 |-X-------------------------------------|
5 |--------X------------------------------|
6 |--------------------X----------|
7 |---------------------------|
So in the example case, 8-9
is the critical period if the second scheme is used because all the points are active. What is a fast and good way to solving this problem in python? I am thinking of using dynamic programming but are there other approaches that are suggested?
My approach until now:
I was thinking more from a real-time perspective. So, whenever I get a new point, I do this: Assume I already got 2-10
and I get 3-15
then I pick the max of start and min of end so this case it is 3-10
and increment this interval's count to 2. Then the third point comes in 4-9
, pick the max which is 4 and the min is 9 and update the value 3-10
to 4-9
and update count to 3. Now when 8-14
comes in, I pick the start of this interval is greater than 4-9
and the end of this interval is less than 4-9
. In this case, it is not true so I will create a new bucket 8-14
and I put the count to 1. This is not the entire algorithm but should give a high-level idea of what I am doing here. I will see if I can sketch the pseudo-code.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
得到它?
因此,您需要将其转换
为:
,然后您只需迭代即可,当您看到 + 时向上计数,看到 - 时向下计数。最繁忙的时间间隔是计数最大时。
所以在代码中:
运行时复杂度是
(n+n)*log(n+n)+n+n
或O(n*log(n))
它也是如果您在程序开始时没有完整的间隔列表,可以将此想法转换为在线算法但可以保证传入间隔永远不会安排在过去的点上。您将使用优先级队列而不是排序,每次间隔到来时,您都会推入两个项目,即起点和终点,每个项目分别具有+1和-1。然后你就下车计算并记录高峰时间。
Get it?
So you need to transform this:
into:
and then you simply iterate through, counting up when you see a + and counting down on -. The busiest interval will be when the count is maximum.
So in code:
runtime complexity is
(n+n)*log(n+n)+n+n
orO(n*log(n))
It is also possible to convert this idea into an online algorithm if you don't have the complete list of intervals at the start of the program but is guaranteed that incoming intervals will never be scheduled for a past point. Instead of sorting you will use a priority queue, each time an interval comes, you push in two items, the start point and the end point, each with a +1 and -1 respectively. And then you pop off and count and keep track of the peak hour.
我首先将点 x 的繁忙度视为 x 左侧的激活数量减去 x 左侧的停用数量。我将按激活和停用发生的时间(以 O(nlog(n)) 时间)对激活和停用进行排序。然后,您可以遍历列表,跟踪活动数字 (y),通过激活和停用来递增和递减该数字。最繁忙的时期是 y 达到最大值的点。我想不出比 O(nlog(n)) 更好的解决方案。蛮力是 O(n^2)。
I would start by thinking of the busy-ness of a point x as the number of activations to the left of x, minus the number of deactivations to the left of x. I would sort the activations and deactivations by the time at which they occur (in O(nlog(n)) time). Then you can traverse the list, tracking the number active (y), incrementing and decrementing that number with activations and deactivations passed. The busiest period will be the points at which y is at its maximum. I can't think of a solution off the top of my head that is better than O(nlog(n)). The brute force would be O(n^2).
我想你也许可以使用 set() 来实现这一点,如果你保证所有周期至少在一个点相交,它就会起作用。
但是,当句点不相交时,此方法就不起作用。
您也许可以添加额外的逻辑来涵盖这一点,因此我将发布我的想法:
注意:这不包括 11-15 期间。
您可能最好只创建 RK 提到的 bin 对
I thought you could perhaps use a set() for this, and it would work if your assured that all periods intersect at at least one point.
However, this does not work as soon as a period does not intersect.
You may be able to add additional logic to cover this, so I'll post what I was thinking:
Note: this does not include the 11-15 period.
Your probably best off just creating bin pairs as mentioned by R.K.
这是我对基于 bin 的方法的想法,并适应动态处理添加,基本上我相信 RK 所说的。
更新:仅在调用 get_max 时进行排序,并使用 defaultdict(int)。
Here's what I was thinking for the bin based approach, and adapted to handle adds dynamically, basically what R.K. was saying I believe.
Updated: Only sort on call to get_max, and use defaultdict(int).
不确定我是否理解你的问题。如果您想找到最常见的“间隔”,您可以按间隔将它们相加。这样,上面的示例就有 12 个存储桶。对于每次使用,您都会向该特定使用中使用的每个存储桶添加 1,最后找到所有存储桶中的最大值。在这里,8-9 间隔为 6。
Not sure if i understand your question. If you are trying to find the most common "interval", you can sum them up per interval. This way you have 12 buckets for the above example. For each use, you would add 1 to each of the buckets used in that particular use, and at the end, find the maximum value in all the buckets. Here, that would be 6 for the 8-9 interval.
如果您想在这里获得线性性能,我已经编写了一个小型 C++ 程序。
我知道它不是Python,但这里的想法很简单。
我们首先创建一个包含所有点的数组,如果间隔从该索引开始,则递增数组中的项目;如果间隔从该索引结束,则递减该项目。
构造好数组后,我们只需迭代并计算开区间的最大数量。
时间复杂度为 O(M + N)
空间复杂度为 O(N)
其中 M 是间隔数,N 是间隔对中的最大值。
I've put together a small C++ program if you want to have a linear performance here.
I know it is not Python, but the idea is very simple here.
We first create an array with all points and increment the item in the array if the interval starts at that index and decrement it if it ends on that index.
Once the array is constructed, we just iterate over and calculate where we had the maximum number of open intervals.
Time complexity is O(M + N)
Space complexity is O(N)
Where M is the number of intervals and N is the max value from the interval pairs.