Algorithm to find the busiest period?

Posted on 2024-11-03 02:30:18

I have some data like this:

1: 2 - 10
2: 3 - 15
3: 4 - 9
4: 8 - 14
5: 7 - 13
6: 5 - 10
7: 11 - 15

I will attempt a representation to make it clearer:

        1     2     3     4     5     6     7     8     9     10     11     12     13     14     15
1             |--------------------------------------X---------|
2                   |--------------------------------X--------------------------------------------|
3                         |--------------------------X---|
4                                                  |-X-------------------------------------|
5                                           |--------X------------------------------|
6                               |--------------------X----------|
7                                                                     |---------------------------|

So in the example case, 8-9 is the critical period if the second scheme is used, because that is where the most intervals are active at the same time. What is a fast and good way to solve this problem in Python? I am thinking of using dynamic programming, but are there other suggested approaches?

My approach until now:

I was thinking about this more from a real-time perspective. Whenever I get a new point, I do this: assume I already have 2-10 and I get 3-15. I pick the max of the starts and the min of the ends, which in this case is 3-10, and increment this interval's count to 2. Then the third point, 4-9, comes in: the max start is 4 and the min end is 9, so I update 3-10 to 4-9 and update the count to 3. Now when 8-14 comes in, I check whether the start of this interval is greater than the start of 4-9 and its end is less than the end of 4-9. In this case that is not true, so I create a new bucket 8-14 and set its count to 1. This is not the entire algorithm, but it should give a high-level idea of what I am doing. I will see if I can sketch the pseudo-code.

︶葆Ⅱㄣ 2024-11-10 02:30:18

        1     2     3     4     5     6     7     8     9     10     11     12     13     14     15
1             |--------------------------------------X---------|
2                   |--------------------------------X--------------------------------------------|
3                         |--------------------------X---|
4                                                  |-X-------------------------------------|
5                                           |--------X------------------------------|
6                               |--------------------X----------|
7                                                                     |---------------------------|

             +1    +1     +1   +1           +1     +1    -1    -2     +1           -1     -1     -2
              1     2     3     4           5       6    5      3     4             3      2      0
                                                     ^^^^

Get it?

So you need to transform this:

1: 2 - 10
2: 3 - 15
3: 4 - 9
4: 8 - 14
5: 7 - 13
6: 5 - 10
7: 11 - 15

into:

[(2,+), (3,+), (4,+), (5,+), (7,+), (8,+), (9,-), (10,-), (10,-), (11,+), (13,-), (14,-), (15,-), (15,-)]

and then you simply iterate through, counting up when you see a + and counting down on -. The busiest interval will be when the count is maximum.

So in code:

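intervals = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)]
# Each interval becomes a +1 event at its start and a -1 event at its end.
# Sorting tuples puts (t, -1) before (t, +1), so an interval ending at t
# is closed before another interval starting at t is opened.
intqueue = sorted([(x[0], +1) for x in intervals] + [(x[1], -1) for x in intervals])
rsum = [(0, 0)]  # (time, number of open intervals from that time on)
for x in intqueue:
    rsum.append((x[0], rsum[-1][1] + x[1]))
peak = max(rsum, key=lambda x: x[1])  # first (time, count) with the highest count
busiest_start = peak[0]
# busiest_end = the time of the next element in rsum after the peak
busiest_end = rsum[rsum.index(peak) + 1][0]

# instead of using lambda, alternatively you can do:
#     def second_element(x):
#         return x[1]
#     peak = max(rsum, key=second_element)
# or:
#     import operator
#     peak = max(rsum, key=operator.itemgetter(1))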

The runtime complexity is (n+n)*log(n+n) + (n+n), which is O(n*log(n)).

It is also possible to convert this idea into an online algorithm if you don't have the complete list of intervals at the start of the program, as long as it is guaranteed that incoming intervals will never be scheduled for a point in the past. Instead of sorting, you use a priority queue: each time an interval comes in, you push two items, the start point and the end point, with +1 and -1 respectively. Then you pop items off, count, and keep track of the peak hour.
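The online variant described above could look roughly like the following sketch. The class name `OnlinePeakTracker` and its `add`/`advance` methods are hypothetical names chosen for illustration, and the sketch assumes the caller only advances the clock to a time that no later interval will start before.

```python
import heapq


class OnlinePeakTracker:
    """Hypothetical helper: tracks the peak number of open intervals online."""

    def __init__(self):
        self._events = []       # min-heap of (time, delta) events
        self.active = 0         # intervals open at the current clock time
        self.peak = 0           # highest value of `active` seen so far
        self.peak_start = None  # time at which that peak was first reached

    def add(self, start, end):
        # Push both endpoints; (t, -1) sorts before (t, +1) in the heap.
        heapq.heappush(self._events, (start, +1))
        heapq.heappush(self._events, (end, -1))

    def advance(self, now):
        # Process every event scheduled at or before `now`.
        # Assumption: intervals added later never start before `now`.
        while self._events and self._events[0][0] <= now:
            time, delta = heapq.heappop(self._events)
            self.active += delta
            if self.active > self.peak:
                self.peak = self.active
                self.peak_start = time


tracker = OnlinePeakTracker()
arrivals = [(2, 10), (3, 15), (4, 9), (5, 10), (7, 13), (8, 14), (11, 15)]
for start, end in arrivals:      # intervals arrive in order of their start
    tracker.add(start, end)
    tracker.advance(start)
tracker.advance(float("inf"))    # drain the remaining end events
print(tracker.peak, tracker.peak_start)
```

Each `add` costs O(log n) and every event is popped once, so the total work stays O(n*log(n)) while the peak is available at any moment.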

情绪操控生活 2024-11-10 02:30:18

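I'd start by thinking of the busyness at a point x as the number of activations to the left of x minus the number of deactivations to the left of x. I would sort the activations and deactivations by the time at which they occur (in O(n log n) time). Then you can traverse the list, tracking the number of active intervals (y), incrementing and decrementing it as you pass activations and deactivations. The busiest period is where y is at its maximum. I can't think of a solution that is better than O(n log n). The brute force is O(n^2).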
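As a rough illustration of this answer, the activations and deactivations can be sorted separately and walked with two indices. The function name `busiest_point` is made up for this sketch, and it assumes every interval has start < end.

```python
def busiest_point(intervals):
    """Return (time, count) where the number of active intervals first peaks.

    Assumes every interval satisfies start < end.
    """
    starts = sorted(start for start, _ in intervals)  # activations
    ends = sorted(end for _, end in intervals)        # deactivations
    active = best = 0
    best_time = None
    i = j = 0
    while i < len(starts):
        if starts[i] < ends[j]:
            # The next event in time order is an activation.
            active += 1
            if active > best:
                best, best_time = active, starts[i]
            i += 1
        else:
            # A deactivation at or before the next activation comes first.
            active -= 1
            j += 1
    return best_time, best


example = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)]
print(busiest_point(example))
```

The two sorts dominate the cost, so this is still O(n log n); the walk itself is linear.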

梦忆晨望 2024-11-10 02:30:18

I thought you could perhaps use a set() for this, and it would work if you're assured that all periods intersect in at least one point.

However, this stops working as soon as one period does not intersect the others.
You may be able to add additional logic to cover this, so I'll post what I was thinking:

>>> periods = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10)]
>>> intersected = None
>>> for first, second in periods:
...     points = set(range(first, second + 1))  # every integer point in the period
...     if intersected is None:  # first period; an empty result must not be reset
...         intersected = points
...     else:
...         intersected = intersected.intersection(points)
...
>>> intersected
{8, 9}

Note: this does not include the 11-15 period.
You're probably best off just creating bin pairs, as mentioned by R.K.

陌若浮生 2024-11-10 02:30:18

Here's what I was thinking for the bin-based approach, adapted to handle additions dynamically. It is basically what R.K. was saying, I believe.

from collections import defaultdict
from operator import itemgetter

class BusyHour(object):
    def __init__(self):
        # Maps each unit bin (t, t + 1) to the number of periods covering it.
        self.pairs = defaultdict(int)
    def add_period(self, period):
        start, end = period
        # Count the period once in every unit bin it spans.
        for current_period in range(start, end):
            pair_key = (current_period, current_period + 1)
            self.pairs[pair_key] += 1
    def get_max(self):
        # items() yields ((start, end), count) pairs;
        # itemgetter(1) makes max() compare them by count.
        return max(self.pairs.items(), key=itemgetter(1))


if __name__ == '__main__':
    periods = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)]
    bh = BusyHour()
    for period in periods:
        bh.add_period(period)
    print(bh.get_max())  # prints ((8, 9), 6)

Updated: Only sort on call to get_max, and use defaultdict(int).

梦幻的心爱 2024-11-10 02:30:18

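Not sure if I understand your question. If you are trying to find the most common "interval", you can sum them up per interval. That way you have 13 unit buckets (2-3 through 14-15) for the above example. For each use, you add 1 to each of the buckets covered by that particular use, and at the end you find the maximum value over all the buckets. Here, that would be 6 for the 8-9 interval.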

痴者 2024-11-10 02:30:18

I've put together a small C++ program, in case you want linear performance here.
I know it is not Python, but the idea is very simple.

We first create an array with all points and increment the item in the array if the interval starts at that index and decrement it if it ends on that index.

Once the array is constructed, we just iterate over and calculate where we had the maximum number of open intervals.

Time complexity is O(M + N)

Space complexity is O(N)

Where M is the number of intervals and N is the max value from the interval pairs.

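#include <algorithm>  // std::max
#include <cstddef>    // std::size_t
#include <iostream>
#include <utility>    // std::pair
#include <vector>

int maxLoad(const std::vector<std::pair<int, int>>& intervals) {
    // points[i] = (intervals starting at i) - (intervals ending at i).
    std::vector<int> points;
    for (const auto& interval : intervals) {
        // Grow the array so the end index is valid (assumes non-negative times).
        const auto end = static_cast<std::size_t>(interval.second);
        if (end >= points.size()) points.resize(end + 1);
        ++points[interval.first];
        --points[interval.second];
    }

    // The running sum is the number of open intervals; keep its maximum.
    int ans = 0;
    int sum = 0;
    for (const auto point : points) {
        sum += point;
        ans = std::max(ans, sum);
    }
    return ans;
}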

int main() {
    std::vector<std::pair<int, int>> intervals {
        {2, 10}, {3, 15}, {4, 9}, {8, 14}, {7, 13}, {5, 10}, {11, 15}
    };
    std::cout << maxLoad(intervals) << std::endl;
}
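Since the question asks for Python, the same difference-array idea might be sketched as follows. This is only an illustrative port under the same assumptions as the C++ version (non-negative integer endpoints, start <= end); the name `max_load` is chosen here and is not from the original answer.

```python
def max_load(intervals):
    """Return the largest number of intervals open at the same time.

    Assumes non-negative integer endpoints with start <= end.
    """
    if not intervals:
        return 0
    # One slot per integer time point up to the largest end value.
    delta = [0] * (max(end for _, end in intervals) + 1)
    for start, end in intervals:
        delta[start] += 1  # an interval opens here
        delta[end] -= 1    # an interval closes here
    best = running = 0
    for change in delta:
        running += change  # number of intervals open from this point on
        best = max(best, running)
    return best


example = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)]
print(max_load(example))
```

As with the C++ version, time is O(M + N) and space is O(N), so this only pays off when the largest endpoint is not much bigger than the number of intervals.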
