查找重叠间隔序列中最大和的算法

发布于 2024-09-09 01:08:24 字数 563 浏览 2 评论 0原文

我试图解决的问题在数轴上有一个间隔列表,每个间隔都有一个预定义的分数。我需要返回最大可能的总分。

问题是间隔重叠,并且重叠间隔中我只能使用一个。这是一个例子。

Intervals   - Score  
   0- 5     -  15  
   4- 9     -  18  
  10-15     -  12  
   8-21     -  19  
  25-30     -  25    

此处,间隔 0-5、4-9 和 8-21 重叠。
区间 10-15 和 8-21 也重叠。
最大总和为 55 (18+12+25)。

这里需要注意的是,我们选择第一批重叠区间中的区间 4-9,尽管它的得分不是三个区间中最高的。

这是因为选择区间 8-21 会阻止我们稍后使用区间 10-15,从而减少总和(在这种情况下,总和将为 19+25=44)。

我正在寻找 O(nlogn) 或 O(n) 解决方案来解决这个问题。我认为可以使用动态规划,但我可能是错的。有人可以建议一个可以解决这个问题的解决方案/算法吗?

编辑:间隔没有特定的顺序。

The problem I am trying to solve has a list of intervals on the number line, each with a pre-defined score. I need to return the maximum possible total score.

The catch is that the intervals overlap, and of the overlapping intervals I can use only one. Here is an example.

Intervals   - Score  
   0- 5     -  15  
   4- 9     -  18  
  10-15     -  12  
   8-21     -  19  
  25-30     -  25    

Here, the intervals 0-5, 4-9 and 8-21 overlap.
The intervals 10-15 and 8-21 also overlap.
The maximum sum would be 55 (18+12+25).

It is important to note here that we select the interval 4-9 of the first batch of overlapping intervals even though it does not have the highest score of the three.

This is because selecting the interval 8-21 would prevent us from using the interval 10-15 later on, thereby reducing the overall sum (In that case, the overall sum would be 19+25=44).

I am looking for an O(nlogn) or O(n) solution to this problem. I think dynamic programming can be used, but I may be wrong. Could someone suggest a solution/algorithm(s) that could do the trick here?

Edit: The intervals are in no particular order.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

很快妥协 2024-09-16 01:08:24

这是间隔调度的加权变体;它可以通过动态规划O(N log N)中解决。

设间隔为g(start, stop, Score),并按stop排序。为了简单起见,我们现在假设所有 stop 都是唯一的。

best[i] 为当允许使用 g[1], ..., g[i] 时我们可以获得的最佳分数。当然,我们不必全部使用它们,而且通常我们不能这样做,因为我们使用的间隔子集必须是不重叠的。

  • 显然最佳[0] = 0。也就是说,由于我们不能使用任何区间,因此我们可以获得的最佳分数是 0。
  • 对于任何 1 <= k <= N,我们有:
    • best[k] = max( best[k-1], best[j] + g[k].score ),其中
      • j 是满足 g[j].stop < 的最大索引。 g[k].startj 可能为零)

也就是说,假设我们可以使用 g[1], ... g[k]< /code>,我们能做的最好的事情就是对这两个选项进行更好的评分:

  • 我们不包括 g[k]。因此,该选项的得分为best[k-1]
    • ...因为这是我们能用 g[1], ... g[k-1]
    • 做的最好的事情

  • 我们包括 g[k],在其左侧,我们尽力处理所有不与 g[k] 重叠的基因,即所有 g[1], ..., g[j] ,其中 g[j].stop < g[k].startj 尽可能大。因此,该选项的得分为best[j] + g[k].score

(注意上式中体现的动态规划的最优子结构和重叠子问题分量)。

这个问题的总体答案是best[N],即当我们被允许使用所有基因时我们可以获得的最好分数。哎呀,我说的是基因吗?我的意思是间隔。

这是 O(N log N) 因为:

  • 对所有间隔进行排序需要 O(N log N)
  • 为每个 k 查找 j 使用二分搜索是 O(log N)

如果几个基因可以有相同的 stop 值,那么什么都没有改变:你仍然需要搜索最右边的 <代码>j。例如,在 Python 中,使用 bisect_right 可以轻松实现这一点。在 Java 中,标准库二分搜索不能保证在出现平局时返回哪个索引,您可以(在许多选项中)跟随线性搜索(对于 O(N) 最坏情况性能),或另一系列二分搜索来查找最正确的索引。

哎呀我又说基因了吗?我的意思是间隔。

相关问题

This is a weighted variation of interval scheduling; it's solvable in O(N log N) with dynamic programming.

Let an interval be g(start, stop, score), and let them be sorted by stop. For simplicity, let's assume for now that all stop is unique.

Let best[i] be the best score we can get when we're allowed to use g[1], ..., g[i]. We don't have to use them all, of course, and generally we can't because the subset of intervals we use must be non-overlapping.

  • Clearly best[0] = 0. That is, since we can't use any interval, the best score we can get is 0.
  • For any 1 <= k <= N, we have:
    • best[k] = max( best[k-1], best[j] + g[k].score ), where
      • j is the largest index such that g[j].stop < g[k].start (j may be zero)

That is, given that we're allowed to use g[1], ... g[k], the best we can do is the better scoring of these two options:

  • We do not include g[k]. Thus, the score of this option is best[k-1].
    • ... because that's the best we can do with g[1], ... g[k-1]
  • We include g[k], and to its left we do the best we can with all the genes that don't overlap with g[k], i.e. all g[1], ..., g[j], where g[j].stop < g[k].start and j is as large as possible. Thus, the score of this option is best[j] + g[k].score.

(Note the optimal substructure and overlapping subproblems components of dynamic programming embodied in the above equation).

The overall answer to the question is best[N], i.e. the best score we can get when we're allowed to use all the genes. Oops, did I say genes? I mean intervals.

This is O(N log N) because:

  • Sorting all the intervals takes O(N log N)
  • Finding j for each k is O(log N) using binary search

If several genes can have the same stop values, then nothing changed: you still have to search for the rightmost j. In e.g. Python this is easy with bisect_right. In Java where the standard library binary search doesn't guarantee which index is returned in case of ties, you can (among many options) follow it with a linear search (for O(N) worst-case performance), or another series of binary searches to find the right most index.

Oops did I say genes again? I mean intervals.

Related questions

吃素的狼 2024-09-16 01:08:24

首先,我认为最大值是59,而不是55。如果你选择区间[0-5]、[8-21]和[25,30],你会得到15+19+25=59。您可以使用某种动态编程来处理这个问题。

首先,按起始点对所有间隔进行排序,然后从结束到开始迭代。对于列表中的每个项目,您选择从该点到最后一个的最大总和,即 max(S[i]+S[j], S[i+1]),其中 i 是项目如果您在,j 是您的项目之后的第一个非重叠条目的项目(即,开始时间大于当前项目结束时间的第一个项目)。为了加速算法,您需要存储每个元素的最大部分和 S[j]。

为了澄清一下,让我根据这个来解决你的例子。首先,对间隔进行排序:

 1:  0- 5 -  15
 2:  4- 9 -  18
 3:  8-21 -  19
 4: 10-15 -  12
 5: 25-30 -  25

因此,

 S[5] = 25
 S[4] = max(12+S[5], 25)=37
 S[3] = max(19+S[5], S[4])=max(19+25,37)=44
 S[2] = max(18+S[4], S[3])=max(18+37,44)=55
 S[1] = max(15+S[3], S[2])=max(15+44, 55)=59

这是 这篇文章,但不幸的是,没有很好的 O(n) 运行时间。每个条目与下一个条目重叠的简并列表将导致其复杂度为 O(n^2)。

First of all, I think the maximum is 59, not 55. If you choose intervals [0-5],[8-21], and [25,30], you get 15+19+25=59. You can use some sort of dynamic programming to handle this.

First, you sort all the intervals by their starting point, then iterate from end to start. For each item in list, you choose the maximum sum from that point to the last as max(S[i]+S[j], S[i+1]), where i is the item you are on, j is the item that is the first non-overlapping entry following your item (that is, the first item whose start is larger than the current item's end). To speed up the algorithm, you want to store the maximum partial sum S[j] for each element.

To clarify, let me solve your example according to this. First, sort your intervals:

 1:  0- 5 -  15
 2:  4- 9 -  18
 3:  8-21 -  19
 4: 10-15 -  12
 5: 25-30 -  25

So,

 S[5] = 25
 S[4] = max(12+S[5], 25)=37
 S[3] = max(19+S[5], S[4])=max(19+25,37)=44
 S[2] = max(18+S[4], S[3])=max(18+37,44)=55
 S[1] = max(15+S[3], S[2])=max(15+44, 55)=59

This is an adaptation of the algorithm in this post, but unfortunately, doesn't have the nice O(n) running time. A degenerate list where each entry overlaps the next would cause it to be O(n^2).

丢了幸福的猪 2024-09-16 01:08:24

我想了一下这个问题并想出了一些办法。

间隔树提供了一种查找与给定间隔重叠的所有间隔的有效方法。遍历整个间隔集合,我们可以找到给定间隔的所有重叠间隔。一旦我们有了这些,我们就可以找到得分最高的区间,将其存储并继续。

构建树需要 O(N Log N) 时间,查找需要 O(Log N) 时间。因为我们对所有元素进行查找,所以解决方案变为 O(N Log N)。

但是,如果我们遇到类似上面示例的情况,其中一组中的最高分数区间减少了总数,则该算法会失败,因为我们无法预先知道不应使用最高分数区间。解决这个问题的明显方法是计算两个(或全部)总数,以防我们不确定,但这使我们回到了可能的 O(N^2) 或更糟糕的解决方案。

I thought of this a bit and came up with something.

Interval Trees provide an efficient way of finding all the intervals that overlap a given interval. Walking through the entire set of intervals, we can find all the overlapping intervals for a given one. Once we have these, we can find the interval with the highest score, store it and move on.

Building the tree takes O(N Log N) time and lookup takes O(Log N) time. Because we do a lookup for all elements, the solution becomes O(N Log N).

However, if we face something like the example above where the highest score interval in one group reduces the total, the algorithm fails because we have no way of knowing that the highest score interval should not be used before hand. The obvious way around this would be to calculate both (or all) totals in case we are not sure, but that puts us back to a potentially O(N^2) or worse solution.

风和你 2024-09-16 01:08:24

也许是像 可以使用这个答案,至少对于该问题来说是O(n)。这意味着迭代一次间隔并仅跟踪那些仍然可以产生最佳最终解决方案的间隔组合。

Maybe an approach like in this answer could be used, which is O(n) at least for that problem. It would mean to iterate once through the intervals and keep track of just those interval combinations that still could lead to an optimal final solution.

喵星人汪星人 2024-09-16 01:08:24

我认为我们可以使用这个递归...

S[i] 表示每个区间的分数
所有间隔,但我相信它应该有效。

ResMax[i] = max(ResMax[i-1] + S[i] //if i is included
           ,max(R[i-1],S[i]) 
         )

Interval[i] 表示我没有彻底检查的

I think we can use this recursion...

S[i] denotes the score of each interval
Interval[i] denotes all the intervals

ResMax[i] = max(ResMax[i-1] + S[i] //if i is included
           ,max(R[i-1],S[i]) 
         )

I am not checked thoroughly but it should work i beleive.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文