找到最长非重叠序列的算法
我正在尝试找到解决以下问题的最佳方法。我所说的最好的方式是指不太复杂。
作为输入的元组列表 (start,length),如下所示:
[(0,5),(0,1),(1,9),(5,5),(5,7),(10,1)]
每个元素通过其 start 和 length 表示一个序列,例如 (5,7) 相当于序列 (5,6,7,8,9,10,11)
- 从 5 开始的 7 个元素的列表。可以假设元组按 start
排序> 元素。
输出应返回表示最长连续序列的不重叠的元组组合。这意味着,解决方案是范围的子集,没有重叠和间隙,并且是最长的可能 - 但可能有多个。
例如,对于给定的输入,解决方案是:
[(0,5),(5,7)]
相当于 (0,1,2,3,4,5,6, 7,8,9,10,11)
回溯是解决这个问题的最佳方法吗?
我对人们建议的任何不同方法感兴趣。
另外,如果有人知道这个问题的正式参考资料或另一个类似的问题,我想获得参考资料。
顺便说一句 - 这不是家庭作业。
编辑
预期行为的另一个示例
只是为了避免一些错误,这是[(0,1),(1,7),(3,20),(8,5 等输入的 )]
正确答案是 [(3,20)]
相当于长度为 20 的 (3,4,5,..,22)。收到的一些答案将给出 < code>[(0,1),(1,7),(8,5)] 相当于 (0,1,2,...,11,12) 作为正确答案。但最后一个答案不正确,因为它比 [(3,20)]
短。
I am trying to find the best way to solve the following problem. By best way I mean less complex.
As an input a list of tuples (start,length) such:
[(0,5),(0,1),(1,9),(5,5),(5,7),(10,1)]
Each element represets a sequence by its start and length, for example (5,7) is equivalent to the sequence (5,6,7,8,9,10,11)
- a list of 7 elements starting with 5. One can assume that the tuples are sorted by the start
element.
The output should return a non-overlapping combination of tuples that represent the longest continuous sequences(s). This means that, a solution is a subset of ranges with no overlaps and no gaps and is the longest possible - there could be more than one though.
For example for the given input the solution is:
[(0,5),(5,7)]
equivalent to (0,1,2,3,4,5,6,7,8,9,10,11)
is it backtracking the best approach to solve this problem ?
I'm interested in any different approaches that people could suggest.
Also if anyone knows a formal reference of this problem or another one that is similar I'd like to get references.
BTW - this is not homework.
Edit
Just to avoid some mistakes this is another example of expected behaviour
for an input like [(0,1),(1,7),(3,20),(8,5)]
the right answer is [(3,20)]
equivalent to (3,4,5,..,22) with length 20. Some of the answers received would give [(0,1),(1,7),(8,5)]
equivalent to (0,1,2,...,11,12) as right answer. But this last answer is not correct because is shorter than [(3,20)]
.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
使用给定的顺序(按起始元素)迭代元组列表,同时使用哈希图来跟踪某个索引上最长连续序列 ending 的长度。
伪代码,跳过诸如在哈希图中未找到的项之类的细节(假设如果未找到则返回 0):
这是一个 O(N) 算法。
如果您需要组成此序列的实际元组,则还需要保留按结束索引散列的元组链接列表,每当更新此端点的最大长度时更新此列表。
更新:我对 python 的了解相当有限,但是根据您粘贴的 python 代码,我创建了这个返回实际序列而不仅仅是长度的代码:
Iterate over the list of tuples using the given ordering (by start element), while using a hashmap to keep track of the length of the longest continuous sequence ending on a certain index.
pseudo-code, skipping details like items not found in a hashmap (assume 0 returned if not found):
This is an O(N) algorithm.
If you need the actual tuples making up this sequence, you'd need to keep a linked list of tuples hashed by end index as well, updating this whenever the max length is updated for this end-point.
UPDATE: My knowledge of python is rather limited, but based on the python code you pasted, I created this code that returns the actual sequence instead of just the length:
修改后的算法:
c#实现
测试:
Revised algorithm:
c# implementation
tests:
这是加权有向无环图的最长路径问题的特例。
图中的节点是序列中最后一个元素之后的起点和点,下一个序列可以从这里开始。
这个问题很特殊,因为无论路径如何,两个节点之间的距离必须相同。
This is a special case of the longest path problem for weighted directed acyclic graphs.
The nodes in the graph are the start points and the points after the last element in a sequence, where the next sequence could start.
The problem is special because the distance between two nodes must be the same independently of the path.
只从基本角度考虑算法,这行得通吗?
(对糟糕的语法表示歉意,但我试图在这里保持语言独立)
首先是最简单的形式:找到最长的连续对。
循环浏览每个成员并将其与具有较高起始位置的其他每个成员进行比较。如果第二个成员的 startpos 等于第一个成员的 startpos 和长度之和,则它们是连续的。如果是这样,则在新集合中形成一个新成员,并使用较低的起始位置和组合长度来表示这一点。
然后,将这些对中的每一个与具有较高起始位置的所有单个成员进行比较并重复,形成一组新的连续三元组(如果存在)。
继续这个模式,直到没有新的集合。
棘手的部分是你必须比较每个集合中每个成员的长度才能找到真正最长的链。
我很确定这不如其他方法有效,但我相信这是暴力破解此解决方案的可行方法。
如果您对此以及我可能忽略的任何错误提供反馈,我将不胜感激。
Just thinking about the algorithm in basic terms, would this work?
(apologies for horrible syntax but I'm trying to stay language-independent here)
First the simplest form: Find the longest contiguous pair.
Cycle through every member and compare it to every other member with a higher startpos. If the startpos of the second member is equal to the sum of the startpos and length of the first member, they are contiguous. If so, form a new member in a new set with the lower startpos and combined length to represent this.
Then, take each of these pairs and compare them to all of the single members with a higher startpos and repeat, forming a new set of contiguous triples (if any exist).
Continue this pattern until you have no new sets.
The tricky part then is you have to compare the length of every member of each of your sets to find the real longest chain.
I'm pretty sure this is not as efficient as other methods, but I believe this is a viable approach to brute forcing this solution.
I'd appreciate feedback on this and any errors I may have overlooked.
编辑以用实际的 Python 代码替换伪代码
再次编辑以更改代码;原始算法是关于解决方案的,但我误解了对中的第二个值是什么!幸运的是,基本算法是相同的,并且我能够更改它。
这是一个以 O(N log N) 解决问题的想法,并且不使用哈希映射(因此没有隐藏时间)。对于内存,我们将使用 N * 2 个“事物”。
我们将向每个元组添加两个以上的值:(BackCount、BackLink)。在成功的组合中,BackLink 将从右到左从最右边的元组链接到最左边的元组。 BackCount 将是给定 BackLink 的累计计数值。
下面是一些 python 代码:
整个算法的关键是代码中“THIS IS THE KEY”注释的位置。我们知道当前的 StartTuple 可以链接到 EndTuple。如果存在以 EndTuple.To 结尾的较长序列,则在我们到达此点时已找到它,因为它必须从较小的 StartTuple.From 开始,并且数组按“From”排序!
Edited to replace pseudocode with actual Python code
Edited AGAIN to change the code; The original algorithm was on the solution, but I missunderstood what the second value in the pairs was! Fortunatelly the basic algorithm is the same, and I was able to change it.
Here's an idea that solves the problem in O(N log N) and doesn't use a hash map (so no hidden times). For memory we're going to use N * 2 "things".
We're going to add two more values to each tuple: (BackCount, BackLink). In the successful combination BackLink will link from right to left from the right-most tuple to the left-most tuple. BackCount will be the value accumulated count for the given BackLink.
Here's some python code:
The key for the whole algorithm is where the "THIS IS THE KEY" comment is in the code. We know our current StartTuple can be linked to EndTuple. If a longer sequence that ends at EndTuple.To exists, it was found by the time we got to this point, because it had to start at an smaller StartTuple.From, and the array is sorted on "From"!
我删除了之前的解决方案,因为它未经测试。
问题是找到“加权有向无环图”中的最长路径,它可以在线性时间内解决:
http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acircular_graphs
将一组{起始位置}并集{(起始位置+结束位置)}作为顶点。对于您的示例,
如果存在使 v0 + w = v1 的最终值 w,则顶点 v0、v1 的值为 {0, 1, 5, 10, 11, 12},然后添加一条连接 v0 到 v1 的有向边,然后将w 为其重量。
现在按照维基百科页面中的伪代码进行操作。由于顶点数是最大值2xn(n是元组数),所以问题仍然可以在线性时间内解决。
I removed the previous solution because it was not tested.
The problem is finding the longest path in a "weighted directed acyclic graph", it can be solved in linear time:
http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acyclic_graphs
Put a set of {start positions} union {(start position + end position)} as vertices. For your example it would be {0, 1, 5, 10, 11, 12}
for vertices v0, v1 if there is an end value w that makes v0 + w = v1, then add a directed edge connecting v0 to v1 and put w as its weight.
Now follow the pseudocode in the wikipedia page. since the number of vertices is the maximum value of 2xn (n is number of tuples), the problem can still be solved in linear time.
这是一个简单的归约操作。给定一对连续的元组,它们要么可以组合,要么不能组合。因此,定义成对组合函数:
这仅返回组合两个参数的一个元素或原始两个元素的列表。
然后定义一个函数来迭代第一个列表并组合对:
这将保留第一个元素与列表中的当前项目进行比较(从第二个项目开始),当它无法组合它们时,它将第一个元素放入新列表并将
first
替换为两者中的第二个。然后只需使用元组列表调用
collapse
即可:[编辑] 最后,迭代结果以获得最长的序列。
[/编辑]
复杂度 O(N)。对于奖励标记,请反向执行,以便初始
pop(0)
变为pop()
,并且您不必重新索引数组或移动迭代器反而。对于最高分,请使其作为成对reduce
操作运行,以实现多线程的优点。This is a simple reduce operation. Given a pair of consecutive tuples, they either can or can't be combined. So define the pairwise combination function:
This just returns a list of either one element combining the two arguments, or the original two elements.
Then define a function to iterate over the first list and combine pairs:
This keeps a first element to compare with the current item in the list (starting at the second item), and when it can't combine them it drops the first into a new list and replaces
first
with the second of the two.Then just call
collapse
with the list of tuples:[Edit] Finally, iterate over the result to get the longest sequence.
[/Edit]
Complexity O(N). For bonus marks, do it in reverse so that the initial
pop(0)
becomes apop()
and you don't have to reindex the array, or move the iterator instead. For top marks make it run as a pairwisereduce
operation for multithreaded goodness.这听起来像是一个完美的“动态编程”问题...
最简单的程序是用蛮力(例如递归)来实现,但这具有指数复杂性。
通过动态编程,您可以设置长度为 n 的数组 a,其中 n 是问题的所有(起始+长度)值的最大值,其中 a[i] 表示最长的非重叠序列,直到 a[i]。然后,您可以逐步遍历所有元组,更新 a。该算法的复杂度为 O(n*k),其中 k 是输入值的数量。
This sounds like a perfect "dynamic programming" problem...
The simplest program would be to do it brute force (e.g. recursive), but this has exponential complexity.
With dynamic programming you can set up an array a of length n, where n is the maximum of all (start+length) values of your problem, where a[i] denotes the longest non-overlapping sequence up to a[i]. You can then step trought all tuples, updating a. The complexity of this algorithm would be O(n*k), where k is the number of input values.
I认为复杂度大约为 O(4-5*N)
(请参阅更新),
其中 N 是元组中的项目数。
更新
正如您所发现的,复杂性并不准确,但绝对非常小,因为它是行延伸(元组项)数量的函数。
因此,如果 N 是线段的数量,则排序的时间复杂度为 O(2N * log2N)。比较的时间复杂度为 O(2N)。查找直线长度也是 O(2N)。总而言之,O(2N(log2N + 2))。
I think complexity is around O(4-5*N)
(SEE UPDATE)
with N being number of items in the tuple.
UPDATE
As you figured out, the complexity is not accurate but definitely very small since it is a function of number of line stretches (tuple items).
So if N is number of line stretches, sorting is O(2N * log2N). Comparison is O(2N). Finding line stretches is also O(2N). So all in all O(2N(log2N + 2)).