我有一个需要解决的问题,但我想不出任何简单且更重要的解决方案:快速解决方案。这有点像多个旅行推销员问题的一部分。
首先,我有一个包含 X
行和 N
列的矩阵,N
是我的算法的静态变量,X
可能会有所不同。让我们假设它看起来像(这里 N = 5
):
matrix = [1 2 4 3 5; 4 3 1 2 5; 1 2 4 3 5; ]
matrix =
1 2 4 3 5
4 3 1 2 5
1 2 4 3 5
每一行都被视为一条“路线”,并包含 1 和 N
之间的所有唯一数字。 )将被分成部分路线。这意味着,我有一个断点矩阵,其中包含 X
行和 M
(M ) 列。例如:
breakpoints = [2 3 4; 1 2 4; 1 3 4]
breakpoints =
2 3 4
1 2 4
1 3 4
每行断点
的索引给出了矩阵
相应行的元素,之后路由将被分割成部分路由。为了清楚起见,让我们以第一行为例: breakpoints(1, :) = 2 3 4
这意味着路由 matrix(1, :) = 1 2 4 3 5
将被拆分为部分路由 [1 2]、[4]、[3] 和 [5]
。第二行有断点 breakpoints(2, :) = 1 2 4
,它将把第二条路由 matrix(2, :) = 4 3 1 2 5
分成部分路线[4]、[3]、[1 2]和[5]
。
现在我的目标是从矩阵中删除所有行,而部分路由是冗余的重复项,只是顺序不同。在此示例中,第 2 行是第 1 行的重复项。即使第 3 行与第 1 行具有相同的路由,但它也不是重复的,因为有不同的断点导致部分路由 [1], [2 4] 、[3]和[5]
。
我怎样才能干净、快速地做到这一点?矩阵可以包含许多元素,例如 X = 5e4
行和 N = 10
、M = 6
。
I have a problem I need to solve, but I can't think of any easy and more important: fast solution. It's a bit like a part of a multiple traveling salesman problem.
First I have a matrix with X
rows and N
columns, N
is a static variable of my algorithm and X
can vary. Let's assume it looks like (here N = 5
):
matrix = [1 2 4 3 5; 4 3 1 2 5; 1 2 4 3 5; ]
matrix =
1 2 4 3 5
4 3 1 2 5
1 2 4 3 5
every row is seen as a "route" and contains all the unique numbers between 1 and N
Each route (= row) will be split in partial routes. That means, I have a breakpoint matrix which contains X
rows and M
(M < N
) columns. E.g.:
breakpoints = [2 3 4; 1 2 4; 1 3 4]
breakpoints =
2 3 4
1 2 4
1 3 4
The indices of each row of breakpoints
give the elements of the corresponding row of matrix
AFTER which the route will be split into partial routes. Just to make clear, let's regard the frist row as an example: breakpoints(1, :) = 2 3 4
which means, that the route matrix(1, :) = 1 2 4 3 5
will be split into the partial routes [1 2], [4], [3] and [5]
. The second row has the breakpoints breakpoints(2, :) = 1 2 4
which will split the second route matrix(2, :) = 4 3 1 2 5
into the partial routes [4], [3], [1 2] and [5]
.
Now my aim is to remove all rows from matrix
, whereas the partial routes are redundant duplicates, just in a different order. In this example row 2 is a duplicate of row 1. Row 3 is NO duplicate even if it has the same route as row 1, because there are different breakpoints which lead to the partial routes [1], [2 4], [3] and [5]
.
How could I do this cleanly and fast? Matrix can contain many elements, like X = 5e4
rows and N = 10
, M = 6
.
发布评论
评论(1)
对于常数 M、N,可以通过按顺序对复合记录进行排序,然后测试相邻条目的相等性,在 O(X log X) 时间内解决此问题。
我所说的“复合记录”是指将行及其断点的函数组合成单个记录的记录。对于给定的行,该函数是通过以下方式获得的:
对复合记录进行排序后,遍历它们查找相邻条目的相等性,直到源索引。例如,(1 2 3 4 5; 2 3 4; 2) 和 (1 2 3 4 5; 2 3 4; 7) 表示第 7 行的部分路由与第 2 行的部分路由重复。每次发现重复项时,将其对应的原始第一行条目设置为无效点号,例如 N+1。
因此,在排序后,花费 O(X log X),使用 O(X) 时间来检测重复项。然后使用 O(X) 时间通过遍历原始行来挤出重复项,删除那些第一个元素无效的行。
稍微更准确的总体成本是 O((M+N)*X*log X),它超出了理论最小值 O((M+N)*X) log X 因子。如果将复合记录存储在哈希表中而不是对它们进行排序,并在出现重复哈希条目时将记录标记为删除,则可以消除 log X 因子。
For constant M, N, this can be solved in time O(X log X) by sorting composite records into order, and then testing for equality of adjacent entries.
By a "composite record" I mean a record that combines a function of a row and its breakpoints into a single record. The function is obtained, for a given row, by:
After sorting the composite records, go through them looking for equality of adjacent entries, up to source index. For example, (1 2 3 4 5; 2 3 4; 2) and (1 2 3 4 5; 2 3 4; 7) indicate that partial routes from row 7 duplicate those of row 2. Each time a duplicate is found, set its corresponding original first row entry to an invalid point number, say N+1.
Thus, after sorting, which cost O(X log X), use O(X) time to detect duplicates. Then use O(X) time to squeeze out duplicates by going through original rows dropping those with an invalid first element.
A slightly more accurate overall cost is O((M+N)*X*log X), which exceeds the theoretical minimum O((M+N)*X) by a log X factor. You can get rid of the log X factor if you store the composite records in a hash table instead of sorting them, and mark records for deletion when duplicate hash entries occur.