Matlab：删除多余的“移位”矩阵的条目

发布于 2024-12-20 18:32:04 字数 1236 浏览 2 评论 0 原文

我有一个需要解决的问题，但我想不出任何简单且更重要的解决方案：快速解决方案。这有点像多个旅行推销员问题的一部分。

首先，我有一个包含 X 行和 N 列的矩阵，N 是我的算法的静态变量，X可能会有所不同。让我们假设它看起来像（这里 N = 5）：

matrix = [1 2 4 3 5; 4 3 1 2 5; 1 2 4 3 5; ]
matrix =
 1     2     4     3     5
 4     3     1     2     5
 1     2     4     3     5

每一行都被视为一条“路线”，并包含 1 和 N 之间的所有唯一数字。）将被分成部分路线。这意味着，我有一个断点矩阵，其中包含 X 行和 M (M ) 列。例如：

breakpoints = [2 3 4; 1 2 4; 1 3 4]
breakpoints =
 2     3     4
 1     2     4
 1     3     4

每行断点的索引给出了矩阵相应行的元素，之后路由将被分割成部分路由。为了清楚起见，让我们以第一行为例： breakpoints(1, :) = 2 3 4 这意味着路由 matrix(1, :) = 1 2 4 3 5 将被拆分为部分路由 [1 2]、[4]、[3] 和 [5]。第二行有断点 breakpoints(2, :) = 1 2 4 ，它将把第二条路由 matrix(2, :) = 4 3 1 2 5 分成部分路线[4]、[3]、[1 2]和[5]。

现在我的目标是从矩阵中删除所有行，而部分路由是冗余的重复项，只是顺序不同。在此示例中，第 2 行是第 1 行的重复项。即使第 3 行与第 1 行具有相同的路由，但它也不是重复的，因为有不同的断点导致部分路由 [1], [2 4] 、[3]和[5]。

我怎样才能干净、快速地做到这一点？矩阵可以包含许多元素，例如 X = 5e4 行和 N = 10、M = 6。

原文

I have a problem I need to solve, but I can't think of any easy and more important: fast solution. It's a bit like a part of a multiple traveling salesman problem.

First I have a matrix with X rows and N columns, N is a static variable of my algorithm and X can vary. Let's assume it looks like (here N = 5):

matrix = [1 2 4 3 5; 4 3 1 2 5; 1 2 4 3 5; ]
matrix =
 1     2     4     3     5
 4     3     1     2     5
 1     2     4     3     5

every row is seen as a "route" and contains all the unique numbers between 1 and N Each route (= row) will be split in partial routes. That means, I have a breakpoint matrix which contains X rows and M (M < N) columns. E.g.:

breakpoints = [2 3 4; 1 2 4; 1 3 4]
breakpoints =
 2     3     4
 1     2     4
 1     3     4

The indices of each row of breakpoints give the elements of the corresponding row of matrix AFTER which the route will be split into partial routes. Just to make clear, let's regard the frist row as an example: breakpoints(1, :) = 2 3 4 which means, that the route matrix(1, :) = 1 2 4 3 5 will be split into the partial routes [1 2], [4], [3] and [5]. The second row has the breakpoints breakpoints(2, :) = 1 2 4 which will split the second route matrix(2, :) = 4 3 1 2 5 into the partial routes [4], [3], [1 2] and [5].

Now my aim is to remove all rows from matrix, whereas the partial routes are redundant duplicates, just in a different order. In this example row 2 is a duplicate of row 1. Row 3 is NO duplicate even if it has the same route as row 1, because there are different breakpoints which lead to the partial routes [1], [2 4], [3] and [5].

How could I do this cleanly and fast? Matrix can contain many elements, like X = 5e4 rows and N = 10, M = 6.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

凉栀 2024-12-27 18:32:04

对于常数 M、N，可以通过按顺序对复合记录进行排序，然后测试相邻条目的相等性，在 O(X log X) 时间内解决此问题。

我所说的“复合记录”是指将行及其断点的函数组合成单个记录的记录。对于给定的行，该函数是通过以下方式获得的：

对行应用断点，获取部分路由的列表。
按每条路由的第一个元素对部分路由进行升序排序。例如将 {[4], [3], [1 2], [5]} 排序为 {[1 2], [3], [4], [5]}。
通过连接已排序的部分路由形成新的复合记录；有效断点；和源行索引。例如如果上一步中的示例行是第 2 行 = (4 3 1 2 5)，则保存 (1 2 3 4 5; 2 3 4; 2)，即（已排序的部分路由；有效断点） ; 指数）。

对复合记录进行排序后，遍历它们查找相邻条目的相等性，直到源索引。例如，(1 2 3 4 5; 2 3 4; 2) 和 (1 2 3 4 5; 2 3 4; 7) 表示第 7 行的部分路由与第 2 行的部分路由重复。每次发现重复项时，将其对应的原始第一行条目设置为无效点号，例如 N+1。

因此，在排序后，花费 O(X log X)，使用 O(X) 时间来检测重复项。然后使用 O(X) 时间通过遍历原始行来挤出重复项，删除那些第一个元素无效的行。

稍微更准确的总体成本是 O((M+N)*X*log X)，它超出了理论最小值 O((M+N)*X) log X 因子。如果将复合记录存储在哈希表中而不是对它们进行排序，并在出现重复哈希条目时将记录标记为删除，则可以消除 log X 因子。