采访摘自:删除 n×n 矩阵中的行和列以最大化剩余值的总和

发布于 2024-08-10 11:55:25 字数 96 浏览 3 评论 0 原文

给定一个 n×n 实数矩阵。您可以删除任意数量(从 0 到 n)的行和任意数量(从 0 到 n)的列,然后计算剩余条目的总和。提出一种算法来找出要删除的行和列,以便最大化该总和。

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(17

仙女 2024-08-17 11:55:25

问题是NP-hard。 (所以你不应该期望用多项式时间算法来解决这个问题。不过,仍然可能有(非多项式时间)算法比暴力破解稍微好一些。)NP-hardness 证明背后的想法是如果我们能解决这个问题,那么我们就可以解决一般图中的派系问题。 (最大团问题是找到图中最大的成对连接顶点集。)

具体来说,给定任何具有 n 个顶点的图,让我们用条目 a[i 组成矩阵 A ][j] 如下:

  • a[i][j] = 1 for i == j(对角线条目)
  • a[i ][j] = 0 如果边 (i,j) 存在于图中(且 i≠j
  • a[i][j] = -n- 1 如果边 (i,j) 不存在于图中。

现在假设我们解决删除一些行和列(或者等效地,保留一些行和列)的问题,以使矩阵中的条目之和最大化。那么答案给出了图中的最大集团:

  1. 声明:在任何最优解中,都不存在行i和列j保留图中不存在边 (i,j) 的情况。证明:由于a[i][j] = -n-1并且所有正项的总和最多为n,因此选择(i,j)将导致为负数。 (请注意,删除所有行和列将得到更好的总和,即 0。)

  2. 声明:在(某些)最佳解决方案中,行和列的集合保留的是一样的。这是因为从任何最佳解决方案开始,我们都可以简单地删除未保留列i的所有行i,反之亦然。请注意,由于唯一的正数项是对角线项,因此我们不会减少总和(并且根据之前的声明,我们也不会增加它)。

所有这些都意味着,如果图的最大团的大小为 k,那么我们的矩阵问题就有一个总和为 k 的解,反之亦然。因此,如果我们能够在多项式时间内解决我们的初始问题,那么派系问题也将在多项式时间内解决。这证明最初的问题是NP-hard。 (实际上,很容易看出初始问题的决策版本 - 有没有办法删除一些行和列,使得总和至少为 k - 是 NP 形式,所以(初始问题的决策版本)实际上是NP-complete。)

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)

Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:

  • a[i][j] = 1 for i == j (the diagonal entries)
  • a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
  • a[i][j] = -n-1 if the edge (i,j) is not present in the graph.

Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:

  1. Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)

  2. Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).

All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)

握住你手 2024-08-17 11:55:25

蛮力方法是这样的:

  • 对于 n 行,有 2n 子集。
  • 对于 n 列,有 2n 个子集。
  • 对于 nxn 矩阵,有 22n 子集。

0 个元素是有效的子集,但显然如果有 0 行或 0 列,则总数为 0,因此实际上有 22n-2+1 个子集,但这没有什么不同。

因此,您可以通过暴力破解作为 O(an) 算法来计算出每个组合。快速地。 :)

计算出最大可能值会更快,您可以通过将网格中的所有正数相加来实现。如果这些数字恰好形成一个有效的子矩阵(意味着您可以通过删除行和/或列来创建该集合),那么这就是您的答案。

这隐含的是,如果没有一个数字为负数,那么根据定义,完整的矩阵就是答案。

另外,知道最大可能的最大值可能允许您缩短强力评估的时间,因为如果您得到等于该最大值的任何组合,那么这就是您的答案,您可以停止检查。

此外,如果所有数字均为非正数,则答案是最大值,因为根据定义,您可以将矩阵简化为其中包含 1 个值的 1 x 1 矩阵。

这里有一个想法:构造 2n-1 nxm 矩阵,其中 1 <= m <= n。依次处理它们。对于每个 nxm 矩阵,您可以计算:

  1. 最高可能的最大总和(如上所述);以及
  2. 是否没有正数可以让您简化答案。

如果 (1) 低于当前计算的最高最大总和,那么您可以丢弃这个 nxm 矩阵。如果 (2) 为真,那么您只需要与当前最高的最大总和进行简单比较。

这通常被称为修剪技术。

此外,您可以首先说 nxn 矩阵中的最高数字是起始最高最大和,因为显然它可以是 1 x 1 矩阵。

我确信您可以将其调整为(稍微更)高效的基于树的递归搜索算法,并通过上述测试有效地允许您消除(希望是许多)不必要的搜索。

Well the brute force method goes something like this:

  • For n rows there are 2n subsets.
  • For n columns there are 2n subsets.
  • For an n x n matrix there are 22n subsets.

0 elements is a valid subset but obviously if you have 0 rows or 0 columns the total is 0 so there are really 22n-2+1 subsets but that's no different.

So you can work out each combination by brute force as an O(an) algorithm. Fast. :)

It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.

Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.

Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.

Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.

Here's an idea: construct 2n-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:

  1. The highest possible maximum sum (as per above); and
  2. Whether no numbers are positive allowing you to shortcut the answer.

if (1) is below the currently calculate highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.

This is generally referred to as a pruning technique.

What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.

I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.

恏ㄋ傷疤忘ㄋ疼 2024-08-17 11:55:25

我们可以通过将其建模为有向图来改进 Cletus 的广义暴力解决方案。初始矩阵是图的起始节点;它的叶子是所有缺少一行或一列的矩阵,依此类推。它是一个图而不是一棵树,因为没有第一列和第一行的矩阵节点将有两个父节点 - 仅缺少第一列或第一行的节点。

我们可以通过将图变成一棵树来优化我们的解决方案:探索删除了我们删除的列或行之前的列或行的子矩阵来到达当前节点是没有任何意义的,因为无论如何都会到达该子矩阵。

当然,这仍然是一种强力搜索 - 但我们已经消除了以不同顺序删除相同行的重复情况。

以下是 Python 中的示例实现:

def maximize_sum(m):
  frontier = [(m, 0, False)]
  best = None
  best_score = 0

  while frontier:
    current, startidx, cols_done = frontier.pop()
    score = matrix_sum(current)
    if score > best_score or not best:
      best = current
      best_score = score
    w, h = matrix_size(current)
    if not cols_done:
      for x in range(startidx, w):
        frontier.append((delete_column(current, x), x, False))
      startidx = 0
    for y in range(startidx, h):
      frontier.append((delete_row(current, y), y, True))
  return best_score, best

这是 280Z28 示例矩阵的输出:

>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])

We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.

We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.

This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.

Here's an example implementation in Python:

def maximize_sum(m):
  frontier = [(m, 0, False)]
  best = None
  best_score = 0

  while frontier:
    current, startidx, cols_done = frontier.pop()
    score = matrix_sum(current)
    if score > best_score or not best:
      best = current
      best_score = score
    w, h = matrix_size(current)
    if not cols_done:
      for x in range(startidx, w):
        frontier.append((delete_column(current, x), x, False))
      startidx = 0
    for y in range(startidx, h):
      frontier.append((delete_row(current, y), y, True))
  return best_score, best

And here's the output on 280Z28's example matrix:

>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])
说谎友 2024-08-17 11:55:25

由于没有人要求有效的算法,因此请使用强力:生成可以通过从原始矩阵中删除行和/或列来创建的每个可能的矩阵,选择最好的一个。一种稍微更有效的版本(最有可能被证明仍然是正确的)是仅生成那些删除的行和列包含至少一个负值的变体。

Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficent version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.

行雁书 2024-08-17 11:55:25

以简单的方式进行尝试:

我们需要条目集 {A00, A01, A02, ..., A0n, A10, ...,Ann} 的有效子集,其中最大。和。

首先计算所有子集(幂集)。

有效子集是幂集的成员,对于每两个包含的条目 Aij 和 A(i+x)(j+y),还包含元素 A(i+x)j 和 Ai (j+y)(这是由 Aij 和 A(i+x)(j+y) 跨越的矩形的剩余角)。

Aij ...
 .       .   
 .       .
    ... A(i+x)(j+y)     

这样,您就可以从幂集中消除无效的幂,并找到剩余幂中总和最大的幂。

我确信可以通过改进幂集生成算法以仅生成有效子集并避免步骤 2(调整幂集)来改进它。

To try it in a simple way:

We need the valid subset of the set of entries {A00, A01, A02, ..., A0n, A10, ...,Ann} which max. sum.

First compute all subsets (the power set).

A valid subset is a member of the power set that for each two contained entries Aij and A(i+x)(j+y), contains also the elements A(i+x)j and Ai(j+y) (which are the remaining corners of the rectangle spanned by Aij and A(i+x)(j+y)).

Aij ...
 .       .   
 .       .
    ... A(i+x)(j+y)     

By that you can eliminate the invalid ones from the power set and find the one with the biggest sum in the remaining.

I'm sure it can be improved by improving an algorithm for power set generation in order to generate only valid subsets and by that avoiding step 2 (adjusting the power set).

动听の歌 2024-08-17 11:55:25

我认为某些攻击角度可能会比蛮力有所改善。

  1. 记忆,因为有许多不同的编辑序列将到达相同的子矩阵。
  2. 动态规划。因为矩阵的搜索空间是高度冗余的,所以我的直觉是会有一个 DP 公式可以节省大量重复工作
  3. 我认为有一种启发式方法,但我不能完全确定它:

    如果有一个负数,则可以直接保留矩阵,删除负数所在的列,或者删除其行;我不认为任何其他“举动”会带来更高的金额。对于两个负数,您的选择是:都不删除、删除一个、删除另一个或同时删除两个(其中删除操作是通过取消行或列)。

    现在假设矩阵只有一个正数,其余的都<=0。您显然想要删除除积极条目之外的所有内容。对于只有 2 个正项且其余 <= 0 的矩阵,选项为:不执行任何操作、减少到一个、减少到另一个或减少到两个(生成 1x2、2x1 或 2x2 矩阵) )。

    一般来说,最后一个选项会崩溃(想象一个包含 50 个正数和 50 个负数的矩阵),但根据您的数据(很少负数或很少正数),它可以提供一条捷径。

I think there are some angles of attack that might improve upon brute force.

  1. memoization, since there are many distinct sequences of edits that will arrive at the same submatrix.
  2. dynamic programming. Because the search space of matrices is highly redundant, my intuition is that there would be a DP formulation that can save a lot of repeated work
  3. I think there's a heuristic approach, but I can't quite nail it down:

    if there's one negative number, you can either take the matrix as it is, remove the column of the negative number, or remove its row; I don't think any other "moves" result in a higher sum. For two negative numbers, your options are: remove neither, remove one, remove the other, or remove both (where the act of removal is either by axing the row or the column).

    Now suppose the matrix has only one positive number and the rest are all <=0. You clearly want to remove everything but the positive entry. For a matrix with only 2 positive entries and the rest <= 0, the options are: do nothing, whittle down to one, whittle down to the other, or whittle down to both (resulting in a 1x2, 2x1, or 2x2 matrix).

    In general this last option falls apart (imagine a matrix with 50 positives & 50 negatives), but depending on your data (few negatives or few positives) it could provide a shortcut.

薆情海 2024-08-17 11:55:25
  • 创建一个 n×1 向量 RowSums 和一个 n×1 向量 ColumnSums。将它们初始化为原始矩阵的行和列之和。 O(n²)
  • 如果任何行或列的总和为负,则删除编辑:具有最小值的行或列,并在另一个方向更新总和以反映其新值。 O(n)
  • 当没有行或列的总和小于零时停止。

这是对另一个答案的改进的迭代变体。它在 O(n²) 时间内运行,但对于其他答案中提到的某些情况会失败,这是此问题的复杂性限制(矩阵中有 n² 条目,甚至要找到最小值,您必须检查每个单元格一次) 。

编辑:以下矩阵没有负行或负列,但没有最大化,并且我的算法无法捕获它。

1     1     3        goal     1    3
1   -89   101        ===>     1  101
1   102   -99

下面的矩阵确实有负的行和列,但我的算法选择了错误的行和列进行删除。

 -5     1    -5      goal     1
  1     1     1      ===>     1
-10     2   -10               2

                     mine
                     ===>     1     1     1
  • Create an n-by-1 vector RowSums, and an n-by-1 vector ColumnSums. Initialize them to the row and column sums of the original matrix. O(n²)
  • If any row or column has a negative sum, remove edit: the one with the minimum such and update the sums in the other direction to reflect their new values. O(n)
  • Stop when no row or column has a sum less than zero.

This is an iterative variation improving on another answer. It operates in O(n²) time, but fails for some cases mentioned in other answers, which is the complexity limit for this problem (there are n² entries in the matrix, and to even find the minimum you have to examine each cell once).

Edit: The following matrix has no negative rows or columns, but is also not maximized, and my algorithm doesn't catch it.

1     1     3        goal     1    3
1   -89   101        ===>     1  101
1   102   -99

The following matrix does have negative rows and columns, but my algorithm selects the wrong ones for removal.

 -5     1    -5      goal     1
  1     1     1      ===>     1
-10     2   -10               2

                     mine
                     ===>     1     1     1
烟酉 2024-08-17 11:55:25

计算每行和每列的总和。这可以在 O(m) 内完成(其中 m = n^2)。

当某些行或列的总和为负时,请删除总和小于零的最低行或列。然后重新计算每行/列的总和。

总体思路是,只要有一行或一列的总和为负值,将其删除就会得到更大的整体值。您需要一次删除它们并重新计算,因为在删除这一行/列时,您会影响其他行/列的总和,并且它们可能会或可能不会再有负总和。

这将产生最佳的最大结果。运行时间为 O(mn) 或 O(n^3)

Compute the sum of each row and column. This can be done in O(m) (where m = n^2)

While there are rows or columns that sum to negative remove the row or column that has the lowest sum that is less than zero. Then recompute the sum of each row/column.

The general idea is that as long as there is a row or a column that sums to nevative, removing it will result in a greater overall value. You need to remove them one at a time and recompute because in removing that one row/column you are affecting the sums of the other rows/columns and they may or may not have negative sums any more.

This will produce an optimally maximum result. Runtime is O(mn) or O(n^3)

心清如水 2024-08-17 11:55:25

我无法真正在我的头脑中产生一个算法,但对我来说,如果它作为起点的话,它“闻起来”像动态编程。

I cannot really produce an algorithm on top of my head, but to me it 'smells' like dynamic programming, if it serves as a start point.

情何以堪。 2024-08-17 11:55:25

大编辑:老实说,我认为没有办法评估矩阵并确定它最大化,除非它是完全正的。

也许它需要分支,并探索所有的消除路径。当代价高昂的消除可以在以后带来许多更好的消除时,你永远不会拒绝。如果找到理论最大值,我们可以短路,但除此之外任何算法都必须能够前进和后退。我已经调整了原来的解决方案以通过递归实现此行为。

双重秘密编辑:如果每次迭代不需要找到所有负面元素,它也会在降低复杂性方面取得巨大进步。考虑到它们在调用之间不会发生太大变化,因此将它们的位置传递到下一次迭代更有意义。

获取一个矩阵、矩阵中当前负元素的列表以及初始矩阵的理论最大值。返回矩阵的最大总和以及到达该值所需的移动列表。在我看来,移动列表包含一个移动列表,表示从上一个操作的结果中删除的行/列。

即: r1,r1

将翻译

-1  1  0           1  1  1
-4  1 -4           5  7 1
 1  2  4    ===>  
 5  7  1  
  1. 如果矩阵之和是理论最大值则返回

  2. 查找所有负元素的位置,除非传入空集。

  3. 计算矩阵之和并将其存储在空移动列表旁边。

    • 对于负数每个元素:

    • 计算该元素的行和列的总和。

    • 克隆矩阵并消除该克隆中具有最小总和(行/列)的集合,请注意该操作作为移动列表。

    • 克隆负面元素列表并删除任何受上一步操作影响的元素。

    • 递归调用该算法,提供克隆矩阵、更新的负元素列表和理论最大值。将返回的移动列表附加到生成传递给递归调用的矩阵的操作的移动列表。

    • 如果递归调用的返回值大于存储的总和,则替换它并存储返回的移动列表。

  4. 返回存储的总和和移动列表。

我不确定它比暴力方法更好还是更差,但它现在可以处理所有测试用例。即使最大值包含负值也是如此。

Big Edit: I honestly don't think there's a way to assess a matrix and determine it is maximized, unless it is completely positive.

Maybe it needs to branch, and fathom all elimination paths. You never no when a costly elimination will enable a number of better eliminations later. We can short circuit if it's found the theoretical maximum, but other than any algorithm would have to be able to step forward and back. I've adapted my original solution to achieve this behaviour with recursion.

Double Secret Edit: It would also make great strides to reduce to complexity if each iteration didn't need to find all negative elements. Considering that they don't change much between calls, it makes more sense to just pass their positions to the next iteration.

Takes a matrix, the list of current negative elements in the matrix, and the theoretical maximum of the initial matrix. Returns the matrix's maximum sum and the list of moves required to get there. In my mind move list contains a list of moves denoting the row/column removed from the result of the previous operation.

Ie: r1,r1

Would translate

-1  1  0           1  1  1
-4  1 -4           5  7 1
 1  2  4    ===>  
 5  7  1  
  1. Return if sum of matrix is the theoretical maximum

  2. Find the positions of all negative elements unless an empty set was passed in.

  3. Compute sum of matrix and store it along side an empty move list.

    • For negative each element:

      1. Calculate the sum of that element's row and column.

      2. clone the matrix and eliminate which ever collection has the minimum sum (row/column) from that clone, note that action as a move list.

      3. clone the list of negative elements and remove any that are effected by the action taken in the previous step.

      4. Recursively call this algorithm providing the cloned matrix, the updated negative element list and the theoretical maximum. Append the moves list returned to the move list for the action that produced the matrix passed to the recursive call.

      5. If the returned value of the recursive call is greater than the stored sum, replace it and store the returned move list.

  4. Return the stored sum and move list.

I'm not sure if it's better or worse than the brute force method, but it handles all the test cases now. Even those where the maximum contains negative values.

2024-08-17 11:55:25

这是一个优化问题,可以通过基于模拟的迭代算法来近似解决退火

符号:C是列数。

对于 J 迭代:

  1. 查看每一列并计算切换它的绝对收益(如果当前处于打开状态,则将其关闭,如果当前处于关闭状态,则将其打开)。这给你 C 值,例如 -3, 1, 4。贪婪的确定性解决方案只会选择最后一个操作(切换最后一列以获得 4 的好处),因为它局部改善了目标。但这可能会让我们陷入局部最优。相反,我们概率性地选择三个操作之一,概率与收益成正比。为此,请将它们通过 Sigmoid 函数 并标准化,将其转换为概率分布。 (或者使用 exp() 而不是 sigmoid()?)因此,对于 -3、1、4,您可以从 sigmoid 中得到 0.05、0.73、0.98,在标准化后得到 0.03、0.42、0.56。现在根据概率分布选择操作,例如以概率 0.56 切换最后一列,以概率 0.42 切换第二列,或以微小概率 0.03 切换第一列。

  2. 对行执行相同的过程,导致切换其中一行。

迭代 J 次迭代直至收敛。

我们还可以在早期迭代中使每个概率分布更加均匀,这样我们就不会在早期陷入错误的决策。因此,我们将非归一化概率提高到 1/T 次幂,其中 T 在早期迭代中很高,然后慢慢减小,直到接近 0。例如,从上面的 0.05、0.73、0.98,提高到 1/10 结果为 0.74 , 0.97, 1.0,归一化后为 0.27, 0.36, 0.37(因此比原来的 0.05, 0.73, 0.98 更加均匀)。

This is an optimization problem and can be solved approximately by an iterative algorithm based on simulated annealing:

Notation: C is number of columns.

For J iterations:

  1. Look at each column and compute the absolute benefit of toggling it (turn it off if it's currently on or turn it on if it's currently off). That gives you C values, e.g. -3, 1, 4. A greedy deterministic solution would just pick the last action (toggle the last column to get a benefit of 4) because it locally improves the objective. But that might lock us into a local optimum. Instead, we probabilistically pick one of the three actions, with probabilities proportional to the benefits. To do this, transform them into a probability distribution by putting them through a Sigmoid function and normalizing. (Or use exp() instead of sigmoid()?) So for -3, 1, 4 you get 0.05, 0.73, 0.98 from the sigmoid and 0.03, 0.42, 0.56 after normalizing. Now pick the action according to the probability distribution, e.g. toggle the last column with probability 0.56, toggle the second column with probability 0.42, or toggle the first column with the tiny probability 0.03.

  2. Do the same procedure for the rows, resulting in toggling one of the rows.

Iterate for J iterations until convergence.

We may also, in early iterations, make each of these probability distributions more uniform, so that we don't get locked into bad decisions early on. So we'd raise the unnormalized probabilities to a power 1/T, where T is high in early iterations and is slowly decreased until it approaches 0. For example, 0.05, 0.73, 0.98 from above, raised to 1/10 results in 0.74, 0.97, 1.0, which after normalization is 0.27, 0.36, 0.37 (so it's much more uniform than the original 0.05, 0.73, 0.98).

望笑 2024-08-17 11:55:25

它显然是 NP 完全的(如上所述)。鉴于此,如果我必须针对该问题提出最佳算法:

  • 尝试二次整数规划的一些迭代,将问题表述为:SUM_ij a_ij x_i y_j,其中 x_i 和 y_j变量限制为 0 或 1。对于某些矩阵,我认为这会很快找到解决方案,对于最困难的情况,它不会比蛮力更好(而且不会太多)。

  • 并行(并使用大部分 CPU),使用近似搜索算法生成越来越好的解决方案。另一个答案中建议模拟退火,但在对类似的组合优化问题进行过研究后,我的经验是禁忌搜索会更快地找到好的解决方案。如果您使用增量更新单个更改的成本的技巧,那么就在最短的时间内在不同的“可能更好”的解决方案之间徘徊而言,这可能接近最佳(请参阅我的论文“图统治,禁忌搜索和足球池问题”) ”)。

  • 使用上面第二个到目前为止的最佳解决方案,通过避免搜索下限比它更差的可能性来引导第一个解决方案。

    使用上面

显然这并不能保证找到最大解。但是,当可行时通常会这样做,否则它将提供非常好的局部最大解决方案。如果有人有需要这种优化的实际情况,我认为这是最有效的解决方案。

停止识别问题可能是 NP 完全问题在工作面试中看起来并不好! (除非这项工作是复杂性理论的,但即使如此我也不会。)你需要提出好的方法 - 这就是这样的问题的要点。看看你在压力下能想出什么办法,因为现实世界常常需要处理这样的事情。

It's clearly NP-Complete (as outlined above). Given this, if I had to propose the best algorithm I could for the problem:

  • Try some iterations of quadratic integer programming, formulating the problem as: SUM_ij a_ij x_i y_j, with the x_i and y_j variables constrained to be either 0 or 1. For some matrices I think this will find a solution quickly, for the hardest cases it would be no better than brute force (and not much would be).

  • In parallel (and using most of the CPU), use a approximate search algorithm to generate increasingly better solutions. Simulating Annealing was suggested in another answer, but having done research on similar combinatorial optimisation problems, my experience is that tabu search would find good solutions faster. This is probably close to optimal in terms of wandering between distinct "potentially better" solutions in the shortest time, if you use the trick of incrementally updating the costs of single changes (see my paper "Graph domination, tabu search and the football pool problem").

  • Use the best solution so far from the second above to steer the first by avoiding searching possibilities that have lower bounds worse than it.

Obviously this isn't guaranteed to find the maximal solution. But, it generally would when this is feasible, and it would provide a very good locally maximal solution otherwise. If someone had a practical situation requiring such optimisation, this is the solution that I'd think would work best.

Stopping at identifying that a problem is likely to be NP-Complete will not look good in a job interview! (Unless the job is in complexity theory, but even then I wouldn't.) You need to suggest good approaches - that is the point of a question like this. To see what you can come up with under pressure, because the real world often requires tackling such things.

谜泪 2024-08-17 11:55:25

是的,这是一个NP完全问题。

很难轻易地找到最好的子矩阵,但是我们可以很容易地找到一些更好的子矩阵。

假设我们在矩阵中给出 m 个随机点作为“feeds”。然后让它们按照以下规则自动扩展:

如果向 feed-matrix 添加一个新行或新列,请确保总和会递增。

,然后我们可以比较m个子矩阵来找到最好的一个。

yes, it's NP-complete problem.

It's hard to easily find the best sub-matrix,but we can easily to find some better sub-matrix.

Assume that we give m random points in the matrix as "feeds". then let them to automatically extend by the rules like :

if add one new row or column to the feed-matrix, ensure that the sum will be incrementive.

,then we can compare m sub-matrix to find the best one.

南七夏 2024-08-17 11:55:25

假设 n = 10。

蛮力(所有可能的行集 x 所有可能的列集)需要

2^10 * 2^10 =~ 1,000,000 个节点。

我的第一种方法是将其视为树搜索,并使用

正条目的总和是子树中每个节点的上限

作为修剪方法。结合贪婪算法以廉价地生成良好的初始边界,平均在大约 80,000 个节点中产生答案。


但还有更好的方法!我后来意识到

修复了行X的一些选择。
计算出这组行的最佳列现在很简单(如果 X 行中的条目总和为正,则保留一列,否则丢弃它)。

因此,我们可以对所有可能的选择进行暴力破解行;这需要 2^10 = 1024 个节点。

添加剪枝方法后,节点数量平均减少到 600 个。

在遍历行集树时保留“列和”并增量更新它们应该允许每个节点的计算(矩阵之和等)为 O(n) 而不是 O(n^2)。总复杂度为 O(n * 2^n)

Let's say n = 10.

Brute force (all possible sets of rows x all possible sets of columns) takes

2^10 * 2^10 =~ 1,000,000 nodes.

My first approach was to consider this a tree search, and use

the sum of positive entries is an upper bound for every node in the subtree

as a pruning method. Combined with a greedy algorithm to cheaply generate good initial bounds, this yielded answers in about 80,000 nodes on average.


but there is a better way ! i later realised that

Fix some choice of rows X.
Working out the optimal columns for this set of rows is now trivial (keep a column if its sum of its entries in the rows X is positive, otherwise discard it).

So we can just brute force over all possible choices of rows; this takes 2^10 = 1024 nodes.

Adding the pruning method brought this down to 600 nodes on average.

Keeping 'column-sums' and incrementally updating them when traversing the tree of row-sets should allow the calculations (sum of matrix etc) at each node to be O(n) instead of O(n^2). Giving a total complexity of O(n * 2^n)

习惯那些不曾习惯的习惯 2024-08-17 11:55:25

对于稍微不太理想的解决方案,我认为这是一个 PTIME、PSPACE 复杂性问题。

GREEDY 算法可以按如下方式运行:


将矩阵加载到内存中并计算行总计。之后运行主循环,

1) 删除最小的行,

2) 从旧行总计中减去新省略的值

-->当不再有负数行时中断。


第二点是一个微妙的细节:减去两行/列的时间复杂度为n

重新求和除两列之外的所有列时,具有n^2时间复杂度

For slightly less than optimal solution, I think this is a PTIME, PSPACE complexity issue.

The GREEDY algorithm could run as follows:


Load the matrix into memory and compute row totals. After that run the main loop,

1) Delete the smallest row,

2) Subtract the newly omitted values from the old row totals

--> Break when there are no more negative rows.


Point two is a subtle detail: subtracted two rows/columns has time complexity n.

While re-summing all but two columns has n^2 time complexity!

吾性傲以野 2024-08-17 11:55:25

获取每一行和每一列并计算总和。对于 2x2 矩阵,这将是:

2    1

3    -10

Row(0) = 3
行(1) = -7
列(0) = 5
Col(1) = -9

组成一个新矩阵

Cost to take row      Cost to take column
       3                      5

      -7                      -9

取出您需要的任何内容,然后重新开始。

您只需在新矩阵上查找负值即可。这些是实际从整个矩阵值中减去的值。当没有更多的负“SUMS”值可以取出时它终止(因此所有列和行对最终结果进行求和)

在一个 nxn 矩阵中,我认为这将是 O(n^2)Log(n)

Take each row and each column and compute the sum. For a 2x2 matrix this will be:

2    1

3    -10

Row(0) = 3
Row(1) = -7
Col(0) = 5
Col(1) = -9

Compose a new matrix

Cost to take row      Cost to take column
       3                      5

      -7                      -9

Take out whatever you need to, then start again.

You just look for negative values on the new matrix. Those are values that actually substract from the overall matrix value. It terminates when there're no more negative "SUMS" values to take out (therefore all columns and rows SUM something to the final result)

In an nxn matrix that would be O(n^2)Log(n) I think

美煞众生 2024-08-17 11:55:25
function pruneMatrix(matrix) {
  max = -inf;
  bestRowBitField = null;
  bestColBitField = null;
  for(rowBitField=0; rowBitField<2^matrix.height; rowBitField++) {
    for (colBitField=0; colBitField<2^matrix.width; colBitField++) {
      sum = calcSum(matrix, rowBitField, colBitField);
      if (sum > max) {
        max = sum;
        bestRowBitField = rowBitField;
        bestColBitField = colBitField;
      }
    }
  }
  return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}

function calcSumForCombination(matrix, rowBitField, colBitField) {
  sum = 0;
  for(i=0; i<matrix.height; i++) {
    for(j=0; j<matrix.width; j++) {
      if (rowBitField & 1<<i && colBitField & 1<<j) {
        sum += matrix[i][j];
      }
    }
  }
  return sum;
}
function pruneMatrix(matrix) {
  max = -inf;
  bestRowBitField = null;
  bestColBitField = null;
  for(rowBitField=0; rowBitField<2^matrix.height; rowBitField++) {
    for (colBitField=0; colBitField<2^matrix.width; colBitField++) {
      sum = calcSum(matrix, rowBitField, colBitField);
      if (sum > max) {
        max = sum;
        bestRowBitField = rowBitField;
        bestColBitField = colBitField;
      }
    }
  }
  return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}

function calcSumForCombination(matrix, rowBitField, colBitField) {
  sum = 0;
  for(i=0; i<matrix.height; i++) {
    for(j=0; j<matrix.width; j++) {
      if (rowBitField & 1<<i && colBitField & 1<<j) {
        sum += matrix[i][j];
      }
    }
  }
  return sum;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文