优化问题——寻找最大值

发布于 2024-09-10 16:25:56 字数 155 浏览 8 评论 0原文

我手头有一个问题,可以简化为这样:

假设二维平面 XY 中有一堆随机点,其中对于每个 Y,X 上可能有多个点,对于每个 X,可能有多个点在 Y 上。

每当选择一个点 (Xi,Yi) 时,就不能选择其他 X = Xi OR Y = Yi 的点。我们必须选择最大点数。

I have a problem at hand which can be reduced to something like this :

Assume a bunch of random points in a two-dimension plane X-Y where for each Y, there could be multiple points on X and for each X, there could be multiple points on Y.

Whenever a point (Xi,Yi) is chosen, no other point with X = Xi OR Y = Yi can be chosen. We have to choose the maximum number of points.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

凉城 2024-09-17 16:25:57

根据 IVlad 的建议,我研究了 Hopcroft–Karp 算法。对于这个问题,它通常比最大流算法和匈牙利算法都要好,而且通常是显着的。一些比较:

一般

  • 最大流量:O(V3),其中V是顶点数。
  • 匈牙利语:O(n3),其中 n 是矩阵中的行数
  • Hopcroft-Karp:O(V √2V) 其中 V是顶点数。

对于 50×50 矩阵,选择 50% 的顶点

  • 最大流量:1,2503 = 1,953,125,000
  • 匈牙利语:503 = 125,000
  • Hopcroft-Karp : 1,250 √2,500 = 62,500

对于 1000×1000 矩阵,选择 10 个顶点

  • 最大流量:103 = 1,000
  • 匈牙利语:10003 = 1,000,000,000
  • Hopcroft-Karp:10 √20 ≅ 44.7

匈牙利算法唯一更好的时候是当选择的点比例非常高时。

对于 100×100 矩阵,选择了 90% 的顶点

  • 最大流量:9,0003 = 729,000,000,000
  • 匈牙利语:1003 = 1,000,000
  • Hopcroft-Karp : 9,000 √18,000 ≅ 1,207,476.7

最大流量算法从未更好。

实际上,这也很简单。此代码使用 David Eppstein 的实现:

points = {
    0 : [0, 1],
    1 : [0],
    2 : [1, 2],
}

selected = bipartiteMatch(points)[0]

for x, y in selected.iteritems():
    print '(%i, %i)' % (x, y)

print 'Total: %i' % len(selected)

Based on a recommendation from IVlad, I looked into the Hopcroft–Karp algorithm. It's generally better than both the maximum flow algorithm and the Hungarian algorithm for this problem, often significantly. Some comparisons:

In general:

  • Max Flow: O(V3) where V is the number of vertices.
  • Hungarian: O(n3) where n is the number of rows in the matrix
  • Hopcroft-Karp: O(V √2V) where V is the number of vertices.

For a 50×50 matrix, with 50% chosen vertices:

  • Max Flow: 1,2503 = 1,953,125,000
  • Hungarian: 503 = 125,000
  • Hopcroft-Karp: 1,250 √2,500 = 62,500

For a 1000×1000 matrix, with 10 chosen vertices:

  • Max Flow: 103 = 1,000
  • Hungarian: 10003 = 1,000,000,000
  • Hopcroft-Karp: 10 √20 ≅ 44.7

The only time the Hungarian algorithm is better is when there is a significantly high proportion of points selected.

For a 100×100 matrix, with 90% chosen vertices:

  • Max Flow: 9,0003 = 729,000,000,000
  • Hungarian: 1003 = 1,000,000
  • Hopcroft-Karp: 9,000 √18,000 ≅ 1,207,476.7

The Max Flow algorithm is never better.

It's also quite simple, in practice. This code uses an implementation by David Eppstein:

points = {
    0 : [0, 1],
    1 : [0],
    2 : [1, 2],
}

selected = bipartiteMatch(points)[0]

for x, y in selected.iteritems():
    print '(%i, %i)' % (x, y)

print 'Total: %i' % len(selected)
仙女山的月亮 2024-09-17 16:25:56

这可以简化为简单的最大流量问题。如果你有一个点 (xi, yi),在图中它应该用从源 S 到点 xi、从 xi 到 yi 以及从 yi 到最后一个节点(汇点)T 的路径表示。

注意,如果我们有点 (2 , 2) 和 (2, 5),从 S 到 x2 仍然只有一条路径。所有路径(边)的容量都是1。

这个网络中的流就是答案。

关于一般问题
http://en.wikipedia.org/wiki/Max_flow

更新
我现在没有图形编辑器来可视化问题,但您可以轻松地手工绘制示例。假设点是 (3, 3) (3, 5) (2, 5)

那么边(路径)将是
S-> x2,S-> x3
y3-> T,y5→ T
x3-> y3,x3-> y5,x2-> y5

流量:S→ x2-> y5-> T和S-> x3-> y3-> T
从源头到水槽的“水”量是 2,答案也是如此。

还有关于最大流算法的教程
http://www.topcoder.com/tc?module=静态&d1=教程&d2=maxFlow

This can be reduced to simple maximum flow problem. If you have a point (xi, yi), in graph it should be represented with path from source S to point xi, from xi to yi and from yi to the last node (sink) T.

Note, if we have points (2, 2) and (2, 5), there's still only one path from S to x2. All paths (edges) have capacity 1.

The flow in this network is the answer.

about general problem
http://en.wikipedia.org/wiki/Max_flow

update
I don't have graphic editor right now to visualise problem, but you can easily draw example by hand. Let's say, points are (3, 3) (3, 5) (2, 5)

Then edges (paths) would be
S -> x2, S -> x3
y3 -> T, y5 -> T
x3 -> y3, x3 -> y5, x2 -> y5

Flow: S -> x2 -> y5 -> T and S -> x3 -> y3 -> T
The amount of 'water' going from source to sink is 2 and so is the answer.

Also there's a tutorial about max flow algorithms
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=maxFlow

寻找一个思念的角度 2024-09-17 16:25:56

这不就是匈牙利算法吗?

创建一个 n×n 矩阵,其中标记顶点为 0,未标记顶点为 1。该算法将选择 n 个顶点,每一行和每一列一个,这会最小化它们的总和。只需计算所有等于 0 的所选顶点,即可得到答案。

from munkres import Munkres

matrix = [[0, 0, 1],
          [0, 1, 1],
          [1, 0, 0]]

m = Munkres()
total = 0
for row, column in m.compute(matrix):
    if matrix[row][column] == 0:
        print '(%i, %i)' % (row, column)
        total += 1

print 'Total: %i' % total

其运行时间为 O(n3),其中 n 是矩阵中的行数。最大流解的运行时间为 O(V3),其中 V 是顶点数。只要选择的交叉点超过 n 个,运行速度就会更快;事实上,随着所选顶点数量的增加,它的运行速度会快几个数量级。

Isn't this just the Hungarian algorithm?

Create an n×n matrix, with 0 at marked vertices, and 1 at unmarked vertices. The algorithm will choose n vertices, one for each row and column, which minimizes their sum. Simply count all the chosen vertices which equal 0, and you have your answer.

from munkres import Munkres

matrix = [[0, 0, 1],
          [0, 1, 1],
          [1, 0, 0]]

m = Munkres()
total = 0
for row, column in m.compute(matrix):
    if matrix[row][column] == 0:
        print '(%i, %i)' % (row, column)
        total += 1

print 'Total: %i' % total

This runs in O(n3) time, where n is the number of rows in the matrix. The maximum flow solution runs in O(V3), where V is the number of vertices. As long as there are more than n chosen intersections, this runs faster; in fact, it runs orders of magnitude faster, as the number of chosen vertices goes up.

幸福%小乖 2024-09-17 16:25:56

不同的解决方案。事实证明,有很多对称性,答案比我最初想象的要简单得多。您可以做的最大数量是唯一 X 和唯一 Y 中的最小值,如果您只想要结果,则为 O(NlogN)。

所有其他形状都相当于包含点的矩形,因为无论您从矩形中心拉出多少个点,顺序都无关紧要(如果按如下方式处理)。 任何形状,您从现在开始选取一个点,就会少一个唯一的 X 和一个不唯一的 Y,就像矩形一样。

所以最优解与连通性无关。选择最小尺寸边缘上的任意点(即,如果 len(unique-Xs)>len(unique-Ys),则选择具有最大或最小 X 的任何点)。它有多少个连接并不重要,重要的是哪个维度最大,这可以在查看上面创建的排序的唯一列表时轻松完成。如果保留 unique-x 和 unique-y 计数器,并在删除列表中该元素中的所有唯一节点时递减它们,则每次删除的时间复杂度为 O(1),重新计算长度的时间复杂度为 O(1)。因此,重复这 N 次最坏的情况是 O(N),最终的复杂度是 O(NlogN)(仅由于排序)。

您可以沿着最短的边缘选择任何点,因为:

  • 如果该边缘上只有一个点,您最好现在选择它,否则
  • 如果该边缘上有多个点,其他东西会消除它,谁在乎呢,你无论如何,都会用你的选择消除所有这些

基本上,你在每个点都最大化“max(uniqX,uniqY)”。

更新:IVlad 捕获了边缘情况:

如果尺寸相等,则取点最少的边缘。即使它们不相等,也可以从您要消除的唯一堆栈中取出分数最少的顶部或底部。

例证:

第 1 回合:

  • 点:(1, 2); (3, 5); (10, 5); (10, 2); (10, 3)
  • 有 3 个唯一的 X:1, 3, 10
  • 有 3 个唯一的 Y:2, 3, 5
  • “边界框”是 (1,5),(10,5),(10,2),(1,2)

反应 1:

  • “外边缘”(最外面的 uniqueX 或 uniqueY 点列表)最少的点是左边。基本上,查看 x=1,x=10 和 y=2,y=5 中的点集。 x=1 的集合是最小的:一个点。选择 x=1 -> 的唯一点<代码>(1,2)。
  • 这也消除了 (10,2)

第 2 回合:

  • 点数:(3, 5); (10, 5); (10, 3)
  • 有 2 个唯一的 X:3, 10
  • 有 2 个唯一的 Y:3, 5
  • “边界框”是 (3,5),(10,5),(10,3),(3,3)

反应2:

  • 具有最少的边界框的“边缘”要么是底部,要么是左侧。我们遇到了简单的情况 - 4 点意味着所有边缘都是外边缘。消除一个。说(10,3)
  • 这也消除了 (10,5)

第 3 回合:

  • 点:(3, 5)

反应 3:

  • 删除(3,5)

Different solution. It turns out that there's a lot of symmetry, and the answer is a lot simpler than I originally thought. The maximum number of things you can ever do is the minimum of the unique X's and unique Y's, which is O(NlogN) if you just want the result.

Every other shape is equivalent to a rectangle that contains the points, because it doesn't matter how many points you pull from the center of a rectangle, the order will never matter (if handled as below). Any shape that you pluck a point from now has one less unique X and one less unique Y, just like a rectangle.

So the optimal solution has nothing to do with connectedness. Pick any point that is on the edge of the smallest dimension (i.e. if len(unique-Xs)>len(unique-Ys), pick anything that has either maximum or minimum X). It doesn't matter how many connections it has, just which dimension is biggest, which can easily be done while looking at the sorted-unique lists created above. If you keep a unique-x and unique-y counter and decrement them when you delete all the unique nodes in that element of the list, then each deletion is O(1) and recalculating the lengths is O(1). So repeating this N times is at worst O(N), and the final complexity is O(NlogN) (due solely to the sorting).

You can pick any point along the shortest edge because:

  • if there's only one on that edge, you better pick it now or something else will eliminate it
  • if there's more than one on that edge, who cares, you will eliminate all of them with your pick anyways

Basically, you're maximizing "max(uniqX,uniqY)" at each point.

Update: IVlad caught an edge case:

If the dimensions are equal, take the edge with the least points. Even if they aren't equal, take the top or bottom of the unique-stack you're eliminating from that has the least points.

Case in point:

Turn 1:

  • Points: (1, 2); (3, 5); (10, 5); (10, 2); (10, 3)
  • There are 3 unique X's: 1, 3, 10
  • There are 3 unique Y's: 2, 3, 5
  • The "bounding box" is (1,5),(10,5),(10,2),(1,2)

Reaction 1:

  • The "outer edge" (outermost uniqueX or uniqueY lists of points) that has the least points is the left. Basically, look at the sets of points in x=1,x=10 and y=2,y=5. The set for x=1 is the smallest: one point. Pick the only point for x=1 -> (1,2).
  • That also eliminates (10,2).

Turn 2:

  • Points: (3, 5); (10, 5); (10, 3)
  • There are 2 unique X's: 3, 10
  • There are 2 unique Y's: 3, 5
  • The "bounding box" is (3,5),(10,5),(10,3),(3,3)

Reaction 2:

  • The "edge" of the bounding box that has the least is either the bottom or the left. We reached the trivial case - 4 points means all edges are the outer edges. Eliminate one. Say (10,3).
  • That also eliminates (10,5).

Turn 3:

  • Points: (3, 5)

Reaction 3:

  • Remove (3,5).
路弥 2024-09-17 16:25:56

对于每个点,确定因选择该点而被取消资格的其他点的数量 (N)(即具有相同 X 或 Y 值的点)。然后,按照 N 个不合格点的数量递增的顺序迭代非不合格点。完成后,您将消除最大数量的分数。

For each point, identify the number of other points (N) that would be disqualified by the selection of that point (i.e. the ones with the same X or Y values). Then, iterate over the non-disqualified points in order of increasing number of N disqualified points. When you are finished, you will have removed the maximum number of points.

扬花落满肩 2024-09-17 16:25:56

XY 平面是一个转移注意力的平面。将其表述为一组元素,每个元素都有一组互斥的元素。

然后该算法变成深度优先搜索。在每个级别,对于每个候选节点,计算排除元素的集合,即当前排除的元素与候选节点排除的元素的并集。按照排除元素最少到最多的顺序尝试候选节点。跟踪迄今为止的最佳解决方案(最少的排除节点)。修剪任何比当前最好的子树更差的子树。

作为以可能丢失解决方案为代价的轻微改进,您可以使用布隆过滤器来跟踪排除的集合。

The XY plane is a red herring. Phrase it as a set of elements, each of which has a set of mutually exclusive elements.

The algorithm then becomes a depth-first search. At each level, for each candidate node, calculate the set of excluded elements, the union of currently excluded elements with the elements excluded by the candidate node. Try candidate nodes in order of fewest excluded elements to most. Keep track of the best solution so far (the fewest excluded nodes). Prune any subtrees that are worse than the current best.

As a slight improvement at the cost of possible missed solutions, you can use Bloom filters for keeping track of the excluded sets.

帅气尐潴 2024-09-17 16:25:56

这看起来像是一个可以通过动态编程解决的问题。研究最长公共子串的算法,或背包问题

This looks like a problem that can be solved with dynamic programming. Look into the algorithms for longest common substring, or the knapsack problem.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文