I'm not so good at linear programming, so I'm posting this problem here. I hope somebody can point me in the right direction. This is not a homework problem, so please don't misunderstand.
I have a 5x5 matrix (25 nodes). The distance between each node and its adjacent (neighbor) nodes is 1 unit. A node can be in one of two states: cache or access. If node 'i' is a cache node, an access node 'j' can access it at a cost of Dij x Aij (access cost), where Dij is the Manhattan distance between nodes i and j, and Aij is the access frequency from node i to j.

In order for node i to become a cache node, it needs to cache from an existing cache node k at a cost of Dik x C, where C is an integer constant (cache cost). C is called the caching frequency.

A is provided as a 25x25 matrix of integers giving the access frequency between any pair of nodes i and j. D is provided as a 25x25 matrix containing the Manhattan distance between any pair of nodes i and j.
Assuming there is one initial cache node in the matrix, find the set of other cache nodes and access nodes such that the total cost is minimized.

Total Cost = Total Cache Cost + Total Access Cost.
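Since the whole question turns on this cost function, here is a minimal Python sketch of how one candidate assignment could be scored. The names `cache_nodes` and `parent` are illustrative, and it assumes each access node reads from whichever cache node is cheapest for it; the problem statement doesn't pin down that tie-breaking rule.

```python
def total_cost(cache_nodes, parent, D, A, C):
    """Score one assignment of roles (names are illustrative).

    cache_nodes -- set of node indices that act as caches
    parent      -- parent[i] is the cache node that cache node i cached
                   from (None for the one initial cache node)
    D, A        -- distance and access-frequency matrices
    C           -- the caching frequency constant
    """
    n = len(D)
    cost = 0
    # Cache cost: each non-initial cache node i pays D[i][k] * C
    # to the cache node k it was populated from.
    for i in cache_nodes:
        k = parent[i]
        if k is not None:
            cost += D[i][k] * C
    # Access cost: each access node j reads from the cache node that
    # is cheapest for it (an assumption; the problem statement does
    # not say which cache node an access node uses).
    for j in range(n):
        if j not in cache_nodes:
            cost += min(D[i][j] * A[i][j] for i in cache_nodes)
    return cost
```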
I've tackled a few problems that are something like this.
First, if you don't need an exact answer, I'd generally suggest looking into something like a genetic algorithm, or a greedy algorithm. It won't be optimal, but it generally won't be bad either, and it will be much faster than an exact algorithm. For instance, you can start with all points as cache points, then find the point whose removal reduces your cost the most and make it a non-caching point. Continue until removing the next one makes the cost go up, and use that as your solution. This won't be the best solution, but it will generally be reasonably good.
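The greedy idea described above might be sketched like this in Python. It is illustrative only: it assumes each cache node caches from its nearest cache peer and each access node reads from its cheapest cache node, choices the problem statement leaves open.

```python
def solve_greedy(D, A, C, root):
    """Greedy heuristic: start with every node as a cache point, then
    repeatedly drop the cache node whose removal lowers total cost most.
    Stops when any further removal would raise the cost.  Fast, not optimal."""
    n = len(D)

    def cost(cache):
        total = 0
        for i in cache:
            if i != root:
                # Assumed: a cache node caches from its nearest cache peer.
                total += C * min(D[i][k] for k in cache if k != i)
        for j in range(n):
            if j not in cache:
                # Assumed: an access node reads from its cheapest cache node.
                total += min(D[i][j] * A[i][j] for i in cache)
        return total

    cache = set(range(n))
    best = cost(cache)
    while len(cache) > 1:
        # Find the single removal that reduces the cost the most.
        node, c = min(((i, cost(cache - {i})) for i in cache if i != root),
                      key=lambda t: t[1])
        if c >= best:
            break  # every remaining removal makes things worse
        cache.remove(node)
        best = c
    return cache, best
```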
If you do need an exact answer, you will need to brute-force search a lot of data. Assuming that the initial cache point is specified, you'll have 2^24 = 16,777,216 possible sets of cache points to search. That is expensive.
The trick to doing it more cheaply (note, not cheaply, just more cheaply) is finding ways to prune your search. Take to heart the fact that if doing 100 times as much work on each set you look at lets you remove an average of 10 points from consideration as cache points, then your overall algorithm will visit 0.1% as many sets, and your code will run 10 times faster. Therefore it is worth putting a surprising amount of energy into pruning early and often, even if the pruning step is fairly expensive.
Often you want multiple pruning strategies. One of them is usually "the best we can do from here is worse than the best we have found previously." This works better if you've already found a pretty good solution, so it is often worth a bit of effort to do some local optimization in your search for solutions.
Typically these optimizations won't change the fact that you are doing a tremendous amount of work. But they do let you do orders of magnitude less work.
My initial try at this would take advantage of the following observations.
Suppose `x` is a cache point, and `y` is its nearest caching neighbor. Then you can always make some path from `x` to `y` cache "for free" if you just route the cache update traffic from `x` to `y` along that path. Therefore, without loss of generality, the set of cache points is connected on the grid.

This might not be the best set of observations to use, but it is likely to be pretty reasonable. To take advantage of it you'll need at least one data structure you might not be familiar with. If you don't know what a priority queue is, then look around for an efficient one in your language of choice. If you can't find one, a heap is pretty easy to implement and works pretty well as a priority queue.
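In Python, for example, the standard library already ships a binary-heap-backed priority queue via `heapq` (a generic usage sketch, not specific to this problem):

```python
import heapq

# heapq turns a plain Python list into a binary min-heap: the smallest
# (priority, item) pair is always at index 0.
pq = []
heapq.heappush(pq, (5, "expand node B"))
heapq.heappush(pq, (1, "expand node A"))
heapq.heappush(pq, (3, "expand node C"))

# Popping always yields the lowest-priority entry first.
order = [heapq.heappop(pq)[1] for _ in range(len(pq))]
# order is ["expand node A", "expand node C", "expand node B"]
```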
With that in mind, assuming that you have been given the information you've described and an initial cache node `P`, here is pseudo-code for an algorithm to find the best.

It will take a lot of work to write this, and you'll need to be careful to maintain all data structures. But my bet is that, despite how heavyweight it looks, you'll find that it prunes your search space enough to run more quickly than your existing solution. (It still won't be snappy.)
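One plausible shape for such a pruned exhaustive search is a branch and bound over per-node cache/access decisions. The sketch below is my own rough guess, not the author's pseudo-code: it assumes nearest-peer cache costs and cheapest-cache access costs, uses two deliberately optimistic lower bounds, and omits the connectivity pruning discussed above.

```python
def branch_and_bound(D, A, C, root):
    """Exact search: branch on each node being a cache or access point,
    pruning with two optimistic lower bounds -- every non-root cache
    node pays at least C (distances are >= 1), and every decided access
    node pays at least its cheapest conceivable read."""
    n = len(D)

    def cost(cache):
        total = 0
        for i in cache:
            if i != root:
                # Assumed: a cache node caches from its nearest cache peer.
                total += C * min(D[i][k] for k in cache if k != i)
        for j in range(n):
            if j not in cache:
                total += min(D[i][j] * A[i][j] for i in cache)
        return total

    best_cost = [cost({root})]          # start from "only the root caches"
    best_set = [frozenset({root})]

    def rec(node, cache, lb):
        if lb >= best_cost[0]:
            return                      # prune: bound can't beat the best
        if node == n:
            c = cost(cache)
            if c < best_cost[0]:
                best_cost[0], best_set[0] = c, frozenset(cache)
            return
        if node == root:
            rec(node + 1, cache, lb)
            return
        # Branch 1: node becomes a cache point (pays at least C).
        rec(node + 1, cache | {node}, lb + C)
        # Branch 2: node stays an access point; optimistically assume it
        # could read from whichever node would be cheapest for it.
        cheapest = min(D[i][node] * A[i][node] for i in range(n) if i != node)
        rec(node + 1, cache, lb + cheapest)

    rec(0, {root}, 0)
    return best_cost[0], set(best_set[0])
```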
Good luck!
Update
When I thought about this more, I realized that a better observation is to note that if you can cut the "not a cache node, not a blocked node" set into two pieces, then you can solve those pieces independently. Each of those subproblems is orders of magnitude faster to solve than the whole problem, so seek to do so as early as possible.
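Cutting the undecided set into independent pieces is just a connected-components computation on the grid; here is a minimal flood-fill sketch in Python (the `(row, col)` coordinates and 4-neighbor adjacency are my assumptions):

```python
def split_into_pieces(undecided):
    """Partition a set of (row, col) grid cells into 4-connected
    components; each component can then be solved independently."""
    undecided = set(undecided)
    pieces = []
    while undecided:
        seed = undecided.pop()
        piece, frontier = {seed}, [seed]
        while frontier:                    # flood fill out from the seed
            r, c = frontier.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in undecided:
                    undecided.remove(nb)
                    piece.add(nb)
                    frontier.append(nb)
        pieces.append(piece)
    return pieces
```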
A good heuristic to do that is to follow rules that work along the border cells `(1, x)`, `(x, 1)`, `(5, x)`, `(x, 5)`. If you try this starting out with `(3, 3)` as a cache point, you'll find that in the first few decisions, 7/16 of the time you manage to cut into two even problems, 1/16 of the time you block in the cache point and finish, 1/4 of the time you manage to cut out a 2x2 block into a separate problem (making the overall solution run 16 times faster for that piece), and 1/4 of the time you wind up well on your way towards a solution that either gets boxed in (and quickly exhausted) or becomes a candidate with a lot of cache points that gets pruned for being on track to be a bad solution.

I won't give pseudo-code for this variation. It will have a lot of similarities to what I had above, with a number of important details to handle. But I would be willing to bet money that in practice it will run orders of magnitude faster than your original solution.
The solution is a set, so this is not a linear programming problem. Rather, it is a special case of connected facility location. Bardossy and Raghavan have a heuristic that looks promising: http://terpconnect.umd.edu/~raghavan/preprints/confl.pdf
Is spiral cache an analogy to the solution? http://strumpen.net/ibm-rc24767.pdf