I'm not so good at linear programming, so I'm posting this problem here. I hope somebody can point me in the right direction. This is not a homework problem, so please don't misunderstand.
I have a 5x5 matrix (25 nodes). The distance between each node and its adjacent (neighbor) nodes is 1 unit. A node can be in one of two states: cache or access. If node 'i' is a cache node, an access node 'j' can access it at a cost of Dij x Aij (access cost), where Dij is the Manhattan distance between nodes i and j, and Aij is the access frequency from node i to j.

In order for node i to become a cache node, it needs to cache from an existing cache node k at a cost of Dik x C, where C is an integer constant (cache cost). C is called the caching frequency.

A is provided as a 25x25 matrix of integers giving the access frequency between any pair of nodes i and j. D is provided as a 25x25 matrix containing the Manhattan distance between any pair of nodes i and j.
Assuming there is one initial cache node in the matrix, find the set of other cache nodes and access nodes such that the total cost is minimized.

Total Cost = Total Cache Cost + Total Access Cost.
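Since the whole question turns on this cost function, here is a minimal Python sketch of how one candidate assignment could be scored. The names `cache_nodes` and `parent` are illustrative, and it assumes each access node reads from whichever cache node is cheapest for it; the problem statement doesn't pin down that tie-breaking rule.

```python
def total_cost(cache_nodes, parent, D, A, C):
    """Score one assignment of roles (names are illustrative).

    cache_nodes -- set of node indices that act as caches
    parent      -- parent[i] is the cache node that cache node i cached
                   from (None for the one initial cache node)
    D, A        -- distance and access-frequency matrices
    C           -- the caching frequency constant
    """
    n = len(D)
    cost = 0
    # Cache cost: each non-initial cache node i pays D[i][k] * C
    # to the cache node k it was populated from.
    for i in cache_nodes:
        k = parent[i]
        if k is not None:
            cost += D[i][k] * C
    # Access cost: each access node j reads from the cache node that
    # is cheapest for it (an assumption; the problem statement does
    # not say which cache node an access node uses).
    for j in range(n):
        if j not in cache_nodes:
            cost += min(D[i][j] * A[i][j] for i in cache_nodes)
    return cost
```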
I've tackled a few problems that are something like this.
First, if you don't need an exact answer, I'd generally suggest looking into something like a genetic algorithm, or a greedy algorithm. It won't be optimal, but it generally won't be bad either, and it will be much faster than an exact algorithm. For instance, you can start with all points as cache points, then find the point whose removal reduces your cost the most and make it a non-caching point. Continue until removing the next one makes the cost go up, and use that as your solution. This won't be the best solution, but it will generally be reasonably good.
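The greedy idea described above might be sketched like this in Python. It is illustrative only: it assumes each cache node caches from its nearest cache peer and each access node reads from its cheapest cache node, choices the problem statement leaves open.

```python
def solve_greedy(D, A, C, root):
    """Greedy heuristic: start with every node as a cache point, then
    repeatedly drop the cache node whose removal lowers total cost most.
    Stops when any further removal would raise the cost.  Fast, not optimal."""
    n = len(D)

    def cost(cache):
        total = 0
        for i in cache:
            if i != root:
                # Assumed: a cache node caches from its nearest cache peer.
                total += C * min(D[i][k] for k in cache if k != i)
        for j in range(n):
            if j not in cache:
                # Assumed: an access node reads from its cheapest cache node.
                total += min(D[i][j] * A[i][j] for i in cache)
        return total

    cache = set(range(n))
    best = cost(cache)
    while len(cache) > 1:
        # Find the single removal that reduces the cost the most.
        node, c = min(((i, cost(cache - {i})) for i in cache if i != root),
                      key=lambda t: t[1])
        if c >= best:
            break  # every remaining removal makes things worse
        cache.remove(node)
        best = c
    return cache, best
```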
If you do need an exact answer, you will need to brute-force search a lot of data. Assuming that the initial cache point is specified, you'll have 2^24 = 16,777,216 possible sets of cache points to search. That is expensive.
The trick to doing it more cheaply (note, not cheaply, just more cheaply) is finding ways to prune your search. Take to heart the fact that if doing 100 times as much work on each set you look at lets you remove an average of 10 points from consideration as cache points, then your overall algorithm will visit 0.1% as many sets, and your code will run 10 times faster. Therefore it is worth putting a surprising amount of energy into pruning early and often, even if the pruning step is fairly expensive.
Often you want multiple pruning strategies. One of them is usually "the best we can do from here is worse than the best we have found previously." This works better if you've already found a pretty good solution, so it is often worth a bit of effort to do some local optimization in your search for solutions.
Typically these optimizations won't change the fact that you are doing a tremendous amount of work. But they do let you do orders of magnitude less work.
My initial try at this would take advantage of the following observations.
Suppose `x` is a cache point, and `y` is its nearest caching neighbor. Then you can always make some path from `x` to `y` cache "for free" if you just route the cache update traffic from `x` to `y` along that path. Therefore, without loss of generality, the set of cache points is connected on the grid.

This might not be the best set of observations to use, but it is likely to be pretty reasonable. To take advantage of it you'll need at least one data structure you might not be familiar with. If you don't know what a priority queue is, then look around for an efficient one in your language of choice. If you can't find one, a heap is pretty easy to implement and works pretty well as a priority queue.
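In Python, for example, the standard library already ships a binary-heap-backed priority queue via `heapq` (a generic usage sketch, not specific to this problem):

```python
import heapq

# heapq turns a plain Python list into a binary min-heap: the smallest
# (priority, item) pair is always at index 0.
pq = []
heapq.heappush(pq, (5, "expand node B"))
heapq.heappush(pq, (1, "expand node A"))
heapq.heappush(pq, (3, "expand node C"))

# Popping always yields the lowest-priority entry first.
order = [heapq.heappop(pq)[1] for _ in range(len(pq))]
# order is ["expand node A", "expand node C", "expand node B"]
```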
With that in mind, assuming that you have been given the information you've described and an initial cache node `P`, here is pseudo-code for an algorithm to find the best.

It will take a lot of work to write this, and you'll need to be careful to maintain all data structures. But my bet is that, despite how heavyweight it looks, you'll find that it prunes your search space enough to run more quickly than your existing solution. (It still won't be snappy.)
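One plausible shape for such a pruned exhaustive search is a branch and bound over per-node cache/access decisions. The sketch below is my own rough guess, not the author's pseudo-code: it assumes nearest-peer cache costs and cheapest-cache access costs, uses two deliberately optimistic lower bounds, and omits the connectivity pruning discussed above.

```python
def branch_and_bound(D, A, C, root):
    """Exact search: branch on each node being a cache or access point,
    pruning with two optimistic lower bounds -- every non-root cache
    node pays at least C (distances are >= 1), and every decided access
    node pays at least its cheapest conceivable read."""
    n = len(D)

    def cost(cache):
        total = 0
        for i in cache:
            if i != root:
                # Assumed: a cache node caches from its nearest cache peer.
                total += C * min(D[i][k] for k in cache if k != i)
        for j in range(n):
            if j not in cache:
                total += min(D[i][j] * A[i][j] for i in cache)
        return total

    best_cost = [cost({root})]          # start from "only the root caches"
    best_set = [frozenset({root})]

    def rec(node, cache, lb):
        if lb >= best_cost[0]:
            return                      # prune: bound can't beat the best
        if node == n:
            c = cost(cache)
            if c < best_cost[0]:
                best_cost[0], best_set[0] = c, frozenset(cache)
            return
        if node == root:
            rec(node + 1, cache, lb)
            return
        # Branch 1: node becomes a cache point (pays at least C).
        rec(node + 1, cache | {node}, lb + C)
        # Branch 2: node stays an access point; optimistically assume it
        # could read from whichever node would be cheapest for it.
        cheapest = min(D[i][node] * A[i][node] for i in range(n) if i != node)
        rec(node + 1, cache, lb + cheapest)

    rec(0, {root}, 0)
    return best_cost[0], set(best_set[0])
```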
Good luck!
Update
When I thought about this more, I realized that a better observation is to note that if you can cut the "not a cache node, not a blocked node" set into two pieces, then you can solve those pieces independently. Each of those subproblems is orders of magnitude faster to solve than the whole problem, so seek to do so as early as possible.
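Cutting the undecided set into independent pieces is just a connected-components computation on the grid; here is a minimal flood-fill sketch in Python (the `(row, col)` coordinates and 4-neighbor adjacency are my assumptions):

```python
def split_into_pieces(undecided):
    """Partition a set of (row, col) grid cells into 4-connected
    components; each component can then be solved independently."""
    undecided = set(undecided)
    pieces = []
    while undecided:
        seed = undecided.pop()
        piece, frontier = {seed}, [seed]
        while frontier:                    # flood fill out from the seed
            r, c = frontier.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in undecided:
                    undecided.remove(nb)
                    piece.add(nb)
                    frontier.append(nb)
        pieces.append(piece)
    return pieces
```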
A good heuristic to do that is to follow rules that work along the border cells `(1, x)`, `(x, 1)`, `(5, x)`, `(x, 5)`. If you try this starting out with `(3, 3)` as a cache point, you'll find that in the first few decisions, 7/16 of the time you manage to cut into two even problems, 1/16 of the time you block in the cache point and finish, 1/4 of the time you manage to cut out a 2x2 block into a separate problem (making the overall solution run 16 times faster for that piece), and 1/4 of the time you wind up well on your way towards a solution that either gets boxed in (and quickly exhausted) or becomes a candidate with a lot of cache points that gets pruned for being on track to be a bad solution.

I won't give pseudo-code for this variation. It will have a lot of similarities to what I had above, with a number of important details to handle. But I would be willing to bet money that in practice it will run orders of magnitude faster than your original solution.
The solution is a set, so this is not a linear programming problem. Rather, it is a special case of connected facility location. Bardossy and Raghavan have a heuristic that looks promising: http://terpconnect.umd.edu/~raghavan/preprints/confl.pdf
Is spiral cache an analogy to the solution? http://strumpen.net/ibm-rc24767.pdf