Approximation algorithm for minimum-raggedness word wrap
I'm looking for an efficient algorithm (ideally, C-like pseudocode) to give an approximate solution to the following partition problem. Given a sequence S = {a_i : i=1,...,n} and a bound B, determine a partition of S into some number m of contiguous subsequences as follows. For each k, let s_k be the sum of the elements of the k-th subsequence. The partition must satisfy:
- s_k ≤ B for each k (assume that the values of B and the a_i are such that this is always possible)
- m is minimal (no smaller partition satisfies #1);
- some measure of dispersion (for example, the variance or the maximum pair-wise difference among the s_k) is minimal among all partitions of size m.
I know that this is closely related to the minimum raggedness word wrap algorithm. I am looking for something that can give a "good enough" solution for small values of n (less than 15) without pulling out heavy ammunition like dynamic programming, but also something a little faster than brute force.
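For concreteness, the brute-force baseline mentioned above can be written out directly by enumerating every pattern of cuts between adjacent items, which is only 2^(n-1) cases for n < 15. This is a sketch under my own assumptions: items are positive integers, the dispersion measure is the max-min spread of the s_k, and the name `brute_force` is invented.

```c
#include <limits.h>

/* Brute-force baseline (illustrative, not from any posted answer):
   enumerate all 2^(n-1) cut patterns, keep the feasible ones (every
   s_k <= B), prefer smaller m, and among partitions of minimal m
   prefer the smallest max-min spread of the s_k.
   Returns the minimal feasible m (or n + 1 if none exists) and writes
   the best spread for that m to *spread_out. */
static int brute_force(const int *a, int n, int B, int *spread_out)
{
    int best_m = n + 1, best_spread = INT_MAX;

    for (unsigned mask = 0; mask < (1u << (n - 1)); mask++) {
        int s = 0, m = 1, lo = INT_MAX, hi = 0, feasible = 1;

        for (int i = 0; i < n; i++) {
            s += a[i];
            /* close the current subsequence after item i? */
            if (i == n - 1 || ((mask >> i) & 1)) {
                if (s > B) { feasible = 0; break; }  /* violates s_k <= B */
                if (s < lo) lo = s;
                if (s > hi) hi = s;
                if (i < n - 1) m++;      /* an internal cut opens one more */
                s = 0;
            }
        }
        if (!feasible) continue;

        if (m < best_m) {                /* smaller m wins outright */
            best_m = m;
            best_spread = hi - lo;
        } else if (m == best_m && hi - lo < best_spread) {
            best_spread = hi - lo;       /* same m, better dispersion */
        }
    }
    *spread_out = best_spread;
    return best_m;
}
```

For example, with items {3, 4, 5, 2, 6} and B = 10, no 2-partition is feasible, and the best 3-partition is {3,4 | 5,2 | 6} with sums 7, 7, 6 and spread 1.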
2 Answers
Let S denote the sum of all the items and let n be the number of items. If you fit the items on m lines, the average weight per line is S/m, so some line carries at least S/m. Since every line must satisfy s_k ≤ B, you get S/m ≤ B, i.e. m ≥ S/B. I would start from ceiling(S/B) as the value of m and then increase m by one until a solution is found.
When m is set and n is given, it's just a matter of recursively searching for the correct boundaries. You guess the boundaries between lines one by one (recursively), and backtrack when the partial solution becomes infeasible. If you find a solution, store it for reference, because it might be the best dispersion-wise. Eventually you choose the best solution found. If there are no solutions, increase m by one and redo.
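The scheme above might be sketched as follows. Assumptions of mine, not the answer's: items are positive integers, dispersion is the max-min spread of the s_k, and the names `search` and `solve` are invented. A real version would also record the best boundaries, not just the best spread.

```c
#include <limits.h>

#define MAXN 15

static int a[MAXN], n, B;      /* input: items, count, bound        */
static int m;                  /* current number of partitions       */
static int sums[MAXN];         /* sum of each partition so far       */
static int best_spread;        /* best max-min spread found for this m */

/* Recursively pick where partition k ends, given that it starts at
   'start'; backtrack as soon as a running sum exceeds B. */
static void search(int k, int start)
{
    if (k == m) {
        if (start != n) return;              /* every item must be used */
        int lo = INT_MAX, hi = 0;
        for (int j = 0; j < m; j++) {
            if (sums[j] < lo) lo = sums[j];
            if (sums[j] > hi) hi = sums[j];
        }
        if (hi - lo < best_spread) best_spread = hi - lo;
        return;
    }
    int s = 0;
    /* leave at least one item for each of the m-k-1 later partitions */
    for (int end = start + 1; end <= n - (m - k - 1); end++) {
        s += a[end - 1];
        if (s > B) break;                    /* infeasible: backtrack */
        sums[k] = s;
        search(k + 1, end);
    }
}

/* Start at m = ceiling(S/B) and grow m until a partition exists;
   returns that minimal m, or -1 if even m = n fails. */
static int solve(void)
{
    int S = 0;
    for (int i = 0; i < n; i++) S += a[i];
    m = (S + B - 1) / B;
    if (m < 1) m = 1;
    for (; m <= n; m++) {
        best_spread = INT_MAX;
        search(0, 0);
        if (best_spread != INT_MAX) return m;
    }
    return -1;
}
```

On {3, 4, 5, 2, 6} with B = 10, ceiling(20/10) = 2 turns out to be infeasible, so the loop moves on to m = 3 and finds {3,4 | 5,2 | 6} with spread 1.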
I ended up doing a simple backtracking search. First I calculated m as described by antti.huima (that part was very easy and fixed m). I then allocated items greedily (packing each partition in order as much as possible). Finally I ran a backtracking algorithm to assign a delta to the start index for each partition boundary, starting with the last partition. Each delta_j represents how far back to move the start of partition j from the greedy start position. It's not hard to show that 0 <= delta_j < (size of partition j-1) for each j > 1 (otherwise the greedy algorithm would have behaved differently). This greatly cuts down on the search space as compared to a search to fill the partitions from 1 to m.
I suppose some sort of heuristic variant of this search is possible and would be more efficient, but I didn't try very hard to find one.
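A sketch of this delta search, under my own assumptions (positive integer items, dispersion = max-min spread of the partition sums; `greedy_pack`, `delta_search` and the other names are invented). Note that because each delta_j stays below the greedy size of partition j-1, the adjusted start indices remain strictly increasing without any extra check; only the bound s_k ≤ B needs re-validating, since moving starts back grows later partitions.

```c
#include <limits.h>

#define MAXN 15

static int a[MAXN], n, B, m;
static int g[MAXN];            /* greedy start index of partition k */
static int p[MAXN];            /* start indices adjusted by the deltas */
static int best_disp;

/* Greedy first pass: pack each partition as full as possible.
   Fixes m (minimal for contiguous partitions) and the starts g[]. */
static int greedy_pack(void)
{
    int k = 0, s = 0;
    g[0] = 0;
    for (int i = 0; i < n; i++) {
        if (s + a[i] > B) { g[++k] = i; s = 0; }
        s += a[i];
    }
    return k + 1;
}

/* Max-min spread of the partition sums for starts p[], or -1 if some
   partition now exceeds B. */
static int spread(void)
{
    int lo = INT_MAX, hi = 0;
    for (int k = 0; k < m; k++) {
        int end = (k + 1 < m) ? p[k + 1] : n, s = 0;
        for (int i = p[k]; i < end; i++) s += a[i];
        if (s > B) return -1;
        if (s < lo) lo = s;
        if (s > hi) hi = s;
    }
    return hi - lo;
}

/* Choose delta_j for j = m-1 down to 1: partition j starts delta
   positions before its greedy start, 0 <= delta < |partition j-1|. */
static void delta_search(int j)
{
    if (j == 0) {
        int d = spread();
        if (d >= 0 && d < best_disp) best_disp = d;
        return;
    }
    for (int delta = 0; delta < g[j] - g[j - 1]; delta++) {
        p[j] = g[j] - delta;
        delta_search(j - 1);
    }
}

static int solve_delta(void)   /* returns m; best_disp holds the result */
{
    m = greedy_pack();
    p[0] = 0;
    best_disp = INT_MAX;
    delta_search(m - 1);
    return m;
}
```

On {3, 4, 5, 2, 6} with B = 10, greedy packing gives m = 3 with starts {0, 2, 4}, and the delta search explores only 4 configurations instead of all 3-partitions, confirming spread 1 as the best.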