子集和问题的这种变体更容易解决吗?
我有一个与子集和问题相关的问题,我想知道这些差异是否会让它变得更容易,即可以在合理的时间内解决。
给定值 V、集合大小 L 和数字序列 [1,N] S,S 的多少个大小为 L 的子集之和小于 V?
这与子集和问题在三个方面有所不同:
- 我关心有多少子集小于给定值,而不是有多少子集等于。
- 子集大小是固定的。
- 我关心有多少组总和小于V,而不仅仅是是否存在。
有没有合理有效的算法来解决这个问题?
编辑: 显然,这可以使用组合生成算法在 O(N 选择 L) 中完成。 我真正感兴趣的是巧妙的技巧来显着加快速度。
I have a problem related to the subset sum problem and am wondering if the differences make it easier, i.e. solvable in a reasonable amount of time.
Given a value V, a set size L, and a sequence of numbers [1,N] S, how many size L subsets of S sum to less than V?
This is different than the subset sum problem in three ways:
- I care how many subsets are less than a given value, not how many are equal.
- The subset sizes are fixed.
- I care how many sets sum to less than V, not just whether any exist.
Is there any reasonably efficient algorithm to solve this?
Edit:
Obviously, this can be done in O(N choose L) using a combination generating algorithm. What I'm really interested in is clever hacks to significantly speed it up past that.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
子集和问题的动态编程解决方案生成一个包含此答案的表(即 V × N 的布尔表,其中 V 是元素的最大数量,N 是满足以下条件的集合中可以包含的最大项目数:约束;如果 <=N 个元素总和 <=V,则每个布尔值为真。 因此,如果 N * V 对您来说不太大,则存在一种可接受的快速算法。 子集和解就是该表中元素数量≤N/2的最大集合元素。
The dynamic programming solution to the subset sum problem generates a table that contains this answer (ie a boolean table of V by N where V is the max number of elements and N is the max number of items that can be in a set that satisifies the constraints; each boolean being true if <=N elements sum to <=V). So if N * V is not too large for you, an acceptably fast algorithm exists. The subset sum solution is just the highest set element in this table for which the number of elements is <= N/2.
如果只是正整数,您可以根据需要执行验证步骤;
取集合中L-1个最小整数的和。 如果这是一个总和 X,那么如果问题应该有解决方案,则 nX 必须低于最大元素。 想想看,你可以这样消除其他L......
If it's only positive integers, you can do a verification step if you need;
Take the sum of the L-1 smallest integers in the set. If that's a sum X, then n-X must be below the largest element if the problem is supposed to have a solution. Come to think of it, you can eliminate other L this way...
好吧,一方面,既然你指定了 size=L 那么即使你想不出任何聪明的办法并且只是使用蛮力,你也会在最坏的情况下得到(N 选择 L)单独的总和,所以它更好一点比 n^^L (好吧,L+1,因为你然后对每个子集求和)。
Well, for one thing since you're specifying size=L then even if you can't think of anything clever and just use brute force you'll have (N choose L) separate sums in the worst case, so it's a bit better than n^^L (well, L+1, as you'd then sum each subset).
这听起来像是一个 n 选择 k 类别的问题。 Skiena 的算法设计手册中介绍了生成 n 的 k 子集,并且该书建议按字典顺序枚举相关子集(例如,递归地)。 然后对每个子集进行求和和比较。
如果您有一个排序集,您可能可以从解决方案空间中删除不可能的解决方案。
This sounds like an n choose k category of problem. Generating k-subsets of n is covered in Skiena's Algorithm Design Manual, and the book suggests enumerating relevant subsets in lexicographic order (recursively, for example). Then do your sum and comparison on each subset.
If you have a sorted set, you could presumably prune impossible solutions from the solution space.
也许动态规划公式适合 FPTAS 的 PTAS。
Perhaps the dynamic programming formulation is amenamble to a PTAS of FPTAS.
你的问题(的决策版本)仍然是 NP 完全的。 这个想法是,如果我们可以解决你的问题,那么(例如,对于每个子集大小)我们可以询问有多少个集合的总和小于 V,有多少个总和小于 V-1,这两个数字的差将是告诉我们子集之和是否恰好等于 V——这样我们就可以解决子集和问题。 [这不是一个完整的证明,因为它是图灵归约,而不是多一次归约。]
但是,有一个简单的动态编程解决方案,可以在时间 O(nLV)。 [这不能证明 P=NP 的原因是 V 在输入大小上可能呈指数形式:使用 n 位,您可以表示最多 2n 的值。 但假设你的 V 不是指数的,这不是问题。]
让 num[v][k][i] 表示 S 的前 i 个元素的大小为 k 的子集的数量,其总和为 v。你可以计算它们为(对于每个 i):
其中 S[i] 是序列中的第 i 个元素。 (任何大小为 k 且总和为 v 的集合要么不使用 S[i],所以它计入 num[v][k][i-1],要么使用 S[i],这意味着其余的子集有 k-1 个元素,仅使用序列中的前 i-1 个数字,求和为 vS[i]。)最后,对每个小于 V 的 v 计数 num[v][L][|S|] ; 这就是你的答案。
另外,如果你小心的话,你可以省略第三个下标(对每个 i 向下运行循环,等等); 我只是为了清楚起见才将其包括在内。
(The decision version of) your problem is still NP-complete. The idea is that if we could solve your problem, then (for each subset size, say) we could ask how many sets sum to less than V and how many sum to less than V-1, and the difference of those two numbers would tell us whether are subsets that sum to exactly V -- thus we could solve the subset sum problem. [This is not a complete proof, because it's a Turing reduction, not a many one reduction.]
However, there is a simple dynamic programming solution that runs in time O(nLV). [The reason this does not prove that P=NP is that V could be exponential in the input size: with n bits, you can represent values upto 2n. But assuming that your V is not exponential, this is not a problem.]
Let num[v][k][i] denote the number of size-k subsets of the first i elements of S that sum to v. You can calculate them as (for each i):
where S[i] is the ith element in your sequence. (Any set of size k that sums to v either doesn't use S[i], so it's counted in num[v][k][i-1], or it uses S[i] which means that the rest of the subset has k-1 elements, uses only the first i-1 numbers in the sequence, and sums to v-S[i].) Finally, count num[v][L][|S|] for each v less than V; that's your answer.
Also, you can omit the third subscript if you do it carefully (run your loop downwards for each i, etc.); I only included it for clarity.
我不准备提出证明,但这听起来可能适合动态编程方案:将大小为 2 的子集列表制成表格,使用它们来计算大小为 3 的子集等,这样您只需要检查一小部分前景。
I'm not prepared to present a proof, but that sounds like it might be amenable to a dynamic programming scheme: tabulate the list of subsets of size 2use them to computer subsets of size 3, etc, so that hyou only need to examine a small collection of prospects.
我想到的一种优化是:对序列进行排序(如果不是这样的话)。 从开头选择前 L-1 项,然后选择最后一项,使其成为最大可能值(序列中的下一个最大值将给出太大的总和)。 丢弃序列的其余部分,因为无论如何这些项目永远不可能成为有效子集的一部分。
之后我想又是一次全面的搜索。 但话又说回来,可能还有其他可能的优化。
One optimization that comes to mind is this: Order your sequence (if it alerady isn't so). Pick the first L-1 items from the start of it, and then pick the last item such, that it's the largest possible value (the next largest value in the sequence would give a sum too big). Discard the rest of the sequence, because those items can never be a part of a valid subset anyway.
After that I guess it's full search again. But then again there might be other optimiziations possible too.