计算数组的所有子集,其中最大数字是剩余数字的总和
我一直在努力应对 Greplin 挑战的第 3 级。对于那些不熟悉的人来说,问题如下:
您必须找到数组的所有子集,其中最大数字是其余数字的总和。例如,对于输入:
(1、2、3、4、6)
子集是
1 + 2 = 3
1 + 3 = 4
2 + 4 = 6
1 + 2 + 3 = 6
这是您应该使用的号码列表 运行你的代码。密码是 子集的数量。在上述情况下 答案是 4。
3、4、9、14、15、19、28、37、47、50、54、56、59、61、70、73、78、81、92、95、97、99
I能够编写一个解决方案,构建 22 个数字的所有 400 万多个组合,然后对其进行测试所有这些都会给我正确的答案。问题是需要 40 多分钟才能完成。在网上搜索后,似乎有几个人能够编写一种算法,在不到一秒的时间内得到这个问题的答案。任何人都可以用伪代码解释比计算成本昂贵的暴力方法更好的方法来解决这个问题吗?这让我发疯!
I've been struggling with level 3 of the Greplin challenge. For those not familiar, here is the problem:
you must find all subsets of an array where the largest number is the sum of the remaining numbers. For example, for an input of:
(1, 2, 3, 4, 6)
the subsets would be
1 + 2 = 3
1 + 3 = 4
2 + 4 = 6
1 + 2 + 3 = 6
Here is the list of numbers you should
run your code on. The password is the
number of subsets. In the above case
the answer would be 4.3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99
I was able to code a solution that builds all 4 million plus combinations of the 22 numbers and then tests them all which will get me the right answer. Problem is it takes over 40 minutes to crunch through. Searching around the web it seems like several people were able to write an algorithm to get the answer to this in less than a second. Can anyone explain in pseudo-code a better way to tackle this than the computationally expensive brute-force method? It's driving me nuts!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
诀窍在于您只需要记录做事的方法有多少种。由于数字已排序且为正数,因此这非常容易。这是一个有效的解决方案。 (在我的笔记本电脑上花费的时间不到 0.03 秒。)
The trick is that you only need to keep track of counts of how many ways there are to do things. Since the numbers are sorted and positive, this is pretty easy. Here is an efficient solution. (It takes under 0.03s on my laptop.)
我们知道这些值非零并且从左到右单调增长。
一个想法是枚举可能的总和(任何顺序,从左到右都可以)
然后枚举该值左侧的子集,
因为右边的值不可能参与(它们会使总和
太大了)。我们不必实例化该集合;正如我们所考虑的
每个值,看看如何影响总和。它可能太大(忽略
该值不能在集合中),正好(它是集合中的最后一个成员),
或太小,此时它可能会也可能不会在集合中。
【这个问题让我第一次玩Python。有趣。]
这是 Python 代码;根据 Cprofile.run 这需要 0.00772 秒
在我的 P8700 2.54Ghz 笔记本电脑上。
我得到的结果计数是 179。(与另一个海报的结果匹配。)
编辑:ways_to_sum 可以使用尾递归循环部分实现:
这需要 0.005804 秒来运行:-}相同的答案。
We know the values are nonzero and grow monontonically left to right.
An idea is to enumerate the the possible sums (any order, left to right is fine)
and then enumerate the subsets to the left of of that value,
because values on the right can't possibly participate (they'd make the sum
too big). We don't have have to instantiate the set; just as we consider
each value, see how if affects the sum. It can either be too big (just ignore
that value, can't be in the set), just right (its the last member in the set),
or too small, at which point it might or might not be in the set.
[This problem made me play with Python for the first time. Fun.]
Here's the Python code; according to Cprofile.run this takes .00772 seconds
on my P8700 2.54Ghz laptop.
The resulting count I get is 179. (Matches another poster's result.)
EDIT: ways_to_sum can be partly implemented using a tail-recursion loop:
This takes .005804 seconds to run :-} Same answer.
运行时间不到 5 毫秒 (python)。它使用动态编程的一种变体,称为记忆递归。
go
函数计算前p+1
元素的子集数量,其总和为target
。由于列表已排序,因此只需为每个元素(作为target
)调用一次函数并对结果求和即可:This runs in less than 5ms (python). It uses a variant of dynamical programming called memoized recursion. The
go
function computed the number of subsets of the firstp+1
elements, which sum up totarget
. Because the list is sorted it's enough to call the function once for every element (astarget
) and sum the results:这是
伪代码(带解释):
存储以下变量
i.) 到目前为止子集的“计数”
ii.) 包含所有可能子集之和的数组 b
2.迭代数组(比如a)。对于每个元素 a[i]
i.)遍历数组b并计算a[i]出现的次数。将其添加到“计数”
ii.)遍历数组 b 并为每个元素 b[j].add (a[i]+b[j]) 到 b,因为这是一个可能的子集和。 (如果a[i]+b[j]> a中的最大元素,则可以忽略添加)
iii.) 将 a[i] 添加到 b。
3.你有数:)
This works
pseudo code(with explanation):
store the following variables
i.) 'count' of subsets till now
ii.)an array b which contains sums of all possible subsets
2.iterate through the array (say a). for each element a[i]
i.)go through array b and count the number of occurrences of a[i]. add this to 'count'
ii.)go through array b and for each element b[j].add (a[i]+b[j]) to b because this is a possible subset sum. (if a[i]+b[j]> max element in a, u can ignore to add it)
iii.)add a[i] to b.
3.u have count :)
我使用了 Java 中的组合生成器类:
http://www.merriampark.com/comb.htm< /a>
迭代组合并查找有效子集花费了不到一秒的时间。 (我认为使用外部代码不符合挑战,但我也没有申请。)
I used the combination generator class in Java available here:
http://www.merriampark.com/comb.htm
Iterating through the combos and looking for valid subsets took less than a second. (I don't think using outside code is in keeping with the challenge, but I also didn't apply.)
希望它有帮助,但不要只是复制和粘贴。
Hope it helps, but don't just copy and paste.
我不想死马当活马医,但这里发布的大多数解决方案都错过了优化的关键机会,因此执行时间延长了 6 倍。
与其迭代输入数组并搜索与每个值匹配的总和,不如只计算一次所有可能的相关总和,然后查看哪些总和出现在原始输入数组中,效率要高得多。 (“相关”总和是任何子集总和 <= 数组中的最大值。)
第二种方法的运行速度大约快 6 倍——通常是毫秒而不是厘秒——仅仅是因为它调用了大约 1 的递归求和函数/第六次!
这种方法的代码和完整的解释可以在这个 github 存储库中找到(它是用 PHP 编写的,因为这是给我这个挑战的人所要求的):
https://github.com/misterrobinson/greplinsubsets
I don't want to beat a dead horse, but most of the solutions posted here miss a key opportunity for optimization and therefore take 6 times longer to execute.
Rather than iterating through the input array and searching for sums that match each value, it is far more efficient to calculate all the possible RELEVANT sums only once, then see which of those sums appear in the original input array. (A "relevant" sum is any subset sum <= the max value in the array.)
The second approach runs approximately 6 times faster -- generally milliseconds rather than centiseconds -- simply because it calls the recursive sum-finding function about 1/6th as many times!
The code for this approach and full explanation can be found in this github repo (it's in PHP because that's what was called for by the person who gave me this challenge):
https://github.com/misterrobinson/greplinsubsets