找出一组数字的哪些组合加起来达到给定的总数
我的任务是帮助一些会计师解决他们遇到的一个常见问题 - 给出交易清单和总存款,哪些交易是存款的一部分?例如,假设我有这个数字列表:
1.00
2.50
3.75
8.00
我知道我的总存款是 10.50
,我可以很容易地看到它由 8.00
和 组成2.50 交易。然而,考虑到一百笔交易和数百万笔存款,它很快就会变得更加困难。
在测试暴力解决方案时(这需要太长的时间才能实现),我有两个问题:
对于大约 60 个数字的列表,似乎可以找到十几个或更多的组合来获得合理的总数。 我期待一个组合可以满足我的总体要求,或者可能有几种可能性,但似乎总是有很多组合。是否有数学原理可以描述这是为什么?似乎给定一组即使是中等大小的随机数,您也可以找到一个多重组合,其总和几乎可以达到您想要的任何总数。
我为这个问题建立了一个强力解决方案,但它显然是 O(n!),并且很快就会失去控制。除了明显的捷径(排除大于总数本身的数字)之外,有没有办法缩短计算时间?
我当前(超慢)解决方案的详细信息:
详细金额列表按从大到小排序,然后递归运行以下过程:
- 获取列表中的下一项,看看是否将其添加到您的跑步总分使您的总分符合目标。如果是,则将当前链保留为匹配项。如果未达到目标,请将其添加到运行总计中,将其从详细金额列表中删除,然后再次调用此过程。
这样,它会快速排除较大的数字,将列表缩减为仅需要的数字考虑。然而,还是n!更大的列表似乎永远不会完成,所以我对任何可以加快速度的捷径感兴趣 - 我怀疑即使从列表中删除 1 个数字也会将计算时间减少一半。
感谢您的帮助!
I've been tasked with helping some accountants solve a common problem they have - given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:
1.00
2.50
3.75
8.00
And I know that my total deposit is 10.50
, I can easily see that it's made up of the 8.00
and 2.50
transaction. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.
In testing a brute force solution (which takes way too long to be practical), I had two questions:
With a list of about 60 numbers, it seems to find a dozen or more combinations for any total that's reasonable. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that given a collection of random numbers of even a medium size, you can find a multiple combination that adds up to just about any total you want.
I built a brute force solution for the problem, but it's clearly O(n!), and quickly grows out of control. Aside from the obvious shortcuts (exclude numbers larger than the total themselves), is there a way to shorten the time to calculate this?
Details on my current (super-slow) solution:
The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:
- Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again
This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n! and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up - I suspect that even cutting 1 number out of the list would cut the calculation time in half.
Thanks for your help!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
背包问题的这种特殊情况称为子集和。
This special case of the Knapsack problem is called Subset Sum.
C#版本
设置测试:
代码:
结果:
如果重复subTotals,就会出现重复的结果(期望的效果)。实际上,您可能希望使用带有某个 ID 的 subTotal Tupled,以便您可以将其与您的数据关联起来。
C# version
setup test:
code:
results:
If subTotals are repeated, there will appear to be duplicate results (the desired effect). In reality, you will probably want to use the subTotal Tupled with some ID, so you can relate it back to your data.
如果我正确理解你的问题,你有一组交易,你只想知道其中哪些可以包含在给定的总数中。因此,如果有 4 个可能的交易,则有 2^4 = 16 个可能的集合需要检查。这个问题是,对于 100 个可能的交易,搜索空间有 2^100 = 1267650600228229401496703205376 个可能的组合可供搜索。 For 1000 potential transactions in the mix, it grows to a total of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
sets that you must test.蛮力很难成为解决这些问题的可行方案。
相反,请使用可以处理背包问题的求解器。但即便如此,我也不确定您是否可以生成所有可能解决方案的完整枚举,而无需使用某种强力方法。
If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. This problem is, for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions in the mix, it grows to a total of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
sets that you must test. Brute force will hardly be a viable solution on these problems.
Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.
有一个便宜的 Excel 插件可以解决这个问题: SumMatch
There is a cheap Excel Add-in that solves this problem: SumMatch
superuser.com 上发布的 Excel Solver Addin 有一个很好的解决方案(如果您有 Excel)https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total
The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total
它有点像 0-1 背包问题,是 NP 完全的,可以通过动态规划在多项式时间内解决。
http://en.wikipedia.org/wiki/Knapsack_problem
但是在算法结束时你还需要检查总和是否是您想要的。
Its kind of like 0-1 Knapsack problem which is NP-complete and can be solved through dynamic programming in polynomial time.
http://en.wikipedia.org/wiki/Knapsack_problem
But at the end of the algorithm you also need to check that the sum is what you wanted.
根据您的数据,您可以首先查看每笔交易的美分部分。就像在您最初的示例中一样,您知道 2.50 必须是总数的一部分,因为它是唯一一组加起来为 50 的非零美分交易。
Depending on your data you could first look at the cents portion of each transaction. Like in your initial example you know that 2.50 has to be part of the total because it is the only set of non-zero cent transactions which add to 50.
这不是一个超级高效的解决方案,但这里有一个 CoffeeScript 的实现
combinations
返回list
中元素的所有可能组合,然后
find_components
利用它来确定哪些数字总计为total
示例
这是一个返回
[ 7.2, 2, 4.1 ]
的Not a super efficient solution but heres an implementation in coffeescript
combinations
returns all possible combinations of the elements inlist
and then
find_components
makes use of it to determine which numbers add up tototal
Heres an example
which returns
[ 7.2, 2, 4.1 ]
顺便说一句,在进行算术和相等比较时,最好避免使用浮点数和双精度数。
Best to avoid use of floats and doubles when doing arithmetic and equality comparisons btw.