找出一组数字的哪些组合加起来达到给定的总数

发布于 2024-09-28 15:30:25 字数 891 浏览 5 评论 0原文

我的任务是帮助一些会计师解决他们遇到的一个常见问题 - 给出交易清单和总存款,哪些交易是存款的一部分?例如,假设我有这个数字列表:

1.00
2.50
3.75
8.00

我知道我的总存款是 10.50,我可以很容易地看到它由 8.00组成2.50 交易。然而,考虑到一百笔交易和数百万笔存款,它很快就会变得更加困难。

在测试暴力解决方案时(这需要太长的时间才能实现),我有两个问题:

  1. 对于大约 60 个数字的列表,似乎可以找到十几个或更多的组合来获得合理的总数。 我期待一个组合可以满足我的总体要求,或者可能有几种可能性,但似乎总是有很多组合。是否有数学原理可以描述这是为什么?似乎给定一组即使是中等大小的随机数,您也可以找到一个多重组合,其总和几乎可以达到您想要的任何总数。

  2. 我为这个问题建立了一个强力解决方案,但它显然是 O(n!),并且很快就会失去控制。除了明显的捷径(排除大于总数本身的数字)之外,有没有办法缩短计算时间?

我当前(超慢)解决方案的详细信息:

详细金额列表按从大到小排序,然后递归运行以下过程:

  • 获取列表中的下一项,看看是否将其添加到您的跑步总分使您的总分符合目标。如果是,则将当前链保留为匹配项。如果未达到目标,请将其添加到运行总计中,将其从详细金额列表中删除,然后再次调用此过程。

这样,它会快速排除较大的数字,将列表缩减为仅需要的数字考虑。然而,还是n!更大的列表似乎永远不会完成,所以我对任何可以加快速度的捷径感兴趣 - 我怀疑即使从列表中删除 1 个数字也会将计算时间减少一半。

感谢您的帮助!

I've been tasked with helping some accountants solve a common problem they have - given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:

1.00
2.50
3.75
8.00

And I know that my total deposit is 10.50, I can easily see that it's made up of the 8.00 and 2.50 transaction. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.

In testing a brute force solution (which takes way too long to be practical), I had two questions:

  1. With a list of about 60 numbers, it seems to find a dozen or more combinations for any total that's reasonable. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that given a collection of random numbers of even a medium size, you can find a multiple combination that adds up to just about any total you want.

  2. I built a brute force solution for the problem, but it's clearly O(n!), and quickly grows out of control. Aside from the obvious shortcuts (exclude numbers larger than the total themselves), is there a way to shorten the time to calculate this?

Details on my current (super-slow) solution:

The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:

  • Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again

This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n! and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up - I suspect that even cutting 1 number out of the list would cut the calculation time in half.

Thanks for your help!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

涫野音 2024-10-05 15:30:25

背包问题的这种特殊情况称为子集和

This special case of the Knapsack problem is called Subset Sum.

坠似风落 2024-10-05 15:30:25

C#版本

设置测试:

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main(string[] args)
    {
    // subtotal list
    List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });

    // get matches
    List<double[]> results = Knapsack.MatchTotal(100.50, totals);

    // print results
    foreach (var result in results)
    {
        Console.WriteLine(string.Join(",", result));
    }

    Console.WriteLine("Done.");
    Console.ReadKey();
    }
}

代码:

using System.Collections.Generic;
using System.Linq;

public class Knapsack
{
    internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
    {
    List<double[]> results = new List<double[]>();

    while (subTotals.Contains(theTotal))
    {
        results.Add(new double[1] { theTotal });
        subTotals.Remove(theTotal);
    }

    // if no subtotals were passed
    // or all matched the Total
    // return
    if (subTotals.Count == 0)
        return results;

    subTotals.Sort();

    double mostNegativeNumber = subTotals[0];
    if (mostNegativeNumber > 0)
        mostNegativeNumber = 0;

    // if there aren't any negative values
    // we can remove any values bigger than the total
    if (mostNegativeNumber == 0)
        subTotals.RemoveAll(d => d > theTotal);

    // if there aren't any negative values
    // and sum is less than the total no need to look further
    if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
        return results;

    // get the combinations for the remaining subTotals
    // skip 1 since we already removed subTotals that match
    for (int choose = 2; choose <= subTotals.Count; choose++)
    {
        // get combinations for each length
        IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);

        // add combinations where the sum mathces the total to the result list
        results.AddRange(from combo in combos
                 where combo.Sum() == theTotal
                 select combo.ToArray());
    }

    return results;
    }
}

public static class Combination
{
    public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
    {
    return choose == 0 ?                        // if choose = 0
        new[] { new T[0] } :                    // return empty Type array
        elements.SelectMany((element, i) =>     // else recursively iterate over array to create combinations
        elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
    }
}

结果:

100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.

如果重复subTotals,就会出现重复的结果(期望的效果)。实际上,您可能希望使用带有某个 ID 的 subTotal Tupled,以便您可以将其与您的数据关联起来。

C# version

setup test:

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main(string[] args)
    {
    // subtotal list
    List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });

    // get matches
    List<double[]> results = Knapsack.MatchTotal(100.50, totals);

    // print results
    foreach (var result in results)
    {
        Console.WriteLine(string.Join(",", result));
    }

    Console.WriteLine("Done.");
    Console.ReadKey();
    }
}

code:

using System.Collections.Generic;
using System.Linq;

public class Knapsack
{
    internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
    {
    List<double[]> results = new List<double[]>();

    while (subTotals.Contains(theTotal))
    {
        results.Add(new double[1] { theTotal });
        subTotals.Remove(theTotal);
    }

    // if no subtotals were passed
    // or all matched the Total
    // return
    if (subTotals.Count == 0)
        return results;

    subTotals.Sort();

    double mostNegativeNumber = subTotals[0];
    if (mostNegativeNumber > 0)
        mostNegativeNumber = 0;

    // if there aren't any negative values
    // we can remove any values bigger than the total
    if (mostNegativeNumber == 0)
        subTotals.RemoveAll(d => d > theTotal);

    // if there aren't any negative values
    // and sum is less than the total no need to look further
    if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
        return results;

    // get the combinations for the remaining subTotals
    // skip 1 since we already removed subTotals that match
    for (int choose = 2; choose <= subTotals.Count; choose++)
    {
        // get combinations for each length
        IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);

        // add combinations where the sum mathces the total to the result list
        results.AddRange(from combo in combos
                 where combo.Sum() == theTotal
                 select combo.ToArray());
    }

    return results;
    }
}

public static class Combination
{
    public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
    {
    return choose == 0 ?                        // if choose = 0
        new[] { new T[0] } :                    // return empty Type array
        elements.SelectMany((element, i) =>     // else recursively iterate over array to create combinations
        elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
    }
}

results:

100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.

If subTotals are repeated, there will appear to be duplicate results (the desired effect). In reality, you will probably want to use the subTotal Tupled with some ID, so you can relate it back to your data.

别挽留 2024-10-05 15:30:25

如果我正确理解你的问题,你有一组交易,你只想知道其中哪些可以包含在给定的总数中。因此,如果有 4 个可能的交易,则有 2^4 = 16 个可能的集合需要检查。这个问题是,对于 100 个可能的交易,搜索空间有 2^100 = 1267650600228229401496703205376 个可能的组合可供搜索。 For 1000 potential transactions in the mix, it grows to a total of

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

sets that you must test.蛮力很难成为解决这些问题的可行方案。

相反,请使用可以处理背包问题的求解器。但即便如此,我也不确定您是否可以生成所有可能解决方案的完整枚举,而无需使用某种强力方法。

If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. This problem is, for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions in the mix, it grows to a total of

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

sets that you must test. Brute force will hardly be a viable solution on these problems.

Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.

诠释孤独 2024-10-05 15:30:25

有一个便宜的 Excel 插件可以解决这个问题: SumMatch

SumMatch 运行中

There is a cheap Excel Add-in that solves this problem: SumMatch

SumMatch in action

掀纱窥君容 2024-10-05 15:30:25

superuser.com 上发布的 Excel Solver Addin 有一个很好的解决方案(如果您有 Excel)https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total

The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total

扮仙女 2024-10-05 15:30:25

它有点像 0-1 背包问题,是 NP 完全的,可以通过动态规划在多项式时间内解决。

http://en.wikipedia.org/wiki/Knapsack_problem

但是在算法结束时你还需要检查总和是否是您想要的。

Its kind of like 0-1 Knapsack problem which is NP-complete and can be solved through dynamic programming in polynomial time.

http://en.wikipedia.org/wiki/Knapsack_problem

But at the end of the algorithm you also need to check that the sum is what you wanted.

旧时浪漫 2024-10-05 15:30:25

根据您的数据,您可以首先查看每笔交易的美分部分。就像在您最初的示例中一样,您知道 2.50 必须是总数的一部分,因为它是唯一一组加起来为 50 的非零美分交易。

Depending on your data you could first look at the cents portion of each transaction. Like in your initial example you know that 2.50 has to be part of the total because it is the only set of non-zero cent transactions which add to 50.

玩套路吗 2024-10-05 15:30:25

这不是一个超级高效的解决方案,但这里有一个 CoffeeScript 的实现

combinations 返回 list 中元素的所有可能组合

combinations = (list) ->
        permuations = Math.pow(2, list.length) - 1
        out = []
        combinations = []

        while permuations
            out = []

            for i in [0..list.length]
                y = ( 1 << i )
                if( y & permuations and (y isnt permuations))
                    out.push(list[i])

            if out.length <= list.length and out.length > 0
                combinations.push(out)

            permuations--

        return combinations

,然后 find_components 利用它来确定哪些数字总计为 total

find_components = (total, list) ->
    # given a list that is assumed to have only unique elements

        list_combinations = combinations(list)

        for combination in list_combinations
            sum = 0
            for number in combination
                sum += number

            if sum is total
                return combination
        return []

示例

list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1

console.log(find_components(total, list)) 

这是一个返回 [ 7.2, 2, 4.1 ]

Not a super efficient solution but heres an implementation in coffeescript

combinations returns all possible combinations of the elements in list

combinations = (list) ->
        permuations = Math.pow(2, list.length) - 1
        out = []
        combinations = []

        while permuations
            out = []

            for i in [0..list.length]
                y = ( 1 << i )
                if( y & permuations and (y isnt permuations))
                    out.push(list[i])

            if out.length <= list.length and out.length > 0
                combinations.push(out)

            permuations--

        return combinations

and then find_components makes use of it to determine which numbers add up to total

find_components = (total, list) ->
    # given a list that is assumed to have only unique elements

        list_combinations = combinations(list)

        for combination in list_combinations
            sum = 0
            for number in combination
                sum += number

            if sum is total
                return combination
        return []

Heres an example

list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1

console.log(find_components(total, list)) 

which returns [ 7.2, 2, 4.1 ]

你如我软肋 2024-10-05 15:30:25
#include <stdio.h>
#include <stdlib.h>

/* Takes at least 3 numbers as arguments.
 * First number is desired sum.
 * Find the subset of the rest that comes closest
 * to the desired sum without going over.
 */
static long *elements;
static int nelements;

/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
    long sum;                    /* sum of all the elements in the list */
    struct status *next;         /* points to next element in the list */
    int index;                   /* index into elements array of this element */
};

/*  find the subset of elements[startingat .. nelements-1]  whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
    struct status *sumcdr = NULL;
    struct status *sumlist = NULL;

    /* sum of zero elements or summing to zero */
    if (startingat == nelements || desiredsum == 0) {
        return NULL;
    }

    /* optimal sum using the current element */
    /* if current elements[startingat] too big, it won't fit, don't try it */
    if (elements[startingat] <= desiredsum) {
        sumlist = malloc(sizeof(struct status));
        sumlist->index = startingat;
        sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
        sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
        if (sumlist->sum == desiredsum)
            return sumlist;
    }

    /* optimal sum not using current element */
    sumcdr = reportoptimalsubset(desiredsum, startingat + 1);

    if (!sumcdr) return sumlist;
    if (!sumlist) return sumcdr;

    return (sumcdr->sum < sumlist->sum) ?  sumlist : sumcdr;
}

int main(int argc, char **argv) {
  struct status *result = NULL;
  long desiredsum = strtol(argv[1], NULL, 10);

  nelements = argc - 2;
  elements = malloc(sizeof(long) * nelements);

  for (int i = 0; i < nelements; i++) {
      elements[i] = strtol(argv[i + 2], NULL , 10);
  }

  result = reportoptimalsubset(desiredsum, 0);
  if (result)
      printf("optimal subset = %ld\n", result->sum);

  while (result) {
      printf("%ld + ", elements[result->index]);
      result = result->next;
  }

  printf("\n");

}
#include <stdio.h>
#include <stdlib.h>

/* Takes at least 3 numbers as arguments.
 * First number is desired sum.
 * Find the subset of the rest that comes closest
 * to the desired sum without going over.
 */
static long *elements;
static int nelements;

/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
    long sum;                    /* sum of all the elements in the list */
    struct status *next;         /* points to next element in the list */
    int index;                   /* index into elements array of this element */
};

/*  find the subset of elements[startingat .. nelements-1]  whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
    struct status *sumcdr = NULL;
    struct status *sumlist = NULL;

    /* sum of zero elements or summing to zero */
    if (startingat == nelements || desiredsum == 0) {
        return NULL;
    }

    /* optimal sum using the current element */
    /* if current elements[startingat] too big, it won't fit, don't try it */
    if (elements[startingat] <= desiredsum) {
        sumlist = malloc(sizeof(struct status));
        sumlist->index = startingat;
        sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
        sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
        if (sumlist->sum == desiredsum)
            return sumlist;
    }

    /* optimal sum not using current element */
    sumcdr = reportoptimalsubset(desiredsum, startingat + 1);

    if (!sumcdr) return sumlist;
    if (!sumlist) return sumcdr;

    return (sumcdr->sum < sumlist->sum) ?  sumlist : sumcdr;
}

int main(int argc, char **argv) {
  struct status *result = NULL;
  long desiredsum = strtol(argv[1], NULL, 10);

  nelements = argc - 2;
  elements = malloc(sizeof(long) * nelements);

  for (int i = 0; i < nelements; i++) {
      elements[i] = strtol(argv[i + 2], NULL , 10);
  }

  result = reportoptimalsubset(desiredsum, 0);
  if (result)
      printf("optimal subset = %ld\n", result->sum);

  while (result) {
      printf("%ld + ", elements[result->index]);
      result = result->next;
  }

  printf("\n");

}
做个ˇ局外人 2024-10-05 15:30:25

顺便说一句,在进行算术和相等比较时,最好避免使用浮点数和双精度数。

Best to avoid use of floats and doubles when doing arithmetic and equality comparisons btw.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文