Generating combinations ordered by an attribute

Posted 2024-07-13 11:51:19 · 706 characters · 6 views · 0 comments

I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for... I'll try to give an example. Let's say I have a list of objects A, B, C, D whose values for the attribute I want to order by are 3, 3, 2, 1. This gives the objects A3, B3, C2, D1. Now I want to generate combinations of 2 objects, but they need to be ordered in descending order of their attribute sum:

  • A3 B3
  • A3 C2
  • B3 C2
  • A3 D1
  • B3 D1
  • C2 D1

Generating all combinations and sorting them is not acceptable, because the real-world scenario involves large sets and millions of combinations (a set of 40 objects and combinations of 8, i.e. C(40, 8) = 76,904,685 combinations), and I only need the combinations above a certain threshold.

Actually, I need the count of combinations above a threshold, grouped by the sum of the given attribute, but I think that is far more difficult to do, so I'd settle for generating all combinations above a threshold and counting them, if that's possible at all.

EDIT: My original question wasn't very precise. I don't actually need these combinations ordered; I just thought ordering would help isolate the combinations above a threshold. To be more precise, in the example above, given a threshold of 5, I'm looking for the information that the given set produces 1 combination with a sum of 6 (A3 B3) and 2 with a sum of 5 (A3 C2, B3 C2). I don't actually need the combinations themselves.

I was looking into the subset-sum problem, but if I understood the usual dynamic-programming solution correctly, it only tells you whether a given sum can be reached, not how many subsets reach it.
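A note on that last point: the usual subset-sum DP records whether each sum is reachable, but swapping the boolean OR for addition, and adding a dimension for how many elements have been picked, makes it count subsets per exact sum instead. The Java sketch below is an illustration under my own assumptions (small nonnegative integer attribute values; `KSubsetSumCounter` and `countBySum` are hypothetical names), not code from the thread. Its table has (k + 1) × (S + 1) entries, where S is the total of all values, and it performs about n · k · S additions.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: subset-sum style DP that counts k-element subsets by exact sum.
// Assumes nonnegative integer attribute values (hypothetical helper, not from the thread).
class KSubsetSumCounter {

    // Returns {sum -> number of k-element subsets with that sum} for sums >= threshold.
    static Map<Integer, Long> countBySum(int[] values, int k, int threshold) {
        int total = 0;
        for (int v : values) total += v;

        // ways[j][s] = number of j-element subsets of the values processed so far with sum s
        long[][] ways = new long[k + 1][total + 1];
        ways[0][0] = 1; // the empty subset

        for (int v : values) {
            // walk j downwards so row j - 1 does not yet include v: each value is used at most once
            for (int j = k; j >= 1; j--) {
                for (int s = total; s >= v; s--) {
                    ways[j][s] += ways[j - 1][s - v];
                }
            }
        }

        // keep only the sums at or above the threshold, in ascending order
        Map<Integer, Long> bySum = new TreeMap<>();
        for (int s = Math.max(threshold, 0); s <= total; s++) {
            if (ways[k][s] > 0) bySum.put(s, ways[k][s]);
        }
        return bySum;
    }

    public static void main(String[] args) {
        // The question's example: values 3, 3, 2, 1, pairs, threshold 5
        System.out.println(countBySum(new int[] {3, 3, 2, 1}, 2, 5)); // {5=2, 6=1}
    }
}
```

Because it never enumerates combinations, its cost depends on S rather than on C(n, k), so it only pays off when the attribute values are reasonably small integers.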

Thanks

Answers (6)

予囚 2024-07-20 11:51:19

Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:

  • It's not clear to me from your description that A, B, ... D play any role in your answer (except possibly as the container for the values).
  • I think your question example is simply "For each integer at least 5, up to the maximum possible total of two values, how many distinct pairs from the set {3, 3, 2, 1} have sums of that integer?"
  • The interesting part is the early bailout, once no possible solution can be reached (remaining achievable sums are too small).

I'll post sample code later.

Here's the sample code I promised, with a few remarks following:

public class Combos {

    /* permanent state for instance */
    private int[] values;
    private int length;

    /* transient state during single "count" computation */
    private int n;
    private int limit;
    private Tally<Integer> tally;
    private int[][] best;  // used for early-bail-out

    private void initializeForCount(int n, int limit) {
        this.n = n;
        this.limit = limit;
        best = new int[n+1][length+1];
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i-1][j+1];
            }
        }
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {
            tally.inc(sum);
        } else {
            for (
                int i = start;
                i <= length - left
                && limit <= sum + best[left][i];  // bail-out-check
                ++i
            ) {
                countAt(left - 1, i + 1, sum + values[i]);
            }
        }
    }

    public Tally<Integer> count(int n, int limit) {
        tally = new Tally<Integer>();
        if (n <= length) {
            initializeForCount(n, limit);
            countAt(n, 0, 0);
        }
        return tally;
    }

    public Combos(int[] values) {
        this.values = values;
        this.length = values.length;
    }

}

Preface remarks:

This uses a little helper class called Tally, which just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.

To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:

  • This doesn't check for a null values array, etc.
  • I assume that the values array is already sorted into descending order, which the early-bail-out technique requires. (Good production code would include the sorting.)
  • I put transient data into instance variables instead of passing it as arguments among the private methods that support count. That makes this class non-thread-safe.
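The second remark leaves the sorting step out. A minimal sketch of it, assuming plain int values (the class `DescendingSort` and method `sortedDescending` are hypothetical names), copies the input and puts it in descending order before it is handed to the Combos constructor:

```java
import java.util.Arrays;

// Sketch: descending copy of the values, as the early-bail-out check expects.
class DescendingSort {
    static int[] sortedDescending(int[] values) {
        int[] copy = values.clone(); // leave the caller's array untouched
        Arrays.sort(copy);           // ascending order
        // reverse in place to get descending order
        for (int i = 0, j = copy.length - 1; i < j; i++, j--) {
            int tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return copy;
    }
}
```

It would then be used as `new Combos(DescendingSort.sortedDescending(rawValues))`, where `rawValues` stands for whatever unsorted input you have.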

Explanation:

An instance of Combos is created with the (descending-ordered) array of integers to combine. The values array is set up once per instance, but count can be called multiple times with varying population sizes and limits.

The count method triggers a (mostly) standard recursive traversal of unique combinations of n integers from values. The limit argument gives the lower bound on sums of interest.

The countAt method examines combinations of integers from values. The left argument is how many integers remain to make up n integers in a sum, start is the position in values from which to search, and sum is the partial sum.

The early-bail-out mechanism is based on computing best, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in best[n][p] is the largest sum of n values beginning in position p of the original values.
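To make the `best` table concrete, here is the same recurrence as `initializeForCount`, pulled out into a hypothetical static helper `buildBest` and run on the question's data (values {3, 3, 2, 1}, n = 2). It is only an illustration of the table, not part of the answer's class.

```java
import java.util.Arrays;

// Sketch: the "best" table from initializeForCount, computed for a small input.
// Assumes values are sorted in descending order, as Combos does.
class BestTableDemo {

    // best[i][j] = largest sum of i values taken from positions j, j+1, ...
    static int[][] buildBest(int[] values, int n) {
        int length = values.length;
        int[][] best = new int[n + 1][length + 1]; // row 0 stays all zeros
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i - 1][j + 1];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] best = buildBest(new int[] {3, 3, 2, 1}, 2);
        System.out.println(Arrays.toString(best[1])); // [3, 3, 2, 1, 0]
        System.out.println(Arrays.toString(best[2])); // [6, 5, 3, 0, 0]
    }
}
```

With limit = 5, the top-level loop in countAt stops at position 2, because 0 + best[2][2] = 3 < 5, so no pair starting at the value 2 is ever visited.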

The recursion of countAt bottoms out when the correct population has been accumulated; it then adds the current sum (of n values) to the tally. Otherwise, countAt sweeps values from the start position to increase the current partial sum, as long as:

  • enough positions remain in values to achieve the specified population, and
  • the best (largest) subtotal remaining is big enough to reach the limit.

A sample run with your question's data:

    int[] values = {3, 3, 2, 1};
    Combos mine = new Combos(values);
    Tally<Integer> tally = mine.count(2, 5);
    for (int i = 5; i < 9; ++i) {
        int n = tally.get(i);
        if (0 < n) {
            System.out.println("found " + n + " sums of " + i);
        }
    }

produces the results you specified:

found 2 sums of 5
found 1 sums of 6

Here's the Tally code:

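// Static nested class: place it inside an enclosing class (e.g. Combos); the file
// needs: import java.util.Collection; import java.util.HashMap; import java.util.Map;
public static class Tally<T> {
    private Map<T,Integer> tally = new HashMap<T,Integer>();
    public Tally() {/* nothing */}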
    public void inc(T key) {
        Integer value = tally.get(key);
        if (value == null) {
            value = Integer.valueOf(0);
        }
        tally.put(key, (value + 1));
    }
    public int get(T key) {
        Integer result = tally.get(key);
        return result == null ? 0 : result;
    }
    public Collection<T> keys() {
        return tally.keySet();
    }
}

倒数 2024-07-20 11:51:19

I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem yours falls under. It performs the following tasks:

  1. Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.

  2. Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.

  3. Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.

  4. Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.

  5. The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that, when true, creates a generic list to hold the objects to be managed. If this value is false, the table is not created. The table does not need to be created in order to perform the four methods above. Accessor methods are provided to access the table.

  6. There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
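Item 4 refers to Mark Dominus's method. A sketch of the multiplicative loop as it is commonly described (multiply by the next numerator factor, then divide by d) is below, in Java rather than the class's C#; it is not the class's actual code, and `BinomialSketch.choose` is a hypothetical name. After step d the running value equals C(n, d), so every division is exact, and the largest intermediate is d · C(n, d) rather than a full factorial.

```java
// Sketch of the multiplicative binomial-coefficient loop (hypothetical helper).
class BinomialSketch {
    static long choose(long n, long k) {
        if (k < 0 || k > n) return 0;
        k = Math.min(k, n - k); // C(n, k) == C(n, n - k); fewer iterations
        long r = 1;
        for (long d = 1; d <= k; d++) {
            r *= n--; // r is now d * C(n0, d); overflows only if that exceeds Long.MAX_VALUE
            r /= d;   // exact: r becomes C(n0, d)
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(choose(4, 2));  // 6
        System.out.println(choose(40, 8)); // 76904685, the question's 40-choose-8
    }
}
```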

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

无畏 2024-07-20 11:51:19

Check out this question on Stack Overflow: Algorithm to return all combinations

I also just used the Java code below to generate all permutations, but it could easily be used to generate unique combinations given an index.

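// Returns the num-th permutation (0 <= num < s.length!) by rearranging s in place;
// pass s.clone() if the original order is needed again. With int arithmetic this only
// covers small arrays: 13! already exceeds Integer.MAX_VALUE.
public static <E> E[] permutation(E[] s, int num) {//s is the input elements array and num is the number which represents the permutation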

    int factorial = 1;

    for(int i = 2; i < s.length; i++)
        factorial *= i;//calculates the factorial of (s.length - 1)

    if (num/s.length >= factorial)// Optional. if the number is not in the range of [0, s.length! - 1] 
        return null;

    for(int i = 0; i < s.length - 1; i++){//go over the array

        int tempi = (num / factorial) % (s.length - i);//calculates the next cell from the cells left (the cells in the range [i, s.length - 1])
        E temp = s[i + tempi];//Temporarily saves the value of the cell needed to add to the permutation this time 

        for(int j = i + tempi; j > i; j--)//shift all elements to "cover" the "missing" cell
            s[j] = s[j-1];

        s[i] = temp;//put the chosen cell in the correct spot

        factorial /= (s.length - (i + 1));//updates the factorial

    }

    return s;
}
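The answer says this permutation code could be adapted to produce a combination from an index. A more direct route, sketched here under my own assumptions (the class `CombinationUnrank` and its methods are hypothetical), is lexicographic unranking of k-combinations of {0, ..., n-1}: the combinations whose element at position pos is `next` form one contiguous block of C(n - next - 1, k - pos - 1) indices, so whole blocks can be skipped.

```java
// Sketch: lexicographic unranking of k-combinations of {0, ..., n-1}.
// Index 0 is {0, 1, ..., k-1}; index C(n, k) - 1 is {n-k, ..., n-1}.
class CombinationUnrank {

    // n choose k via the exact multiplicative loop; 0 when k is out of range
    static long choose(int n, int k) {
        if (k < 0 || k > n) return 0;
        long r = 1;
        for (int d = 1; d <= k; d++) {
            r = r * (n - k + d) / d; // r == C(n - k + d, d) after this step
        }
        return r;
    }

    static int[] unrank(int n, int k, long index) {
        if (index < 0 || index >= choose(n, k)) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int[] combo = new int[k];
        int next = 0; // smallest element still available
        for (int pos = 0; pos < k; pos++) {
            // skip whole blocks of combinations that put a smaller element at pos
            long block = choose(n - next - 1, k - pos - 1);
            while (index >= block) {
                index -= block;
                next++;
                block = choose(n - next - 1, k - pos - 1);
            }
            combo[pos] = next;
            next++;
        }
        return combo;
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C", "D"};
        // prints AB, AC, AD, BC, BD, CD (one per line)
        for (long i = 0; i < choose(4, 2); i++) {
            int[] c = unrank(4, 2, i);
            System.out.println(names[c[0]] + names[c[1]]);
        }
    }
}
```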

梦罢 2024-07-20 11:51:19

I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.

The reason (I think) is that this problem is very similar to problems like the traveling salesman problem. Unless you try all the combinations, there is no way to know which attribute values will add up to the threshold.

There seems to be no clever trick that can solve this class of problems.

Still there are many optimizations that you can do to the actual code.

Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).

飘逸的'云 2024-07-20 11:51:19

If you're using C#, there is a fairly good generics library here. Note, though, that some of the permutations it generates are not in lexicographic order.

痴情 2024-07-20 11:51:19

Here's a recursive approach to count the number of these subsets: we define a function count(minIndex, numElements, minSum) that returns the number of subsets of size numElements whose sum is at least minSum, using only elements with indices minIndex or greater.

As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], call the first index zero, and let N be the total number of elements. We assume all elements are nonnegative. To count all 2-subsets whose sum is at least 5, we call count(0,2,5).
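For reference, the recursion in the sample code can be written out as a recurrence, with the binomial taken as 0 when k > N − m (matching the guard in the code). The last formula is my addition rather than the answerer's: assuming integer values, the difference of two "at least" counts gives the per-sum counts the question asks for.

```latex
\operatorname{count}(m, k, t) =
\begin{cases}
\bigl|\{\, i : m \le i \le N-1,\ a_i \ge t \,\}\bigr| & k = 1,\\[4pt]
\binom{N-m}{k} & k \ge 2,\ t \le 0,\\[4pt]
\sum_{i=m}^{N-k} \operatorname{count}(i+1,\ k-1,\ t - a_i) & k \ge 2,\ t > 0,
\end{cases}
\qquad
E(k, s) = \operatorname{count}(0, k, s) - \operatorname{count}(0, k, s+1)
```

With a = [3, 3, 2, 1] and k = 2, count(0, 2, 5) = 3 and count(0, 2, 6) = 1, so E(2, 5) = 2; likewise E(2, 6) = 1 − 0 = 1, matching the counts in the question.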

Sample Code (Java):

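// Self-contained wrapper; the class name SubsetCounter is only illustrative.
// count() relies on an array a (sorted descending, all entries >= 0),
// N == a.length, and an n-choose-k helper, which are supplied here.
class SubsetCounter
{
    private final int[] a;
    private final int N;

    SubsetCounter(int[] sortedDescending)
    {
        a = sortedDescending;
        N = a.length;
    }

    // n choose k; after step d, r == C(n - k + d, d), so every division is exact
    static int nchoosek(int n, int k)
    {
        long r = 1;
        for (int d = 1; d <= k; d++)
            r = r * (n - k + d) / d;
        return (int) r;
    }

    int count(int minIndex, int numElements, int minSum)
    {
        int total = 0;

        if (numElements == 1)
        {
            // just count number of elements >= minSum
            for (int i = minIndex; i <= N-1; i++)
                if (a[i] >= minSum) total++; else break;
        }
        else
        {
            if (minSum <= 0)
            {
                // any subset will do (n-choose-k of them)
                if (numElements <= (N-minIndex))
                    total = nchoosek(N-minIndex, numElements);
            }
            else
            {
                // add element a[i] to the set, and then consider the count
                // for all elements to its right
                for (int i = minIndex; i <= (N-numElements); i++)
                    total += count(i+1, numElements-1, minSum-a[i]);
            }
        }

        return total;
    }
}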

Btw, I've run the above with an array of 40 elements and size-8 subsets, and consistently got results back in less than a second.
