对于一组整数,获取所有总和大于某些值的集合

发布于 2025-02-03 14:19:08 字数 1091 浏览 5 评论 0原文

首先,让我说我找到了许多问题和答案,这些问题和答案似乎有些相关或与这个问题有些相关,但实际上并不是这个问题。它似乎与硬币变更 /子集总和问题有关,但是我认为这完全不同,子集总和答案不涵盖这个问题。

问题

是说我有一组4个名为 s 的整数:[1,2,5,1]

目标是设计一个 g 的集集由 s 的每个子集组成,其中集合的总和大于某些值 n

在此示例中,如果 n = 6 ,则 g 将为[[5,2,1,1],[5,2,1],[ 5,2],[5,1,1]]

注意事项

我在这里选择了一个单词,因为每组的顺序根本无关紧要。 [5,2,1,1][5,1,1,2]相同。

当您创建超过 n 的第一组时,可以说 g 1 ,您也可以立即添加所有包含 g的所有其他集合 1 作为子集。同样,如果您找到了另一个新集 g 2 尚未添加并且也超过 n ,则可以立即添加所有包含<的集合强> g 2 作为子集。您无需检查这些集合是否总和大于 n

如果您总结了 s 的所有成员,并且结果不大于 n ,那么您可以得出结论,没有符合标准的集合。

实际用例

在这里试图实现的现实世界目标是我有一组具有与之相关的数值的项目,而另一个代表阈值的值。我试图找出是否可以例行地找到所有可以超过阈值的子组的子群,当它们的权重求和时。

生成符合此标准的所有集合的最有效方法是什么?这是什么复杂性?

Let me start by saying I have found many questions and answers that answer things which seem somewhat related or close to this question, but are not actually this question. It seems related to the coin change / subset sum problem, but I think this is distinctly different enough that subset sum answers do not cover this question.

The Problem

Let's say I have a set of 4 integers called S : [1, 2, 5, 1]

The goal is to devise a set of sets, G , that is comprised of each subset of S where the sum of the set is greater than some value N.

In this example, if N = 6, then G would be [ [5,2,1,1], [5,2,1], [5,2], [5,1,1] ]

Considerations

I chose the word set here specifically because the order of each set does not matter at all. [5, 2, 1, 1] is identical to [5, 1, 1, 2].

When you create your first set that exceeds N, lets say G1, you can also immediately add all other sets which contain G1 as a subset. Likewise, if you find another new set G2 that has not already added and also exceeds N, you can immediately add all sets which contain G2 as a subset. You don't need to check if these sets sum to greater than N.

If you sum all the members of S and the result is not greater than N, then you can conclude there are no sets which meet the criteria.

Actual Use-case

The real-world goal that is trying to be met here is that I have a group of items which have a numerical weight associated with them, and another value which represents a threshold. I'm trying find out if its even feasible to routinely find all the sub-groups that can be made of the items that exceed the threshold when their weights are summed.

What is the most efficient way to generate all sets that meet this criteria? What is complexity of this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

无法言说的痛 2025-02-10 14:19:08

这是一个更有效的实施。值得注意的说明:

  • 它正确处理重复的元素以避免返回相同的子集
  • 实际上两次,它通过将同价值的元素分组为带有数量的垃圾箱,
  • 从而避免完全重新计算等效的子集在保持阈值之上和早期外观的同时,以避免完全探索较小的子集,
  • 它使用列表突变来避免使用昂贵的列表创建 /应对,直到绝对必要时
  • 它确实使用递归,因此在那里有少量的性能罚款,并且与所有递归一样:如果您向其投入足够大的输入,它会引起错误。如果成为问题的问题,则是对读者进行练习,以将其移植到迭代变体中
from collections import Counter

def solve(nums, target):
    counts = sorted(Counter(nums).items())
    reserve = sum(nums) - target
    
    if reserve <= 0:
        return []
    return list(_solve(counts, reserve, []))

def _solve(counts, reserve, prefix):
    if not counts:
        yield tuple(prefix)
        return
    
    val, max_count = last = counts.pop()
    
    prefix.extend([val] * max_count)
    yield from _solve(counts, reserve, prefix)
    
    for count in range(1, max_count + 1):
        prefix.pop()
        if reserve - count * val > 0:
            yield from _solve(counts, reserve - count * val, prefix)
    
    counts.append(last)

from timeit import timeit

runs = 10
statement = "solve([randrange(15) for _ in range(25)], 100)"
setup = "from __main__ import solve; from random import randrange"
print(timeit(statement, setup=setup, number=runs)/runs)

输出:这

0.22455865999218078

意味着它可以解决25个元素的问题,其中解决方案包含约225毫秒的〜100K不同的子集。

我确实针对天真的解决方案进行了测试,以确保正确性。

关于时间复杂性,很难绑定它,因为它的运行时间确实取决于数字的值(或者是输出大小)。您可以将其绑定为o(n log n + s * n)其中n是输入列表的大小,s是输出列表的大小。

Here's a more efficient implementation. Worthwhile notes:

  • It correctly handles duplicate elements to avoid returning the same subset twice
  • In fact, it avoids recomputing equivalent subsets altogether, by grouping same-valued elements into bins with counts
  • It starts from the largest subset (the set itself) and gradually removes elements while staying above the threshold, and early-exits to avoid exploring smaller subsets entirely
  • It uses list mutation to avoid expensive list creation / coping until absolutely necessary
  • It does use recursion, so there's a small performance penalty there, and as with all recursion: if you throw a large enough input at it, it will raise an error. Its an exercise to the reader to port this to an iterative variant if that becomes an issue
from collections import Counter

def solve(nums, target):
    counts = sorted(Counter(nums).items())
    reserve = sum(nums) - target
    
    if reserve <= 0:
        return []
    return list(_solve(counts, reserve, []))

def _solve(counts, reserve, prefix):
    if not counts:
        yield tuple(prefix)
        return
    
    val, max_count = last = counts.pop()
    
    prefix.extend([val] * max_count)
    yield from _solve(counts, reserve, prefix)
    
    for count in range(1, max_count + 1):
        prefix.pop()
        if reserve - count * val > 0:
            yield from _solve(counts, reserve - count * val, prefix)
    
    counts.append(last)

Timing it:

from timeit import timeit

runs = 10
statement = "solve([randrange(15) for _ in range(25)], 100)"
setup = "from __main__ import solve; from random import randrange"
print(timeit(statement, setup=setup, number=runs)/runs)

Output:

0.22455865999218078

Meaning its able to solve problems of 25 elements where the solution contains ~100k distinct subsets in about 225ms.

I did test this against a naive solution to ensure correctness as well.

Regarding time complexity, its hard to bound this, because its run time really depends on the value of the numbers (or rather, the output size). You could bound it as O(n log n + s * n) where n is the size of the input list, and s is the size of the output list.

梦明 2025-02-10 14:19:08

您可以使用itertools生成子列表,然后根据条件过滤这些列表:

import itertools

input_list = [1, 2, 5, 1]
threshold = 6

result = []

for length in range(len(input_list) + 1):

    search_space = list(itertools.combinations(input_list, length))

    for item in search_space:
        if sum(item) > threshold:
            result.append(item)

print(result)

时间复杂性很高,因为您首先需要生成每个长度的子清单(外部循环),然后根据条件(内部的循环)搜索正确的值,因此至少是O(n^2)。

You can use itertools to generate sub-lists and then filter those lists based on a condition:

import itertools

input_list = [1, 2, 5, 1]
threshold = 6

result = []

for length in range(len(input_list) + 1):

    search_space = list(itertools.combinations(input_list, length))

    for item in search_space:
        if sum(item) > threshold:
            result.append(item)

print(result)

Time complexity is quite high as you first need to generate sub-lists of every length (outer for loop), then search right values based on condition (inner for loop), so it will be at least O(n^2).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文