Generating combinations ordered by an attribute
I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for, so I'll try to give an example. Let's say I have a list of objects A, B, C, D whose values for the attribute I want to order by are 3, 3, 2, 1. This gives objects A3, B3, C2, D1. Now I want to generate combinations of 2 objects, ordered by their attribute sum in descending order:
- A3 B3
- A3 C2
- B3 C2
- A3 D1
- B3 D1
- C2 D1
Generating all combinations and sorting them is not acceptable, because the real-world scenario involves large sets and millions of combinations (a set of 40 and combinations of 8 elements, i.e. C(40, 8) = 76,904,685 combinations), and I need only the combinations above a certain threshold.
Actually, I need the count of combinations above a threshold, grouped by the sum of the given attribute, but I think that is far more difficult to do, so I'd settle for generating all combinations above a threshold and counting them, if that's possible at all.
EDIT: My original question wasn't very precise. I don't actually need these combinations ordered; I just thought ordering would help isolate the combinations above a threshold. To be more precise, in the example above, given a threshold of 5, I'm looking for the information that the given set produces 1 combination with a sum of 6 (A3 B3) and 2 with a sum of 5 (A3 C2, B3 C2). I don't actually need the combinations themselves.
I was looking into the subset-sum problem, but if I understood the dynamic-programming solution correctly, it only tells you whether a given sum is achievable, not how many subsets achieve it.
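For reference, the usual subset-sum dynamic program can be extended to count subsets instead of only deciding whether a sum exists, by adding a subset-size dimension: `ways[j][s]` holds the number of `j`-element subsets summing to exactly `s`. A minimal Java sketch, assuming nonnegative integer attribute values small enough that a `(k + 1)` by `(total + 1)` table of longs fits in memory; the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

class SizedSubsetSumCounter {
    /**
     * ways[j][s] = number of j-element subsets (chosen by position, so equal
     * values count separately) whose values sum to exactly s.
     * Assumes all values are nonnegative integers.
     */
    static long[][] countBySizeAndSum(int[] values, int k) {
        int maxSum = 0;
        for (int v : values) maxSum += v;
        long[][] ways = new long[k + 1][maxSum + 1];
        ways[0][0] = 1; // the empty subset
        for (int v : values) {
            // Walk j downward so each element is used at most once (0/1 knapsack style).
            for (int j = k; j >= 1; j--) {
                for (int s = maxSum; s >= v; s--) {
                    ways[j][s] += ways[j - 1][s - v];
                }
            }
        }
        return ways;
    }

    /** Counts of k-element subsets grouped by sum, keeping only sums >= threshold. */
    static Map<Integer, Long> countAtOrAbove(int[] values, int k, int threshold) {
        long[][] ways = countBySizeAndSum(values, k);
        Map<Integer, Long> result = new TreeMap<>();
        for (int s = Math.max(0, threshold); s < ways[k].length; s++) {
            if (ways[k][s] > 0) result.put(s, ways[k][s]);
        }
        return result;
    }
}
```

With the question's data, `countAtOrAbove(new int[]{3, 3, 2, 1}, 2, 5)` returns `{5=2, 6=1}`. The work is roughly (number of values) × k × (total sum), so it stays small when the attribute values are small integers, regardless of how many combinations exist.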
Thanks
Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:
I'll post sample code later.

Here's the sample code I promised, with a few remarks following:
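A minimal Java sketch of the approach described in the remarks below, assuming a descending-sorted `int[]`. The names `Combos`, `count`, `countAt`, `best`, `limit` and `tally` follow the explanation, but the body is a hypothetical reconstruction, and a `TreeMap` stands in for the `Tally` helper:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: counts n-element combinations grouped by sum, for sums >= limit. */
class Combos {
    private final int[] values;      // sorted in descending order
    // Instance fields used for parameter passing between count and countAt (not thread-safe).
    private int[][] best;            // best[m][p] = largest sum of m values starting at position p
    private int limit;
    private Map<Integer, Long> tally;

    Combos(int[] descendingValues) {
        this.values = descendingValues.clone();
    }

    Map<Integer, Long> count(int n, int limit) {
        this.limit = limit;
        this.tally = new TreeMap<>();
        int len = values.length;
        best = new int[n + 1][len + 1];
        // Values are descending, so the best m-sum from p is values[p] + ... + values[p + m - 1].
        for (int m = 1; m <= n; m++) {
            for (int p = 0; p + m <= len; p++) {
                best[m][p] = values[p] + best[m - 1][p + 1];
            }
        }
        countAt(n, 0, 0);
        return tally;
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {                        // population complete: record this sum
            tally.merge(sum, 1L, Long::sum);
            return;
        }
        // Keep sweeping only while enough positions remain and the best
        // remaining subtotal can still reach the limit (best is non-increasing in p).
        for (int p = start; p + left <= values.length && sum + best[left][p] >= limit; p++) {
            countAt(left - 1, p + 1, sum + values[p]);
        }
    }
}
```

For example, `new Combos(new int[]{3, 3, 2, 1}).count(2, 5)` returns `{5=2, 6=1}`. Because `best` is non-increasing in `p`, the loop can stop at the first position that fails the bound instead of skipping it.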
Preface remarks:
This uses a little helper class called Tally, that just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.
To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:
- … for parameter passing between the private methods of `count`. That makes this class non-thread-safe.

Explanation:

An instance of `Combos` is created with the (descending ordered) array of integers to combine. The `values` array is set up once per instance, but multiple calls to `count` can be made with varying population sizes and limits.

The `count` method triggers a (mostly) standard recursive traversal of unique combinations of `n` integers from `values`. The `limit` argument gives the lower bound on sums of interest.

The `countAt` method examines combinations of integers from `values`. The `left` argument is how many integers remain to make up `n` integers in a sum, `start` is the position in `values` from which to search, and `sum` is the partial sum.

The early-bail-out mechanism is based on computing `best`, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in `best[n][p]` is the largest sum of `n` values beginning in position `p` of the original `values`.

The recursion of `countAt` bottoms out when the correct population has been accumulated; this adds the current `sum` (of `n` values) to the `tally`. If `countAt` has not bottomed out, it sweeps the `values` from the `start`-ing position to increase the current partial `sum`, as long as:
- enough positions remain in `values` to achieve the specified population, and
- the `best` (largest) subtotal remaining is big enough to make the `limit`.

A sample run with your question's data:

produces the results you specified:
Here's the Tally code:
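A minimal hypothetical sketch of such a `Tally` helper, matching the description in the preface remarks (a count per key, with never-before-seen keys starting at zero); it is an illustration, not the original class:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tabulates how many times each key has been seen. */
class Tally<K> {
    private final Map<K, Long> counts = new TreeMap<>(); // keys kept in natural order

    /** Increments the count for key; an unseen key starts at zero, so it becomes 1. */
    void add(K key) {
        counts.merge(key, 1L, Long::sum);
    }

    /** Returns the count for key, or 0 if it was never added. */
    long get(K key) {
        return counts.getOrDefault(key, 0L);
    }

    @Override
    public String toString() {
        return counts.toString();
    }
}
```

Adding the sums 6, 5 and 5 to a `Tally<Integer>` prints `{5=2, 6=1}`.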
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem yours falls under. It performs the following tasks:
- Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
- Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
- Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
- Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
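A minimal Java sketch of two of the techniques listed, under stated assumptions: `choose` follows Mark Dominus's multiply-then-divide method, and `rank`/`unrank` use the standard combinatorial number system, which orders combinations colexicographically. This is not the C# class's API, its table ordering may differ, and all names are hypothetical:

```java
/** Hypothetical helpers illustrating the listed techniques (not the C# class's API). */
class BinomialIndex {
    /** Mark Dominus's method: multiply, then divide, at each step; every intermediate is an integer. */
    static long choose(long n, long k) {
        if (k < 0 || k > n) return 0;
        long r = 1;
        for (long d = 1; d <= k; d++) {
            r *= n--;   // r is now C(n0, d - 1) * (n0 - d + 1)
            r /= d;     // exact division: r == C(n0, d)
        }
        return r;
    }

    /** Colex rank of an ascending, 0-based combination: sum of C(c[i], i + 1). */
    static long rank(int[] ascending) {
        long r = 0;
        for (int i = 0; i < ascending.length; i++) {
            r += choose(ascending[i], i + 1);
        }
        return r;
    }

    /** Inverse of rank: the ascending, 0-based k-combination at the given colex index. */
    static int[] unrank(long index, int k) {
        int[] c = new int[k];
        for (int i = k; i >= 1; i--) {
            int x = i - 1;                          // C(i - 1, i) == 0, always <= index
            while (choose(x + 1, i) <= index) x++;  // largest x with C(x, i) <= index
            c[i - 1] = x;
            index -= choose(x, i);
        }
        return c;
    }
}
```

With A, B, C, D as indices 0 to 3, `unrank(0, 2)` through `unrank(5, 2)` yield AB, AC, BC, AD, BD, CD, the same order as the list in the question.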
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that, when true, creates a generic list to hold the objects to be managed. If this value is false, the table is not created. The table does not need to be created in order to perform the 4 methods above. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
Check out this question on Stack Overflow: Algorithm to return all combinations
I also just used the Java code below to generate all permutations, but it could easily be adapted to generate unique combinations given an index.
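A minimal hypothetical sketch (not the answer's original code) of generating the permutation at a given lexicographic index with the factorial number system. Order is relative to the input list, and `long` factorials limit it to at most 20 items:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the permutation at a given lexicographic index. */
class PermutationAtIndex {
    static <T> List<T> permutation(List<T> items, long index) {
        int n = items.size();
        long[] fact = new long[n + 1];
        fact[0] = 1;
        for (int i = 1; i <= n; i++) fact[i] = fact[i - 1] * i;
        if (index < 0 || index >= fact[n]) {
            throw new IllegalArgumentException("index out of range");
        }
        List<T> pool = new ArrayList<>(items);   // items not yet placed
        List<T> out = new ArrayList<>(n);
        for (int remaining = n; remaining > 0; remaining--) {
            long block = fact[remaining - 1];    // permutations per choice of the next item
            int pick = (int) (index / block);
            out.add(pool.remove(pick));          // remove(int) returns the removed item
            index %= block;
        }
        return out;
    }
}
```

For items A, B, C, index 0 gives [A, B, C], index 3 gives [B, C, A] and index 5 gives [C, B, A].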
I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.
The reason (I think) is that this problem is very similar to problems like the traveling salesman problem. Unless you try all the combinations, there is no way to know which attributes will add up to the threshold.
There seems to be no clever trick that can solve this class of problems.
Still, there are many optimizations you can make to the actual code.
Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).
If you're using C#, there is a fairly good generics library here. Note, though, that some permutations are not generated in lexicographic order.
Here's a recursive approach to count the number of these subsets: we define a function `count(minIndex,numElements,minSum)` that returns the number of subsets of size `numElements` whose sum is at least `minSum`, containing elements with indices `minIndex` or greater.

As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], call the first index zero, and call the total number of elements N. We assume all elements are nonnegative. To find all 2-subsets whose sum is at least 5, we call `count(0,2,5)`.

Sample Code (Java):
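A minimal Java sketch of the `count` function as defined above (a hypothetical reconstruction, not necessarily the original code), assuming a nonnegative `int[]` sorted in descending order. Each call either includes or skips the element at `minIndex`, and bails out early when even the largest remaining elements cannot reach `minSum`:

```java
/** Hypothetical sketch of count(minIndex, numElements, minSum) over a descending array. */
class SubsetCounter {
    private final int[] a; // nonnegative, sorted in descending order

    SubsetCounter(int[] sortedDescending) {
        this.a = sortedDescending.clone();
    }

    /** Number of subsets of size numElements, using indices >= minIndex, with sum >= minSum. */
    long count(int minIndex, int numElements, int minSum) {
        if (numElements == 0) return minSum <= 0 ? 1 : 0;
        if (a.length - minIndex < numElements) return 0;   // not enough elements left
        // Descending order: the largest possible sum uses the next numElements elements.
        long best = 0;
        for (int i = minIndex; i < minIndex + numElements; i++) best += a[i];
        if (best < minSum) return 0;                       // early bail-out
        // Either include a[minIndex] or skip it.
        return count(minIndex + 1, numElements - 1, minSum - a[minIndex])
             + count(minIndex + 1, numElements, minSum);
    }
}
```

For [3,3,2,1], `count(0,2,5)` returns 3 and `count(0,2,6)` returns 1, so the number of pairs whose sum is exactly 5 is their difference, 2. Grouping by sum works the same way for any threshold.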
Btw, I've run the above with an array of 40 elements and size-8 subsets, and consistently got results back in less than a second.