Obtaining a powerset of a set in Java
The powerset of {1, 2, 3} is:
{{}, {2}, {3}, {2, 3}, {1, 2}, {1, 3}, {1, 2, 3}, {1}}
Let's say I have a Set in Java:
Set<Integer> mySet = new HashSet<Integer>();
mySet.add(1);
mySet.add(2);
mySet.add(3);
Set<Set<Integer>> powerSet = getPowerset(mySet);
How do I write the function getPowerset with the best possible order of complexity? (I think it might be O(2^n).)
Yes, it is O(2^n) indeed, since you need to generate 2^n possible combinations. Here's a working implementation, using generics and sets:
And a test, given your example input:
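A minimal sketch of what a recursive implementation with generics and sets could look like. This is not the answer's original code; the class name `PowerSetRecursive` and the method name `powerSet` are illustrative assumptions. It relies on the fact that every subset either contains the first element or does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PowerSetRecursive {
    // Returns every subset of originalSet. Each subset of the "rest" is
    // emitted twice: once as-is and once with the head element added.
    public static <T> Set<Set<T>> powerSet(Set<T> originalSet) {
        Set<Set<T>> sets = new HashSet<>();
        if (originalSet.isEmpty()) {
            sets.add(new HashSet<T>()); // the power set of {} is {{}}
            return sets;
        }
        List<T> list = new ArrayList<>(originalSet);
        T head = list.get(0);
        Set<T> rest = new HashSet<>(list.subList(1, list.size()));
        for (Set<T> subset : powerSet(rest)) {
            Set<T> withHead = new HashSet<>(subset);
            withHead.add(head);
            sets.add(withHead); // subset that contains head
            sets.add(subset);   // same subset without head
        }
        return sets;
    }

    public static void main(String[] args) {
        Set<Integer> mySet = new HashSet<>();
        mySet.add(1);
        mySet.add(2);
        mySet.add(3);
        // Prints the 8 subsets; HashSet iteration order is unspecified.
        for (Set<Integer> s : powerSet(mySet)) {
            System.out.println(s);
        }
    }
}
```

Because the result is a `HashSet` of `HashSet`s, the subsets come back in no particular order, which matches the unordered listing in the question.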
Actually, I've written code that does what you're asking for in O(1). The question is what you plan to do with the Set next. If you're just going to call size() on it, that's O(1), but if you're going to iterate it, that's obviously O(2^n). contains() would be O(n), etc. Do you really need this?
EDIT: This code is now available in Guava, exposed through the method Sets.powerSet(set).
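To illustrate how a power set can be "built" in O(1), here is a rough sketch of a lazy view. It is not Guava's implementation; the class `LazyPowerSet` and its 30-element limit are assumptions made for this example. Construction only copies the input, `size()` is a shift, `contains()` is a subset check, and subsets are materialised only while iterating.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

final class LazyPowerSet<E> extends AbstractSet<Set<E>> {
    private final Set<E> inputSet;  // used for membership checks
    private final List<E> elements; // same elements, addressable by bit index

    public LazyPowerSet(Set<E> input) {
        if (input.size() > 30) {
            throw new IllegalArgumentException("sketch supports at most 30 elements");
        }
        this.inputSet = new HashSet<>(input);
        this.elements = new ArrayList<>(inputSet);
    }

    @Override
    public int size() {
        return 1 << elements.size(); // 2^n, no subsets are created
    }

    @Override
    public boolean contains(Object o) {
        // A set belongs to the power set iff all of its elements are in the input.
        return o instanceof Set && inputSet.containsAll((Set<?>) o);
    }

    @Override
    public Iterator<Set<E>> iterator() {
        final int total = size();
        return new Iterator<Set<E>>() {
            private int mask = 0; // bit i set means elements.get(i) is included

            @Override
            public boolean hasNext() {
                return mask < total;
            }

            @Override
            public Set<E> next() {
                if (!hasNext()) throw new NoSuchElementException();
                Set<E> subset = new LinkedHashSet<>();
                for (int i = 0; i < elements.size(); i++) {
                    if ((mask & (1 << i)) != 0) subset.add(elements.get(i));
                }
                mask++;
                return subset;
            }
        };
    }
}
```

Only a full iteration pays the O(2^n) cost; a caller that needs just `size()` or `contains()` never does.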
Here's a solution where I use a generator. The advantage is that the entire power set is never stored at once, so you can iterate over it one subset at a time without needing it all in memory. I'd like to think it's a better option... Note that the complexity is the same, O(2^n), but the memory requirements are reduced (assuming the garbage collector behaves! ;) )
To call it, use this pattern:
It's from my Project Euler Library... :)
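One way to sketch the generator idea, keeping only a single subset in memory at any time. This is not the answer's library code: the class `GrayCodeSubsets` is hypothetical, and it uses a Gray-code ordering (each step adds or removes exactly one element), which is a substitution chosen for this example. The iterator hands out a read-only view of one reused set, so callers must copy it if they want to keep a subset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

final class GrayCodeSubsets<T> implements Iterable<Set<T>> {
    private final List<T> items;

    public GrayCodeSubsets(Set<T> input) {
        if (input.size() > 62) {
            throw new IllegalArgumentException("sketch supports at most 62 elements");
        }
        this.items = new ArrayList<>(input);
    }

    @Override
    public Iterator<Set<T>> iterator() {
        final long total = 1L << items.size();
        final Set<T> current = new HashSet<>();                   // reused for every step
        final Set<T> view = Collections.unmodifiableSet(current); // what callers see
        return new Iterator<Set<T>>() {
            private long step = 0; // number of subsets produced so far

            @Override
            public boolean hasNext() {
                return step < total;
            }

            @Override
            public Set<T> next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (step > 0) {
                    // In Gray-code order, step k differs from step k-1 in the bit
                    // at position numberOfTrailingZeros(k): toggle that element.
                    T item = items.get(Long.numberOfTrailingZeros(step));
                    if (!current.remove(item)) current.add(item);
                }
                step++;
                return view;
            }
        };
    }
}
```

A calling pattern under these assumptions would be `for (Set<Integer> subset : new GrayCodeSubsets<>(mySet)) { ... }`, processing each subset inside the loop body before the next iteration mutates the shared set.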
If n < 63, which is a reasonable assumption since you'd run out of memory (unless using an iterator implementation) trying to construct the power set anyway, this is a more concise way to do it. Binary operations are way faster than Math.pow() and arrays for masks, but somehow Java users are afraid of them...
Here is a tutorial describing exactly what you want, including the code. You're correct in that the complexity is O(2^n).
I came up with another solution based on @Harry He's ideas. It's probably not the most elegant, but here it goes as I understand it:
Let's take the classic simple example, the power set P(S) of S = {1, 2, 3}.
We know the formula for the number of subsets is 2^n (7 + the empty set).
For this example, 2^3 = 8 subsets.
To find each subset, we need to convert the decimal numbers 0-7 to their binary representation, shown in the conversion table below:
If we traverse the table row by row, each row yields one subset, and the values in each subset come from the enabled bits.
Each column in the Bin Value section corresponds to an index position in the original input Set.
Here is my code:
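The table-driven idea can be sketched as below. This is an illustration rather than the answer's own code: the class `BinaryTablePowerSet` and the method `subsetFor` are made-up names, and the sketch assumes that the rightmost bit corresponds to index 0 of the input (the original table may have used the opposite convention).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinaryTablePowerSet {
    // Builds the subset for one table row: bit `index` of `value` being 1
    // means input.get(index) is part of the subset.
    public static <T> List<T> subsetFor(int value, List<T> input) {
        List<T> subset = new ArrayList<>();
        for (int index = 0; index < input.size(); index++) {
            if ((value & (1 << index)) != 0) {
                subset.add(input.get(index));
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        int rows = 1 << input.size(); // 2^3 = 8 rows
        System.out.println("Dec  Bin  Subset");
        for (int value = 0; value < rows; value++) {
            // Left-pad the binary form to 3 digits (width fixed for this 3-element demo).
            String bin = String.format("%3s", Integer.toBinaryString(value)).replace(' ', '0');
            System.out.println(value + "    " + bin + "  " + subsetFor(value, input));
        }
        // Prints:
        // Dec  Bin  Subset
        // 0    000  []
        // 1    001  [1]
        // 2    010  [2]
        // 3    011  [1, 2]
        // 4    100  [3]
        // 5    101  [1, 3]
        // 6    110  [2, 3]
        // 7    111  [1, 2, 3]
    }
}
```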
If you're using Eclipse Collections (formerly GS Collections), you can use the powerSet() method on all SetIterables.
Note: I am a committer for Eclipse Collections.
I was looking for a solution that wasn't as huge as the ones posted here. This targets Java 7, so it will require a handful of pastes for versions 5 and 6.
Here's some example code to test:
Here is an easy iterative O(2^n) solution:
Some of the solutions above suffer when the set is large because they create a lot of garbage objects to be collected and require copying data. How can we avoid that? We can take advantage of the fact that we know how big the result will be (2^n), preallocate an array that big, and just append to the end of it, never copying.
The speedup grows quickly with n. I compared it to João Silva's solution above. On my machine (all measurements approximate), n=13 is 5x faster, n=14 is 7x, n=15 is 12x, n=16 is 25x, n=17 is 75x, and n=18 is 140x. So garbage creation/collection and copying dominate in what otherwise seem to be similar big-O solutions.
Preallocating the array at the beginning appears to be a win compared to letting it grow dynamically. With n=18, dynamic growing takes about twice as long overall.
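A hedged sketch of the preallocation idea described above, not the benchmarked code itself. The class name `PreallocatedPowerSet` and the 30-element guard are assumptions. The result list is created with its final capacity of 2^n, and each new subset is appended after the existing ones, so the outer list never has to regrow.

```java
import java.util.ArrayList;
import java.util.List;

final class PreallocatedPowerSet {
    public static <T> List<List<T>> powerSet(List<T> original) {
        int n = original.size();
        if (n > 30) {
            throw new IllegalArgumentException("2^n subsets would not fit in one list");
        }
        // Capacity is exact, so appending never triggers a resize-and-copy.
        List<List<T>> result = new ArrayList<>(1 << n);
        result.add(new ArrayList<T>(0)); // start with the empty subset
        for (T item : original) {
            int sizeBefore = result.size();
            // Every subset found so far yields one new subset that also has `item`.
            for (int i = 0; i < sizeBefore; i++) {
                List<T> old = result.get(i);
                List<T> extended = new ArrayList<>(old.size() + 1);
                extended.addAll(old);
                extended.add(item);
                result.add(extended);
            }
        }
        return result;
    }
}
```

For the input [1, 2, 3] this produces [], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3] in that order. Each inner list is also sized exactly, which avoids the default-capacity overallocation of `new ArrayList<>()`.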
The following solution is borrowed from my book "Coding Interviews: Questions, Analysis & Solutions":
Some integers in an array are selected to compose a combination. A set of bits is used, where each bit stands for an integer in the array. If the i-th integer is selected for a combination, the i-th bit is 1; otherwise, it is 0. For instance, three bits are used for combinations of the array [1, 2, 3]. If the first two integers 1 and 2 are selected to compose the combination [1, 2], the corresponding bits are {1, 1, 0}. Similarly, the bits corresponding to another combination [1, 3] are {1, 0, 1}. We are able to get all combinations of an array of length n if we can get all possible combinations of n bits.
A number is composed of a set of bits. All possible combinations of n bits correspond to the numbers from 1 to 2^n-1. Therefore, each number in the range between 1 and 2^n-1 corresponds to a combination of an array of length n. For example, the number 6 is composed of bits {1, 1, 0}, so the first and second integers are selected from the array [1, 2, 3] to generate the combination [1, 2]. Similarly, the number 5 with bits {1, 0, 1} corresponds to the combination [1, 3].
The Java code to implement this solution looks like below:
The method increment increases a number represented as a set of bits. The algorithm clears 1 bits starting from the rightmost bit until a 0 bit is found. It then sets that rightmost 0 bit to 1. For example, in order to increase the number 5 with bits {1, 0, 1}, it clears 1 bits from the right side and sets the rightmost 0 bit to 1. The bits become {1, 1, 0} for the number 6, which is the result of increasing 5 by 1.
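The description above can be sketched with `java.util.BitSet` as follows. This is a reconstruction from the prose, not the book's listing; the class `BitSetCombinations` and the method `combinations` are assumed names. Bit index 0 is treated as the leftmost bit (the first array element) and index n-1 as the rightmost, matching the {1, 1, 0} examples.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BitSetCombinations {
    // Adds 1 to the number held in bits[0..length-1], where index length-1
    // is the rightmost (least significant) bit.
    static void increment(BitSet bits, int length) {
        int index = length - 1;
        while (index >= 0 && bits.get(index)) {
            bits.clear(index); // clear trailing 1 bits
            index--;
        }
        if (index >= 0) {
            bits.set(index);   // set the rightmost 0 bit
        }
    }

    // Returns the non-empty combinations for the numbers 1 .. 2^n - 1.
    public static List<List<Integer>> combinations(int[] array) {
        List<List<Integer>> result = new ArrayList<>();
        int n = array.length;
        BitSet bits = new BitSet(n);
        increment(bits, n); // start at 1, skipping the empty combination
        while (!bits.isEmpty()) { // wraps to all zeros after 2^n - 1
            List<Integer> combination = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (bits.get(i)) combination.add(array[i]);
            }
            result.add(combination);
            increment(bits, n);
        }
        return result;
    }
}
```

For [1, 2, 3] this yields [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3], which corresponds to the numbers 1 through 7. Because the counter is a `BitSet` rather than an `int` or `long`, the approach is not limited to 31 or 63 elements, although the output size still is.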
If S is a finite set with N elements, then the power set of S contains 2^N elements. The time to simply enumerate the elements of the power set is 2^N, so O(2^N) (more precisely, Ω(2^N)) is a lower bound on the time complexity of (eagerly) constructing the power set.
Put simply, any computation that involves creating power sets is not going to scale for large values of N. No clever algorithm will help you ... apart from avoiding the need to create the power sets!
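As a quick check on the counting argument, the number of subsets and the total number of elements stored across all of them (if each subset is materialised separately) can be written as:

```latex
|\mathcal{P}(S)| = \sum_{k=0}^{N} \binom{N}{k} = 2^{N},
\qquad
\sum_{A \subseteq S} |A| = \sum_{k=0}^{N} k \binom{N}{k} = N \cdot 2^{N-1}.
```

For N = 3 this gives 8 subsets holding 12 elements in total, which matches the example in the question.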
One way without recursion is the following: use a binary mask and make all the possible combinations.
Algorithm:
Input: Set[], set_size
1. Get the size of the power set
   pow_set_size = pow(2, set_size)
2. Loop for counter from 0 to pow_set_size - 1
   (a) Loop for i = 0 to set_size - 1
       (i) If the ith bit in counter is set,
           print the ith element from the set for this subset
   (b) Print a separator for subsets, i.e., a newline
This is my recursive solution, which can get the power set of any set using Java generics. Its main idea is to combine the head of the input array with all the possible solutions for the rest of the array, as follows.
This will output:
Another sample implementation:
This is my approach with lambdas.
Or in parallel (see the parallel() comment):
Size of input set: 18
Logical processors: 8 at 3.4 GHz
Performance improvement: 30%
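A lambda-based version could look roughly like the sketch below. It is not the answer's code; `StreamPowerSet` and the `parallel` flag are names invented for this example. Each integer in the range [0, 2^n) is mapped to the subset whose elements sit at the positions of its 1 bits, and the only difference between the two variants is the call to `parallel()`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class StreamPowerSet {
    public static <T> List<List<T>> powerSet(List<T> items, boolean parallel) {
        int n = items.size();
        if (n > 30) {
            throw new IllegalArgumentException("an int mask supports at most 30 items");
        }
        IntStream masks = IntStream.range(0, 1 << n);
        if (parallel) {
            masks = masks.parallel(); // split the mask range across worker threads
        }
        return masks
                .mapToObj(mask -> IntStream.range(0, n)
                        .filter(i -> (mask & (1 << i)) != 0) // keep indexes whose bit is set
                        .mapToObj(items::get)
                        .collect(Collectors.toList()))
                .collect(Collectors.toList()); // encounter order is kept, even in parallel
    }
}
```

Whether the parallel variant pays off depends on n and on the machine; the 30% figure quoted above is the answer author's measurement, not something this sketch reproduces.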
I recently had to use something like this, but needed the smallest sublists (with 1 element, then 2 elements, ...) first. I did not want to include the empty list or the whole list.
Also, I did not need a list of all the sublists returned; I just needed to do some stuff with each.
I wanted to do this without recursion, and came up with the following (with the "doing stuff" abstracted into a functional interface):
In this way, it's also easy to limit it to sublists of specific lengths.
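A minimal non-recursive sketch of this size-ordered traversal, assuming a `java.util.function.Consumer` as the functional interface. The class `SublistsBySize`, the method `forEachSublist`, and the `minSize`/`maxSize` parameters are hypothetical names, not the answer's. For each size k it walks an array of k increasing indexes through all combinations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SublistsBySize {
    // Calls action once for every sublist whose size is in [minSize, maxSize],
    // visiting all sublists of a smaller size before any larger one.
    public static <T> void forEachSublist(List<T> list, int minSize, int maxSize,
                                          Consumer<List<T>> action) {
        int n = list.size();
        for (int k = Math.max(minSize, 0); k <= Math.min(maxSize, n); k++) {
            int[] idx = new int[k];
            for (int i = 0; i < k; i++) idx[i] = i; // first combination: 0, 1, ..., k-1
            while (true) {
                List<T> sublist = new ArrayList<>(k);
                for (int i : idx) sublist.add(list.get(i));
                action.accept(sublist);
                // Find the rightmost index that has not reached its maximum value.
                int pos = k - 1;
                while (pos >= 0 && idx[pos] == n - k + pos) pos--;
                if (pos < 0) break; // last combination of this size
                idx[pos]++;
                for (int i = pos + 1; i < k; i++) idx[i] = idx[i - 1] + 1;
            }
        }
    }
}
```

Calling `forEachSublist(list, 1, list.size() - 1, action)` skips both the empty list and the whole list; for ["a", "b", "c"] the action sees [a], [b], [c], [a, b], [a, c], [b, c] in that order. Passing the same value for both bounds restricts the traversal to sublists of one specific length.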
Yet another solution, with the Java 8+ Stream API.
It is lazy and ordered, so it returns the correct subsets when it is used with limit().
And the client code is:
/* Prints : [][a][b][c][d][e][a, b][a, c][b, c] */
We could write the power set with or without using recursion. Here is an attempt without recursion:
A subset of t is any set that can be made by removing zero or more elements of t. The withoutFirst subset adds the subsets of t that are missing the first element, and the for loop deals with adding the subsets that contain the first element. For example, if t contained the elements ["1", "2", "3"], missingFirst will add [[""], ["2"], ["3"], ["2", "3"]], and the for loop will stick the "1" in front of each of these elements and add it to the newSet. So we'll end up with [[""], ["1"], ["2"], ["3"], ["1", "2"], ["1", "3"], ["2", "3"], ["1", "2", "3"]].
Here is how to generate a power set. The idea is that first = S[0] and the smaller set is S[1, ..., n]. Compute all subsets of smallerSet and put them in allsubsets.
For each subset in allsubsets, clone it and add first to the clone.
This function solves the problem by recursion, but makes the variable named powerset a global variable:
Just use recursion.