Permutations with unique values
itertools.permutations generates permutations where its elements are treated as unique based on their position, not on their value. So basically I want to avoid duplicates like this:
>>> list(itertools.permutations([1, 1, 1]))
[(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)]
Filtering afterwards is not possible because the number of permutations is too large in my case.
Does anybody know of a suitable algorithm for this?
Thank you very much!
EDIT:
What I basically want is the following:
x = itertools.product((0, 1, 'x'), repeat=X)
x = sorted(x, key=functools.partial(count_elements, elem='x'))
which is not possible because sorted
creates a list and the output of itertools.product is too large.
Sorry, I should have described the actual problem.
Because sometimes new questions are marked as duplicates and their authors are referred to this question, it may be important to mention that sympy has an iterator for this purpose.
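A minimal sketch of using it (the iterator is sympy.utilities.iterables.multiset_permutations, assuming sympy is installed):

```python
from sympy.utilities.iterables import multiset_permutations

# Each distinct arrangement is produced exactly once, even with repeated values.
print(list(multiset_permutations([1, 1, 1])))  # [[1, 1, 1]]
print(list(multiset_permutations([1, 1, 2])))  # [[1, 1, 2], [1, 2, 1], [2, 1, 1]]
```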
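A sketch of the generator being discussed, reconstructed from the names used in the explanation below (perm_unique, perm_unique_helper, listunique, result_list, d); not necessarily the original listing:

```python
class unique_element:
    def __init__(self, value, occurrences):
        self.value = value
        self.occurrences = occurrences  # how many of this value may still be placed

def perm_unique(elements):
    eset = set(elements)
    listunique = [unique_element(i, elements.count(i)) for i in eset]
    u = len(elements)
    return perm_unique_helper(listunique, [0] * u, u - 1)

def perm_unique_helper(listunique, result_list, d):
    if d < 0:
        yield tuple(result_list)
    else:
        for i in listunique:
            if i.occurrences > 0:          # only use values that are still available
                result_list[d] = i.value
                i.occurrences -= 1
                for g in perm_unique_helper(listunique, result_list, d - 1):
                    yield g
                i.occurrences += 1

# list(perm_unique([1, 1, 2])) yields the three distinct orderings of (1, 1, 2)
```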
EDIT (how this works):
I rewrote the above program to be longer but more readable.
I usually have a hard time explaining how something works, but let me try.
In order to understand how this works, you have to understand a similar but simpler program that would yield all permutations with repetitions.
This program is obviously much simpler:
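Something along these lines (a sketch reconstructed from the names used below, not the original listing):

```python
def permutations_with_replacement(elements, n):
    return permutations_helper(elements, [0] * n, n - 1)

def permutations_helper(elements, result_list, d):
    # d is the depth: it drives the stopping condition and also indexes into result_list
    if d < 0:
        yield tuple(result_list)
    else:
        for element in elements:
            result_list[d] = element
            for g in permutations_helper(elements, result_list, d - 1):
                yield g
```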
d stands for depth in permutations_helper and serves two purposes. One is the stopping condition of our recursive algorithm, and the other is the index into the result list that is passed around.
Instead of returning each result, we yield it. If there were no yield function/operator, we would have to push the result into some queue at the point of the stopping condition. But this way, once the stopping condition is met, the result is propagated through all stacks up to the caller. That is the purpose of
for g in perm_unique_helper(listunique, result_list, d-1): yield g
so each result is propagated up to the caller.
Back to the original program:
we have a list of unique elements. Before we can use each element, we have to check how many of them are still available to push onto result_list. Working with this program is very similar to permutations_with_replacement. The difference is that each element cannot be repeated more times than it is available in perm_unique_helper.
This relies on the implementation detail that the permutations of a sorted iterable are produced in sorted order unless they are duplicates of prior permutations.
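A sketch of a generator built on that detail (the function name is illustrative): feed permutations a sorted copy and keep only outputs that are strictly greater than the previous one.

```python
from itertools import permutations

def unique_permutations(iterable, r=None):
    previous = None
    for p in permutations(sorted(iterable), r):
        if previous is None or p > previous:
            previous = p
            yield p   # a duplicate is never greater than the previous output, so it is skipped

print(list(unique_permutations([1, 1, 2])))  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```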
Roughly as fast as Luka Rahne's answer, but shorter & simpler, IMHO.
It works recursively by setting the first element (iterating through all unique elements), and iterating through the permutations for all remaining elements.
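A sketch of that recursion, reconstructed from the names used in the walkthrough that follows (not necessarily the original listing):

```python
def unique_permutations(elements):
    if len(elements) == 1:
        yield (elements[0],)
        return
    unique_elements = set(elements)
    for first_element in unique_elements:
        remaining_elements = list(elements)
        remaining_elements.remove(first_element)      # remove one occurrence only
        for sub_permutation in unique_permutations(remaining_elements):
            yield (first_element,) + sub_permutation
```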
Let's go through the unique_permutations of (1,2,3,1) to see how it works:
unique_elements are 1,2,3
first_element starts with 1.
remaining_elements are [2,3,1] (i.e. 1,2,3,1 minus the first 1)
to each sub_permutation, we insert the first_element: (1,1,2,3), (1,1,3,2), ... and yield the result.
first_element = 2, and do the same as above.
remaining_elements are [1,3,1] (i.e. 1,2,3,1 minus the first 2)
to each sub_permutation, we insert the first_element: (2, 1, 1, 3), (2, 1, 3, 1), (2, 3, 1, 1) ... and yield the result.
Finally, the same with first_element = 3.
A naive approach might be to take the set of the permutations:
However, this technique wastefully computes replicate permutations and discards them. A more efficient approach would be more_itertools.distinct_permutations, a third-party tool.
Code
Performance
Using a larger iterable, we will compare the performances between the naive and third-party techniques.
We see more_itertools.distinct_permutations is an order of magnitude faster.
Details
From the source, a recursion algorithm (as seen in the accepted answer) is used to compute distinct permutations, thereby obviating wasteful computations. See the source code for more details.
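A minimal sketch contrasting the two approaches (assuming more_itertools is installed):

```python
from itertools import permutations
from more_itertools import distinct_permutations

items = [1, 1, 2, 2, 3]

# Naive: generate every positional permutation, then deduplicate.
naive = set(permutations(items))

# Efficient: duplicates are never generated in the first place.
efficient = list(distinct_permutations(items))

assert naive == set(efficient)
print(len(efficient))  # 30 distinct permutations instead of the 5! = 120 generated naively
```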
You could try using set:
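Something along these lines (a sketch, not necessarily the original snippet):

```python
import itertools

print(set(itertools.permutations([1, 1, 1])))  # {(1, 1, 1)}
```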
The call to set removes the duplicates.
This is my solution with 10 lines:
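A sketch in that spirit (roughly ten lines: build the permutations iteratively and break on a repeated value so duplicates are never constructed); not necessarily the author's exact code:

```python
def permute_unique(nums):
    perms = [[]]
    for n in nums:
        new_perms = []
        for perm in perms:
            for i in range(len(perm) + 1):
                new_perms.append(perm[:i] + [n] + perm[i:])
                if i < len(perm) and perm[i] == n:
                    break  # inserting past another copy of n would recreate an earlier list
        perms = new_perms
    return perms

print(permute_unique([1, 1, 2]))  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
```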
The best solution to this problem I have seen uses Knuth's "Algorithm L" (as noted previously by Gerrat in the comments to the original post):
http://stackoverflow.com/questions/12836385/how-can-i-interleave-or-create-unique-permutations-of-two-stings-without-recurs/12837695
Some timings:
Sorting [1]*12+[0]*12 (2,704,156 unique permutations):
Algorithm L → 2.43 s
Luka Rahne's solution → 8.56 s
scipy.multiset_permutations() → 16.8 s
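For reference, a sketch of Algorithm L as a Python generator (next-lexicographic-permutation stepping over a sorted list; the names are mine):

```python
def _next_permutation(seq):
    # Knuth's Algorithm L: advance seq to its next lexicographic permutation in place.
    # Returns False once seq is the final (descending) arrangement.
    j = len(seq) - 2
    while j >= 0 and seq[j] >= seq[j + 1]:
        j -= 1
    if j < 0:
        return False
    l = len(seq) - 1
    while seq[j] >= seq[l]:
        l -= 1
    seq[j], seq[l] = seq[l], seq[j]
    seq[j + 1:] = reversed(seq[j + 1:])
    return True

def unique_permutations(iterable):
    seq = sorted(iterable)
    yield tuple(seq)
    while _next_permutation(seq):
        yield tuple(seq)

print(sum(1 for _ in unique_permutations([1] * 12 + [0] * 12)))  # 2704156
```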
Here is a recursive solution to the problem.
To generate unique permutations of ["A","B","C","D"] I use the following:
Which generates:
Notice, duplicates are not created (e.g. items in combination with D are not generated, as they already exist).
Example: This can then be used in generating terms of higher or lower order for OLS models via data in a Pandas dataframe.
Creates...
Which can then be piped to your OLS regression.
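A sketch of the combination-generating idea described above, using itertools.combinations over every length so each unordered selection appears exactly once (the item list is illustrative):

```python
from itertools import combinations

items = ["A", "B", "C", "D"]

# All unique selections of every length; ('A', 'D') appears once, ('D', 'A') never does.
unique_combos = [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

print(unique_combos)
# [('A',), ('B',), ('C',), ('D',), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'),
#  ('B', 'D'), ('C', 'D'), ('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'),
#  ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]
```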
It sounds like you are looking for itertools.combinations() docs.python.org
Bumped into this question while looking for something myself!
Here's what I did:
Basically, make a set and keep adding to it. Better than making lists etc. that take too much memory.
Hope it helps the next person looking out :-) Comment out the set 'update' in the function to see the difference.
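A sketch along these lines (the names are placeholders; the update call is the one referred to above):

```python
from itertools import permutations

def get_unique_permutations(seq):
    unique = set()
    total = 0
    for perm in permutations(seq):
        total += 1
        unique.update([perm])   # keep adding to the set; duplicates collapse automatically
    print(total, "generated,", len(unique), "unique")
    return unique

print(sorted(get_unique_permutations([1, 1, 2])))
# 6 generated, 3 unique
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```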
You can make a function that uses collections.Counter to get unique items and their counts from the given sequence, uses itertools.combinations to pick combinations of indices for each unique item in each recursive call, and maps the indices back to a list when all indices are picked:
so that [''.join(i) for i in unique_permutations('moon')] returns:
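A sketch of that approach (the helper function and its signature are mine):

```python
from collections import Counter
from itertools import combinations

def unique_permutations(seq):
    counts = list(Counter(seq).items())   # unique items and how often each occurs
    n = len(seq)

    def place(remaining, free_indices, result):
        if not remaining:
            yield tuple(result)
            return
        (value, count), rest = remaining[0], remaining[1:]
        # choose which positions this value occupies; combinations keeps them unordered
        for chosen in combinations(free_indices, count):
            for idx in chosen:
                result[idx] = value
            left = [i for i in free_indices if i not in set(chosen)]
            yield from place(rest, left, result)

    return place(counts, list(range(n)), [None] * n)

perms = [''.join(p) for p in unique_permutations('moon')]
print(len(perms), sorted(perms))
# 12 ['mnoo', 'mono', 'moon', 'nmoo', 'nomo', 'noom', 'omno', 'omon', 'onmo', 'onom', 'oomn', 'oonm']
```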
This is my attempt without resorting to set / dict, as a generator using recursion, but using string as input. Output is also ordered in natural order:
example:
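A sketch of such a generator (assuming "natural order" means sorted string order; the name is illustrative):

```python
def unique_perms(s):
    # s is a string; yields each distinct permutation as a string, in sorted order
    if len(s) <= 1:
        yield s
        return
    for ch in sorted(set(s)):
        rest = s.replace(ch, '', 1)        # drop one occurrence of ch
        for tail in unique_perms(rest):
            yield ch + tail

print(list(unique_perms('aab')))  # ['aab', 'aba', 'baa']
```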
https://www.geeksforgeeks.org/heaps-algorithm-for-generating-permutations/
Came across this the other day while working on a problem of my own. I like Luka Rahne's approach, but I thought that using the Counter class in the collections library seemed like a modest improvement. Here's my code:
This code returns each permutation as a list. If you feed it a string, it'll give you a list of permutations where each one is a list of characters. If you want the output as a list of strings instead (for example, if you're a terrible person and you want to abuse my code to help you cheat in Scrabble), just do the following:
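A sketch of that Counter-based variant (reconstructed, not necessarily the author's exact code), plus the join step mentioned above:

```python
from collections import Counter

def unique_permutations(elements):
    def helper(counter, depth, partial):
        if depth == 0:
            yield list(partial)            # each permutation comes back as a list
            return
        for value in counter:
            if counter[value] > 0:
                counter[value] -= 1
                partial.append(value)
                yield from helper(counter, depth - 1, partial)
                partial.pop()
                counter[value] += 1

    return helper(Counter(elements), len(elements), [])

# As a list of strings instead (the Scrabble-cheating use case):
print([''.join(p) for p in unique_permutations('aab')])  # ['aab', 'aba', 'baa']
```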
I came up with a very suitable implementation using itertools.product in this case (this is an implementation where you want all combinations).
This is essentially a combination (n over k) with n = X and somenumber = k. itertools.product() iterates from k = 0 to k = X; subsequent filtering with count ensures that just the permutations with the right number of ones are cast into a list. You can easily see that it works when you calculate n over k and compare it to len(unique_perm_list).
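A sketch with illustrative values for X and somenumber (both are assumptions here):

```python
from itertools import product
from math import comb

X = 6           # tuple length (illustrative)
somenumber = 3  # required number of ones (illustrative)

# keep only the arrangements with exactly `somenumber` ones
unique_perm_list = [p for p in product((0, 1), repeat=X) if p.count(1) == somenumber]

print(len(unique_perm_list), comb(X, somenumber))  # 20 20
```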
Adapted to remove recursion, using a dictionary and numba for high performance, but not using yield/generator style, so memory usage is not limited:
About 30% faster but still suffers a bit due to list copying and management.
Alternatively without numba but still without recursion and using a generator to avoid memory issues:
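For the non-numba, non-recursive generator variant, a sketch using an explicit stack and a Counter of remaining values (all names are mine):

```python
from collections import Counter

def unique_permutations_iterative(elements):
    counts = Counter(elements)
    n = len(elements)
    result = []
    stack = [sorted(counts)]           # candidate values still to try at each depth
    while stack:
        candidates = stack[-1]
        if not candidates:             # exhausted this depth: backtrack
            stack.pop()
            if result:
                counts[result.pop()] += 1
            continue
        value = candidates.pop()
        if counts[value] == 0:
            continue
        counts[value] -= 1
        result.append(value)
        if len(result) == n:
            yield tuple(result)
            counts[result.pop()] += 1  # undo the last placement and keep trying
        else:
            stack.append(sorted(counts))

print(list(unique_permutations_iterative([1, 1, 2])))
# [(2, 1, 1), (1, 2, 1), (1, 1, 2)]
```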
Maybe we can use set here to obtain unique permutations.
What about
The problem is the permutations are now rows of a Numpy array, thus using more memory, but you can cycle through them as before
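A sketch of what that might look like (np.unique with axis=0 deduplicates rows; this is one way to get there, not necessarily the original snippet):

```python
import numpy as np
from itertools import permutations

perms = np.unique(np.array(list(permutations([1, 1, 2]))), axis=0)
print(perms)
# [[1 1 2]
#  [1 2 1]
#  [2 1 1]]
```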