Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)
It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations. Thus the permutations of the list [1, 1, 2] are usually considered to be [1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints precisely those three:
    int a[] = {1, 1, 2};
    do {
        cout << a[0] << " " << a[1] << " " << a[2] << endl;
    } while (next_permutation(a, a + 3));
On the other hand, Python's itertools.permutations seems to print something else:
    import itertools
    for a in itertools.permutations([1, 1, 2]):
        print(a)
This prints:
    (1, 1, 2)
    (1, 2, 1)
    (1, 1, 2)
    (1, 2, 1)
    (2, 1, 1)
    (2, 1, 1)
As user Artsiom Rudzenka pointed out in an answer, the Python documentation says so:
Elements are treated as unique based on their position, not on their value.
My question: why was this design decision made?
It seems that following the usual convention would give results that are more useful (and indeed it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?
[Or is it some implementation issue? The algorithm as in next_permutation, for instance explained on StackOverflow here (by me) and shown here to be O(1) amortised, seems efficient and implementable in Python, but is Python doing something even more efficient since it doesn't guarantee lexicographic order based on value? And if so, was the increase in efficiency considered worth it?]

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, if you used a next_permutation-style approach, then you'd be restricted to passing in objects that support a linear ordering, whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying it would be if permutations rejected objects that cannot be compared.

Second, by not testing for equality on objects, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it's not necessary.

Basically, itertools.permutations solves the common case reliably and cheaply. There's certainly an argument to be made that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?
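
To make the first point concrete, here is a small sketch using complex numbers, which Python refuses to order. The choice of complex numbers and the variable names are assumptions made for illustration; they are not taken from the answer above.

```python
import itertools

# Complex numbers have no natural ordering in Python: `<` raises TypeError.
values = [1 + 2j, 1 - 2j, 2 + 1j]

# itertools.permutations never compares elements, so this just works.
perms = list(itertools.permutations(values))
print(len(perms))  # 3! tuples, one per arrangement of positions

# Any approach that has to sort or compare the elements fails on this input.
try:
    sorted(values)
    ordering_supported = True
except TypeError:
    ordering_supported = False
print(ordering_supported)
```

A next_permutation-style generator relies on `<` between elements, so it would hit the same TypeError on this input.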

I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations for.

Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in-place (I haven't made it too general).

Here are some results. I have even more respect for Python's built-in function now: it's about three to four times as fast as the other methods when the elements are all (or almost all) distinct. Of course, when there are many repeated elements, using it is a terrible idea.
The code is here if anyone wants to explore.
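
For readers who want to see what a next_permutation-style approach looks like in Python, here is a minimal sketch. It is not the code from the blog post or from this answer; the function names `next_permutation` and `distinct_permutations` are chosen for illustration, and the elements are assumed to support `<`.

```python
def next_permutation(seq):
    """Rearrange the list seq in place into its next lexicographic permutation.

    Returns False (leaving seq in ascending order) if seq was the last one.
    """
    # Find the rightmost position i with seq[i] < seq[i + 1].
    i = len(seq) - 2
    while i >= 0 and not seq[i] < seq[i + 1]:
        i -= 1
    if i < 0:
        seq.reverse()
        return False
    # Find the rightmost element strictly greater than seq[i] and swap.
    j = len(seq) - 1
    while not seq[i] < seq[j]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # Reverse the suffix so that it becomes the smallest arrangement.
    seq[i + 1:] = reversed(seq[i + 1:])
    return True


def distinct_permutations(iterable):
    """Yield each distinct permutation once, in lexicographic order."""
    items = sorted(iterable)
    while True:
        yield tuple(items)
        if not next_permutation(items):
            return


for p in distinct_permutations([1, 1, 2]):
    print(p)
```

Because equal elements are never swapped past each other, each distinct arrangement is produced exactly once, at the price of requiring an ordering on the elements.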

It's fairly easy to get the behavior you prefer by wrapping itertools.permutations, which might have influenced the decision. As described in the documentation, itertools is designed as a collection of building blocks/tools to use in building your own iterators.

However, as pointed out in the comments, this might not be quite as efficient as you'd like.

Perhaps if there is enough interest, a new function or an optional argument to itertools.permutations could be added to itertools, to generate permutations without duplicates more efficiently.
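
One way such a wrapper could look is sketched below. This is an illustrative assumption, not necessarily the code from the original answer; the name `unique_permutations_via_set` is hypothetical, and the elements must be hashable.

```python
import itertools


def unique_permutations_via_set(iterable, r=None):
    """Yield permutations from itertools.permutations, skipping repeats."""
    seen = set()  # every tuple already yielded is remembered here
    for p in itertools.permutations(iterable, r):
        if p not in seen:
            seen.add(p)
            yield p


print(list(unique_permutations_via_set([1, 1, 2])))
```

The inefficiency is visible in the structure: all n! candidate tuples are still generated, and every distinct one is kept in memory, even if only a handful are distinct.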

I also find it surprising that itertools doesn't have a function for the more intuitive concept of unique permutations. Generating repetitive permutations only to select the unique ones among them is out of the question for any serious application.

I have written my own iterative generator function which behaves similarly to itertools.permutations but does not return duplicates. Only permutations of the original list are considered; sublists may be created with the standard itertools library.
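
The author's generator is not reproduced here. As a rough illustration of how duplicates can be avoided without ever generating them, the following sketch counts how many copies of each value remain and backtracks over the distinct values. It is recursive rather than iterative, the name `distinct_permutations_by_count` is hypothetical, and it assumes hashable (but not necessarily orderable) elements.

```python
from collections import Counter


def distinct_permutations_by_count(iterable):
    """Yield each distinct full-length permutation exactly once."""
    counts = Counter(iterable)  # value -> number of unused copies
    n = sum(counts.values())
    current = []

    def backtrack():
        if len(current) == n:
            yield tuple(current)
            return
        # Try each distinct value once per position, so equal values
        # can never produce the same tuple twice.
        for value in counts:
            if counts[value] > 0:
                counts[value] -= 1
                current.append(value)
                yield from backtrack()
                current.pop()
                counts[value] += 1

    yield from backtrack()


print(list(distinct_permutations_by_count([1, 1, 2])))
```

Since only values are updated in `counts` during iteration (no keys are added or removed), iterating over it while backtracking is safe.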
Revisiting this old question, the easiest thing to do now is to use more_itertools.distinct_permutations.

Maybe I am wrong, but it seems the reason for this is in the documentation: 'Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.'
You have specified (1, 1, 2), and from your point of view the 1 at index 0 and the 1 at index 1 are the same. But this is not so, since Python's permutations implementation uses indexes instead of values.
So if we take a look at the default Python permutations implementation, we will see that it uses indexes.
For example, if you change your input to [1, 2, 3] you will get the expected permutations ([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]) since the values are unique.
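
The index-based behaviour can be sketched in a few lines of pure Python. This is a simplified illustration in the spirit of the equivalent code shown in the itertools documentation, not CPython's actual implementation; the name `index_permutations` is hypothetical.

```python
from itertools import product


def index_permutations(iterable, r=None):
    """Permute positions 0..n-1, then look the values up by position."""
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        # Keep only index tuples with no repeated position. The values in
        # the pool are never compared with each other.
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)


print(list(index_permutations([1, 1, 2])))
```

Here the index tuples (0, 1, 2) and (1, 0, 2) are different, yet both map to (1, 1, 2), which is exactly where the repeated output comes from.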