Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)

Posted on 2024-11-18 09:22:43

It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations. Thus the permutations of the list [1, 1, 2] are usually considered to be
[1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints precisely those three:

#include <algorithm>
#include <iostream>
int main() {
    int a[] = {1, 1, 2};
    do {
        std::cout << a[0] << " " << a[1] << " " << a[2] << std::endl;
    } while (std::next_permutation(a, a + 3));
}

On the other hand, Python's itertools.permutations seems to print something else:

import itertools
for a in itertools.permutations([1, 1, 2]):
    print(a)

This prints

(1, 1, 2)
(1, 2, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(2, 1, 1)

As user Artsiom Rudzenka pointed out in an answer, the Python documentation says so:

Elements are treated as unique based on their position, not on their value.
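To make the quoted sentence concrete, here is a small sketch that tallies how often each value-tuple comes out of itertools.permutations for the list in the question. The variable names are only illustrative.

```python
from collections import Counter
from itertools import permutations

# Tally how many times each tuple of values is produced.
counts = Counter(permutations([1, 1, 2]))

for perm, times in sorted(counts.items()):
    print(perm, times)
# (1, 1, 2) 2
# (1, 2, 1) 2
# (2, 1, 1) 2
```

Six tuples are produced in total (3! orderings of three positions), but only three are distinct by value, each appearing twice because the two 1s occupy different positions.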

My question: why was this design decision made?

It seems that following the usual convention would give results that are more useful (and indeed it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?

[Or is it some implementation issue? The algorithm as in next_permutation — for instance explained on StackOverflow here (by me) and shown here to be O(1) amortised — seems efficient and implementable in Python, but is Python doing something even more efficient since it doesn't guarantee lexicographic order based on value? And if so, was the increase in efficiency considered worth it?]

Comments (6)

喜爱纠缠 2024-11-25 09:22:43

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, if you used a next_permutation-style approach, then you'd be restricted to passing in objects that support a linear ordering. Whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying this would be:

>>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers
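The traceback above is what one would have to imagine under an ordering-based design. As a rough check of the first point, the sketch below (assuming Python 3) shows that itertools.permutations handles complex numbers without complaint, while anything that needs to order them, such as sorted, raises TypeError. The variable names are illustrative only.

```python
from itertools import permutations

values = [1 + 2j, 1 - 2j, 2 + 1j, 2 - 1j]

# permutations never compares elements, so unorderable values are fine.
perms = list(permutations(values))
print(len(perms))  # 24

# A next_permutation-style algorithm needs "<", which complex numbers lack.
ordering_error = None
try:
    sorted(values)
except TypeError as exc:
    ordering_error = exc
    print("sorting failed:", exc)
```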

Second, by not testing for equality on objects, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it's not necessary.

Basically, itertools.permutations solves the common case reliably and cheaply. There's certainly an argument to be made that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?

装迷糊 2024-11-25 09:22:43

I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations for.

Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in-place (I haven't made it too general).

def next_permutationS(l):
    n = len(l)
    #Step 1: Find tail
    last = n-1 #tail is from `last` to end
    while last>0:
        if l[last-1] < l[last]: break
        last -= 1
    #Step 2: Increase the number just before tail
    if last>0:
        small = l[last-1]
        big = n-1
        while l[big] <= small: big -= 1
        l[last-1], l[big] = l[big], small
    #Step 3: Reverse tail
    i = last
    j = n-1
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last>0
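For readers who want to drive an in-place step like this, the following is a minimal self-contained sketch, not the benchmarked code. The names next_permutation_step and distinct_permutations are hypothetical; the step function follows the same three stages (find the tail, bump the element before it, reverse the tail), and the wrapper starts from the sorted input so that every distinct permutation is visited exactly once.

```python
def next_permutation_step(seq):
    """Advance seq in place to its next lexicographic permutation.

    Returns False, leaving seq in ascending order, when seq was already
    the last permutation. (Hypothetical helper for illustration.)
    """
    # Stage 1: find the start of the non-increasing tail.
    i = len(seq) - 1
    while i > 0 and seq[i - 1] >= seq[i]:
        i -= 1
    # Stage 2: swap the element before the tail with the rightmost
    # element that is strictly larger than it.
    if i > 0:
        j = len(seq) - 1
        while seq[j] <= seq[i - 1]:
            j -= 1
        seq[i - 1], seq[j] = seq[j], seq[i - 1]
    # Stage 3: reverse the tail so it becomes non-decreasing.
    seq[i:] = reversed(seq[i:])
    return i > 0


def distinct_permutations(iterable):
    """Yield each distinct permutation once; elements must support < and <=."""
    items = sorted(iterable)
    while True:
        yield tuple(items)
        if not next_permutation_step(items):
            return


print(list(distinct_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Yielding a tuple copy keeps callers from seeing the list change under them, at the cost of one allocation per permutation.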

Here are some results. I have even more respect for Python's built-in function now: it's about three to four times as fast as the other methods when the elements are all (or almost all) distinct. Of course, when there are many repeated elements, using it is a terrible idea.

Some results ("us" means microseconds):

l                                       m_itertoolsp  m_nextperm_b  m_nextperm_s
[1, 1, 2]                               5.98 us       12.3 us       7.54 us
[1, 2, 3, 4, 5, 6]                      0.63 ms       2.69 ms       1.77 ms
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]         6.93 s        13.68 s       8.75 s

[1, 2, 3, 4, 6, 6, 6]                   3.12 ms       3.34 ms       2.19 ms
[1, 2, 2, 2, 2, 3, 3, 3, 3, 3]          2400 ms       5.87 ms       3.63 ms
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2320000 us    89.9 us       51.5 us
[1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]    429000 ms     361 ms        228 ms
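The gap in the lower rows can be anticipated by counting. A list of n items yields n! tuples from itertools.permutations, whereas the number of distinct permutations is n! divided by the factorial of each value's multiplicity. The sketch below computes both for three of the benchmark inputs; count_all and count_distinct are illustrative names, not part of any library.

```python
from collections import Counter
from math import factorial


def count_all(items):
    """Number of tuples itertools.permutations would produce."""
    return factorial(len(items))


def count_distinct(items):
    """Number of distinct permutations: n! / (k1! * k2! * ...)."""
    total = factorial(len(items))
    for multiplicity in Counter(items).values():
        total //= factorial(multiplicity)
    return total


for items in (
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    [1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4],
):
    print(items, count_all(items), count_distinct(items))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 2] 3628800 10
# [1, 2, 2, 2, 2, 3, 3, 3, 3, 3] 3628800 1260
# [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4] 479001600 83160
```

So in the last row the wrapper around itertools.permutations generates several thousand times more tuples than are actually kept.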

The code is here if anyone wants to explore.

你在我安 2024-11-25 09:22:43

It's fairly easy to get the behavior you prefer by wrapping itertools.permutations, which might have influenced the decision. As described in the documentation, itertools is designed as a collection of building blocks/tools to use in building your own iterators.

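from itertools import permutations

def unique(iterable):
    # Yield each item the first time it appears, skipping later repeats.
    seen = set()
    for x in iterable:
        if x in seen:
            continue
        seen.add(x)
        yield x

for a in unique(permutations([1, 1, 2])):
    print(a)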

(1, 1, 2)
(1, 2, 1)
(2, 1, 1)

However, as pointed out in the comments, this might not be quite as efficient as you'd like:

>>> %timeit iterate(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
1 loops, best of 3: 4.27 s per loop

>>> %timeit iterate(unique(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])))
1 loops, best of 3: 13.2 s per loop

Perhaps if there is enough interest, a new function or an optional argument to itertools.permutations could be added to itertools, to generate permutations without duplicates more efficiently.

╰ゝ天使的微笑 2024-11-25 09:22:43

I also find it surprising that itertools doesn't have a function for the more intuitive concept of unique permutations. Generating repeated permutations only to select the unique ones among them is out of the question for any serious application.

I have written my own recursive generator function, which behaves similarly to itertools.permutations but does not return duplicates. Only full-length permutations of the original list are considered; sublists can be created with the standard itertools library.

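def unique_permutations(t):
    lt = list(t)
    if len(lt) <= 1:
        # Base case: an empty or one-element list has a single permutation.
        yield lt
        return
    # Build the set from lt, not t, so one-shot iterators also work.
    for d in set(lt):
        lt.remove(d)  # fix d as the first element
        for perm in unique_permutations(lt):
            yield [d] + perm
        lt.append(d)  # restore the multiset for the next choice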
黯然#的苍凉 2024-11-25 09:22:43

Revisiting this old question, the easiest thing to do now is to use more_itertools.distinct_permutations.

仅一夜美梦 2024-11-25 09:22:43

Maybe I am wrong, but it seems the reason is given in the documentation: 'Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.'
You specified (1, 1, 2), and from your point of view the 1 at index 0 and the 1 at index 1 are the same. That is not how Python sees it, because the permutations implementation works with indexes instead of values.

So if we look at the roughly equivalent pure-Python implementation of permutations shown in the documentation, we can see that it uses indexes:

from itertools import product

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    # Try every r-tuple of indices and keep those with no repeated index.
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

For example, if you change your input to [1, 2, 3], you will get the expected permutations ([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]), since the values are unique.
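As a small illustration of the index-based view, the sketch below permutes the positions 0, 1, 2 of the original input and then looks up the values. The names pool and rows are only illustrative. Every index tuple is different, even where the resulting value tuples coincide.

```python
from itertools import permutations

pool = (1, 1, 2)

# Permute positions, then map each position back to its value.
rows = [(idx, tuple(pool[i] for i in idx))
        for idx in permutations(range(len(pool)))]

for idx, values in rows:
    print(idx, "->", values)
# (0, 1, 2) -> (1, 1, 2)
# (0, 2, 1) -> (1, 2, 1)
# (1, 0, 2) -> (1, 1, 2)
# (1, 2, 0) -> (1, 2, 1)
# (2, 0, 1) -> (2, 1, 1)
# (2, 1, 0) -> (2, 1, 1)
```

The value column matches the output order shown in the question.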
