确定 2 个列表是否具有相同的元素,无论顺序如何?

发布于 2024-12-26 16:17:55 字数 176 浏览 1 评论 0原文

抱歉这个简单的问题,但我很难找到答案。

当我比较两个列表时,我想知道它们是否“相等”,因为它们具有相同的内容,但顺序不同。

例如:

x = ['a', 'b']
y = ['b', 'a']

我希望 x == y 的计算结果为 True。

Sorry for the simple question, but I'm having a hard time finding the answer.

When I compare 2 lists, I want to know if they are "equal" in that they have the same contents, but in different order.

Ex:

x = ['a', 'b']
y = ['b', 'a']

I want x == y to evaluate to True.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

终难愈 2025-01-02 16:17:56

您可以简单地检查包含 x 和 y 元素的多重集是否相等:

import collections
collections.Counter(x) == collections.Counter(y)

这要求元素是可哈希的;运行时间将为O(n),其中n是列表的大小。

如果元素也是唯一的,您还可以转换为集合(相同的渐近运行时,在实践中可能会快一点):

set(x) == set(y)

如果元素不可散列,但可排序,则另一种选择(运行时为 O(n log n))

sorted(x) == sorted(y)

如果元素既不可散列也不可排序,您可以使用以下辅助函数。请注意,它会非常慢(O(n²)),并且通常应该在不可散列和不可排序元素的深奥情况之外使用。

def equal_ignore_order(a, b):
    """ Use only when elements are neither hashable nor sortable! """
    unmatched = list(b)
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched

You can simply check whether the multisets with the elements of x and y are equal:

import collections
collections.Counter(x) == collections.Counter(y)

This requires the elements to be hashable; runtime will be in O(n), where n is the size of the lists.

If the elements are also unique, you can also convert to sets (same asymptotic runtime, may be a little bit faster in practice):

set(x) == set(y)

If the elements are not hashable, but sortable, another alternative (runtime in O(n log n)) is

sorted(x) == sorted(y)

If the elements are neither hashable nor sortable you can use the following helper function. Note that it will be quite slow (O(n²)) and should generally not be used outside of the esoteric case of unhashable and unsortable elements.

def equal_ignore_order(a, b):
    """ Use only when elements are neither hashable nor sortable! """
    unmatched = list(b)
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched
岁吢 2025-01-02 16:17:56

确定 2 个列表是否具有相同的元素,无论顺序如何?

从您的示例中推断:

x = ['a', 'b']
y = ['b', 'a']

列表的元素不会重复(它们是唯一的)以及可散列(哪些字符串和其他元素)某些不可变的 Python 对象是),最直接且计算效率最高的答案使用 Python 的内置集合(语义上类似于您可能在学校学到的数学集合)。

set(x) == set(y) # prefer this if elements are hashable

在元素可散列但不唯一的情况下,collections.Counter 在语义上也可以作为多重集工作,但它要慢得多

from collections import Counter
Counter(x) == Counter(y)

更喜欢使用 sorted

sorted(x) == sorted(y) 

如果元素是可排序的。这将考虑到非唯一或不可散列的情况,但这可能比使用集合慢得多。

实证实验

一项实证实验得出的结论是,人们应该更喜欢set,然后sorted。仅当您需要其他内容(例如计数或作为多重集进一步使用)时,才选择 Counter

第一个设置:

import timeit
import random
from collections import Counter

data = [str(random.randint(0, 100000)) for i in xrange(100)]
data2 = data[:]     # copy the list into a new one

def sets_equal(): 
    return set(data) == set(data2)

def counters_equal(): 
    return Counter(data) == Counter(data2)

def sorted_lists_equal(): 
    return sorted(data) == sorted(data2)

和测试:

>>> min(timeit.repeat(sets_equal))
13.976069927215576
>>> min(timeit.repeat(counters_equal))
73.17287588119507
>>> min(timeit.repeat(sorted_lists_equal))
36.177085876464844

所以我们看到比较集合是最快的解决方案,比较排序列表是第二快的。

Determine if 2 lists have the same elements, regardless of order?

Inferring from your example:

x = ['a', 'b']
y = ['b', 'a']

that the elements of the lists won't be repeated (they are unique) as well as hashable (which strings and other certain immutable python objects are), the most direct and computationally efficient answer uses Python's builtin sets, (which are semantically like mathematical sets you may have learned about in school).

set(x) == set(y) # prefer this if elements are hashable

In the case that the elements are hashable, but non-unique, the collections.Counter also works semantically as a multiset, but it is far slower:

from collections import Counter
Counter(x) == Counter(y)

Prefer to use sorted:

sorted(x) == sorted(y) 

if the elements are orderable. This would account for non-unique or non-hashable circumstances, but this could be much slower than using sets.

Empirical Experiment

An empirical experiment concludes that one should prefer set, then sorted. Only opt for Counter if you need other things like counts or further usage as a multiset.

First setup:

import timeit
import random
from collections import Counter

data = [str(random.randint(0, 100000)) for i in xrange(100)]
data2 = data[:]     # copy the list into a new one

def sets_equal(): 
    return set(data) == set(data2)

def counters_equal(): 
    return Counter(data) == Counter(data2)

def sorted_lists_equal(): 
    return sorted(data) == sorted(data2)

And testing:

>>> min(timeit.repeat(sets_equal))
13.976069927215576
>>> min(timeit.repeat(counters_equal))
73.17287588119507
>>> min(timeit.repeat(sorted_lists_equal))
36.177085876464844

So we see that comparing sets is the fastest solution, and comparing sorted lists is second fastest.

煞人兵器 2025-01-02 16:17:56

这似乎有效,尽管对于大型列表来说可能很麻烦。

>>> A = [0, 1]
>>> B = [1, 0]
>>> C = [0, 2]
>>> not sum([not i in A for i in B])
True
>>> not sum([not i in A for i in C])
False
>>> 

但是,如果每个列表必须包含其他列表的所有元素,那么上面的代码就会有问题。

>>> A = [0, 1, 2]
>>> not sum([not i in A for i in B])
True

当 len(A) != len(B) 时就会出现问题,在本例中,len(A) > len(A) != len(B) 时就会出现问题。长度(B)。为了避免这种情况,您可以再添加一条语句。

>>> not sum([not i in A for i in B]) if len(A) == len(B) else False
False

另一件事是,我在 Aaron Hall 在他的帖子中使用的相同条件下使用 timeit.repeat 对我的解决方案进行了基准测试。正如所怀疑的那样,结果令人失望。我的方法是最后一种。 set(x) == set(y) 是的。

>>> def foocomprehend(): return not sum([not i in data for i in data2])
>>> min(timeit.repeat('fooset()', 'from __main__ import fooset, foocount, foocomprehend'))
25.2893661496
>>> min(timeit.repeat('foosort()', 'from __main__ import fooset, foocount, foocomprehend'))
94.3974742993
>>> min(timeit.repeat('foocomprehend()', 'from __main__ import fooset, foocount, foocomprehend'))
187.224562545

This seems to work, though possibly cumbersome for large lists.

>>> A = [0, 1]
>>> B = [1, 0]
>>> C = [0, 2]
>>> not sum([not i in A for i in B])
True
>>> not sum([not i in A for i in C])
False
>>> 

However, if each list must contain all the elements of other then the above code is problematic.

>>> A = [0, 1, 2]
>>> not sum([not i in A for i in B])
True

The problem arises when len(A) != len(B) and, in this example, len(A) > len(B). To avoid this, you can add one more statement.

>>> not sum([not i in A for i in B]) if len(A) == len(B) else False
False

One more thing, I benchmarked my solution with timeit.repeat, under the same conditions used by Aaron Hall in his post. As suspected, the results are disappointing. My method is the last one. set(x) == set(y) it is.

>>> def foocomprehend(): return not sum([not i in data for i in data2])
>>> min(timeit.repeat('fooset()', 'from __main__ import fooset, foocount, foocomprehend'))
25.2893661496
>>> min(timeit.repeat('foosort()', 'from __main__ import fooset, foocount, foocomprehend'))
94.3974742993
>>> min(timeit.repeat('foocomprehend()', 'from __main__ import fooset, foocount, foocomprehend'))
187.224562545
没有伤那来痛 2025-01-02 16:17:56

正如上面评论中提到的,一般情况是很痛苦的。如果所有项目都是可散列的或所有项目都是可排序的,则相当容易。然而我最近不得不尝试解决一般情况。这是我的解决方案。我在发布后意识到这是与上面我在第一次通过时错过的解决方案的重复。无论如何,如果您使用切片而不是 list.remove() ,您可以比较不可变序列。

def sequences_contain_same_items(a, b):
    for item in a:
        try:
            i = b.index(item)
        except ValueError:
            return False
        b = b[:i] + b[i+1:]
    return not b

As mentioned in comments above, the general case is a pain. It is fairly easy if all items are hashable or all items are sortable. However I have recently had to try solve the general case. Here is my solution. I realised after posting that this is a duplicate to a solution above that I missed on the first pass. Anyway, if you use slices rather than list.remove() you can compare immutable sequences.

def sequences_contain_same_items(a, b):
    for item in a:
        try:
            i = b.index(item)
        except ValueError:
            return False
        b = b[:i] + b[i+1:]
    return not b
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文