Check whether all elements in a list are unique

Posted 2024-10-21 12:47:37 · 305 words · 6 views · 0 comments

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():  # itervalues() exists only in Python 2
...     if count > 1:
...         pass  # do something

Can I do better?

oО清风挽发oО 2024-10-28 12:47:37

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won't make much of a difference for short lists.

不知所踪 2024-10-28 12:47:37

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
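
The two variants above can be folded into one helper that tries a set first and falls back to a list only when an element turns out to be unhashable. This is a minimal sketch, not part of the original answer; the name all_unique_any is made up for illustration, and mixed inputs where a hashable and an unhashable value compare equal (for example a frozenset and a set) are not handled.

```python
def all_unique_any(iterable):
    """Return True if no element repeats; also accepts unhashable elements."""
    seen_hashable = set()    # O(1) membership test for hashable elements
    seen_unhashable = []     # O(n) fallback for lists, dicts, and the like
    for item in iterable:
        try:
            if item in seen_hashable:
                return False
            seen_hashable.add(item)
        except TypeError:
            # item is unhashable, so compare it by equality instead
            if item in seen_unhashable:
                return False
            seen_unhashable.append(item)
    return True
```

For all-hashable input this behaves like the set version; the slower list scan is only paid for the unhashable elements.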

只是在用心讲痛 2024-10-28 12:47:37

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or when an early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
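
One practical difference is worth noting: the loop accepts any iterable, including a generator, while the len() comparison needs a sized sequence (calling len() on a generator raises TypeError). A small sketch, repeating the function so that it runs on its own; the sample data is made up:

```python
def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True

squares = (n * n for n in range(-3, 4))   # yields 9, 4, 1, 0, 1, 4, 9
result = unique_values(squares)           # stops at the second 1
remaining = list(squares)                 # items the loop never consumed
print(result, remaining)                  # False [4, 9]
```

The leftover items show that the early exit really does stop reading the input as soon as a duplicate appears.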

缱倦旧时光 2024-10-28 12:47:37

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

云之铃。 2024-10-28 12:47:37

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

后eg是否自 2024-10-28 12:47:37

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution when all items are unique. If the list has duplicates near the start, the early-exit solutions return in roughly constant time and are to be preferred.

Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()
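
For readers without perfplot, the early-duplicate case can be checked with the standard library alone. This is a rough sketch under an assumed input (a list of about 100,000 integers whose first two items are equal); absolute timings depend on the machine, so none are quoted here.

```python
import timeit


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


# Assumed test input: a duplicate right at the start of a long list.
early_dup = [0, 0] + list(range(1, 100_000))

for func in (len_set, set_add_early_exit):
    seconds = timeit.timeit(lambda: func(early_dup), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 calls")
```

len_set always builds the full set, while set_add_early_exit returns after inspecting two items, which is the effect the paragraph above describes.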

谎言月老 2024-10-28 12:47:37

As an alternative to a set, you can use a dict:

len({}.fromkeys(x)) == len(x)

你对谁都笑 2024-10-28 12:47:37

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.

草莓酥 2024-10-28 12:47:37

Here is a recursive early-exit function:

def distinct(L):
    # Lists with fewer than two elements cannot contain a duplicate.
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)

It's fast enough for me, and it keeps a functional style without using weird (slow) conversions.

音盲 2024-10-28 12:47:37

Here is a recursive O(N²) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
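
The recursive versions above copy the list on every call and need one stack frame per element, so a list of roughly a thousand items can hit Python's default recursion limit. A loop-based sketch of the same O(N²) pairwise idea, which also makes no assumption about hashability (the name is_unique_iterative is made up):

```python
def is_unique_iterative(lst):
    """Pairwise comparison like the recursive version, without recursion."""
    for i in range(len(lst)):
        # compare each element only with the elements that follow it
        for j in range(i + 1, len(lst)):
            if lst[i] == lst[j]:
                return False
    return True
```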

江城子 2024-10-28 12:47:37

All the answers above are good, but I prefer the all_unique example from 30 seconds of python.

Use set() on the given list to remove duplicates, then compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False

乖乖兔^ω^ 2024-10-28 12:47:37

How about this:

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1

缘字诀 2024-10-28 12:47:37

If and only if you already have the data processing library pandas among your dependencies, there is a ready-made solution that gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique

那小子欠揍 2024-10-28 12:47:37

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

滥情稳全场 2024-10-28 12:47:37

Using a similar approach on a Pandas dataframe to test whether a column contains only unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.

眼泪也成诗 2024-10-28 12:47:37

This does not exactly fit the question, but if you google the task I had, this question ranks first, and it may interest readers as an extension of the question. If you want to determine, for each list element, whether it is unique, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But once the list is longer than about 100 elements, the count function takes quite long. The approach shown in get_unique is then much faster, although it looks more complicated.
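
If the elements are hashable, a middle ground is to count every value once with collections.Counter and then look each element up, which avoids both the sort and the repeated count() scans. A sketch, not from the original answer; get_unique_using_counter is a made-up name and no timings are claimed for it:

```python
from collections import Counter


def get_unique_using_counter(mylist):
    counts = Counter(mylist)   # one pass to count every value
    # one dictionary lookup per element instead of one full scan per element
    return [counts[item] == 1 for item in mylist]


print(get_unique_using_counter([3, 1, 3, 2]))   # [False, True, False, True]
```

It returns the flags in the original order of the list, like the two functions above.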

枕花眠 2024-10-28 12:47:37

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
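
The same adjacent-pair check can be written without index arithmetic by zipping the list with a copy of itself shifted by one. A sketch assuming the input is already sorted; the function name and sample lists are made up:

```python
def sorted_all_unique(sorted_list):
    # pair each element with its successor; equal neighbours mean a duplicate
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))


print(sorted_all_unique([1, 2, 3, 5]))   # True
print(sorted_all_unique([1, 2, 2, 5]))   # False
```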

绅士风度i 2024-10-28 12:47:37

Sometimes you need not only to check whether all items are unique, but also to get the first non-unique item. Going further, you might need an indication for each item of whether it is unique.

In this case the following function might help (it provides 2 modes - see the comments above the function):

# if extMode is False (default) -> returns index of 1st item from in_data, which is not unique (i.e. repeats some previous item), or -1 if no duplicates found
# if extMode is True -> returns dict with key = index of item from in_data, value = True if the item is unique (otherwise False)
# Note: in_data should be iterable
def checkDuplicates(in_data, extMode = False):
    if in_data is None:
        return {} if extMode else -1 # depending on your needs here could also return None instead of {}
    r = {}
    s = set()
    c = -1
    for i in in_data:
        c += 1
        if i in s: # duplicate found
            if not extMode:
                return c
            r[c] = False
        else: # not duplicating item
            if extMode:
                r[c] = True
            s.add(i)
    if extMode:
        return r
    return -1
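
When only the first repeated item itself is needed, rather than its index, a shorter helper is enough. A minimal sketch, assuming hashable items; first_duplicate is a hypothetical name, and it returns None when nothing repeats, so it cannot report a repeated None.

```python
def first_duplicate(in_data):
    seen = set()
    for item in in_data:
        if item in seen:
            return item        # first item that repeats an earlier one
        seen.add(item)
    return None                # no duplicates found


print(first_duplicate([1, 2, 3, 2, 1]))   # 2
print(first_duplicate("abc"))             # None
```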

那支青花 2024-10-28 12:47:37

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
