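Checking if all elements in a list are unique
What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?
My current approach using a Counter is: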
>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.values():
...     if values > 1:
...         pass  # do something
Can I do better?
Not the most efficient, but straightforward and concise:
Probably won't make much of a difference for short lists.
Here is a two-liner that will also do early exit:
If the elements of x aren't hashable, then you'll have to resort to using a list for seen:
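The two-liner and its list-based variant are not reproduced in this answer. A minimal sketch of what they might look like follows, assuming the trick is `any()` over a generator that records each item in `seen`; the function names `all_unique_early_exit` and `all_unique_unhashable` are hypothetical.

```python
def all_unique_early_exit(x):
    seen = set()
    # any() stops at the first item that is already in seen.
    # set.add returns None, so the `or` branch records the item and stays falsy.
    return not any(item in seen or seen.add(item) for item in x)


def all_unique_unhashable(x):
    seen = []
    # Same idea with a list, for items that cannot be hashed (e.g. lists).
    # Membership tests on a list are linear, so this variant is quadratic overall.
    return not any(item in seen or seen.append(item) for item in x)
```

The set version needs hashable elements; the list version only needs `==` to work, at the cost of speed.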
An early-exit solution is possible; however, for small cases, or if exiting early is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
for speed:
How about adding all the entries to a set and checking its length?
I've compared the suggested solutions with perfplot and found that comparing len(x) with len(set(x)) is indeed the fastest solution. If there are early duplicates in the list, some early-exit solutions finish in constant time and are to be preferred.
Code to reproduce the plot:
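The perfplot script itself is not reproduced here. As a rough stand-in, the sketch below times two of the approaches with the standard library's `timeit` rather than perfplot, so it prints numbers instead of drawing a plot. The function names, list sizes and repeat counts are illustrative assumptions, not the author's setup.

```python
import timeit


def by_set_length(x):
    return len(x) == len(set(x))


def by_early_exit(x):
    seen = set()
    for item in x:
        if item in seen:
            return False
        seen.add(item)
    return True


def time_kernels(n, number=20):
    """Return {kernel name: best time in seconds} for a duplicate-free list of length n."""
    data = list(range(n))  # no duplicates: the worst case for early exit
    kernels = {"set_length": by_set_length, "early_exit": by_early_exit}
    return {
        name: min(timeit.repeat(lambda f=func: f(data), number=number, repeat=3))
        for name, func in kernels.items()
    }


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, time_kernels(n))
```

To see the effect of early duplicates, one could replace `data` with a list whose first two elements are equal and compare the timings again.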
As an alternative to a set, you can use a dict.
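The dict-based snippet is missing from this answer. One way it could look, assuming the idea is to let dict keys collapse duplicates the way a set would (the name `all_unique_dict` is hypothetical):

```python
def all_unique_dict(x):
    # dict.fromkeys keeps one key per distinct element,
    # so the dict is shorter than the list whenever something repeats.
    return len(dict.fromkeys(x)) == len(x)
```

Like the set approach, this needs hashable elements.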
Another approach entirely, using sorted and groupby:
It requires a sort, but exits on the first repeated value.
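The original snippet is not shown. A minimal sketch of the sorted-plus-groupby idea, assuming the elements can be ordered (the name `all_unique_groupby` is hypothetical):

```python
from itertools import groupby


def all_unique_groupby(seq):
    # groupby on sorted input yields one group per distinct value.
    # all() stops at the first group that holds more than one element.
    return all(sum(1 for _ in group) == 1 for _, group in groupby(sorted(seq)))
```

The sort dominates the cost at O(N log N); the early exit only saves work during the grouping pass.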
Here is a recursive early-exit function:
It's fast enough for me, without using weird (slow) conversions, while keeping a functional-style approach.
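The function body is missing here. A sketch of a recursive head/tail check in that spirit follows; the name `distinct` and the exact base case are assumptions.

```python
def distinct(items):
    # Base case: zero or one element cannot contain a duplicate.
    if len(items) < 2:
        return True
    head, tail = items[0], items[1:]
    # Early exit: stop recursing as soon as the head reappears in the tail.
    if head in tail:
        return False
    return distinct(tail)
```

Each call slices the list and scans the tail, so this is quadratic, and Python's default recursion limit caps it at lists of roughly a thousand elements.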
Here is a recursive O(N^2) version for fun:
All the answers above are good, but I prefer to use the all_unique example from 30 seconds of python.
You need to use set() on the given list to remove duplicates, then compare its length with the length of the list.
It returns True if all the values in a flat list are unique, False otherwise.
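The body of all_unique is not reproduced in this answer. Written from the description above, it could look like the sketch below; it may differ in detail from the 30 seconds of python version.

```python
def all_unique(lst):
    # set() drops duplicates, so the lengths match only when nothing repeats.
    return len(lst) == len(set(lst))


print(all_unique([1, 2, 3, 4]))                  # True
print(all_unique([1, 1, 1, 2, 3, 4, 5, 6, 2]))   # False
```

The elements must be hashable for set() to accept them.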
How about this
If, and only if, you already have the data processing library pandas in your dependencies, there's an already implemented solution which gives the boolean you want:
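The snippet is missing from this answer. One pandas feature that fits the description is the `Series.is_unique` property; whether the author meant exactly this is an assumption, and `all_unique_pandas` is a hypothetical wrapper name.

```python
import pandas as pd


def all_unique_pandas(values):
    # Series.is_unique is a boolean property: True when no value repeats.
    return bool(pd.Series(values).is_unique)
```

Pulling in pandas only for this check would be heavy; it makes sense when the data already lives in a Series or DataFrame.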
You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:
and do len(x) > len(f5(x)). This will be fast and is also order preserving.
Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark
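The definition of f5 is not included in this answer. The sketch below is an order-preserving de-duplicator under the same name; it is written from the description, so treat the body and the optional `idfun` parameter as assumptions rather than a copy of the benchmark page.

```python
def f5(seq, idfun=None):
    """Return the items of seq with duplicates dropped, keeping first-seen order."""
    if idfun is None:
        idfun = lambda item: item  # compare items by their own value
    seen = set()
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen:
            continue
        seen.add(marker)
        result.append(item)
    return result


x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
has_duplicates = len(x) > len(f5(x))
```

For a plain yes/no uniqueness check the preserved order is not needed; it only matters if the de-duplicated list is used afterwards.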
Using a similar approach in a pandas DataFrame to test whether a column contains only unique values:
For me, this is instantaneous on an int variable in a DataFrame containing over a million rows.
It does not fully fit the question, but if you google the task I had, this question is ranked first, so it might be of interest to users as an extension of the question. If you want to check, for each list element, whether it is unique or not, you can do the following:
For short lists, get_unique_using_count, as suggested in some answers, is fast. But once your list is longer than 100 elements, the count function takes quite long. Thus the approach shown in the get_unique function is much faster, although it looks more complicated.
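Neither function body is shown in this answer. The sketch below gives a plausible `get_unique_using_count` based on `list.count`, plus a `Counter`-based alternative. The second function is a swapped-in technique under the hypothetical name `get_unique_using_counter`; it is not the answer's `get_unique`, whose implementation is unknown.

```python
from collections import Counter


def get_unique_using_count(mylist):
    # Quadratic: list.count scans the whole list once per element.
    return [mylist.count(item) == 1 for item in mylist]


def get_unique_using_counter(mylist):
    # Linear for hashable items: count everything once, then look up each item.
    counts = Counter(mylist)
    return [counts[item] == 1 for item in mylist]
```

Both return one boolean per element, in the original order, that is True when the element occurs exactly once.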
If the list is sorted anyway, you can use:
Pretty efficient, but not worth sorting for this purpose though.
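The snippet for the sorted case is missing. A minimal sketch, assuming the idea is to compare each element with its neighbour (the name `all_unique_presorted` is hypothetical):

```python
def all_unique_presorted(sorted_items):
    # In a sorted list, equal values sit next to each other,
    # so comparing each element with its successor is enough.
    return all(a != b for a, b in zip(sorted_items, sorted_items[1:]))
```

This is a single linear pass and stops at the first adjacent pair that matches, but it gives wrong answers on unsorted input.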
Sometimes you need not just to check whether all items are unique, but also to get the first non-unique item. Going further, you might need an indication for each item of whether it is unique or not.
In this case the following function might help (it provides 2 modes; see the comments above the function):
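The function this answer refers to is not shown. The sketch below is a hypothetical helper with two modes in the same spirit; the name `check_uniqueness`, the `per_item` flag and the return conventions are all assumptions.

```python
from collections import Counter


def check_uniqueness(items, per_item=False):
    """Hypothetical two-mode helper.

    per_item=False: return the first item that repeats an earlier one,
                    or None if all items are unique.
    per_item=True:  return a list of booleans, True where the item
                    occurs exactly once in items.
    """
    if per_item:
        counts = Counter(items)
        return [counts[item] == 1 for item in items]
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None
```

Returning None for "all unique" is ambiguous if None can itself be a list element; a dedicated sentinel or a tuple return would avoid that.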
For beginners: