Best way to identify list length similarity
I have a dict containing lists under its keys:
    dct = {'a': [1, 2, 3],
           'b': [1, 2, 3, 4],
           'c': [1, 2]}
What is the best way to check whether the lists all have the same length?
This is my solution:
    import itertools
    len(set(itertools.imap(len, dct.viewvalues()))) == 1
It evaluates to True if the lengths are the same and False if not.
UPD: Following @RaymondHettinger's advice, I replaced map with itertools.imap.
Comments (4)
Your solution looks fine.
If you want to tweak it a bit, use itertools.imap() instead of map(). That avoids building an intermediate list of lengths, so the extra memory for the mapping step drops from O(n) to O(1).
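As a rough illustration of the point above: in Python 2, map() builds the full list of lengths before set() sees it, while itertools.imap() yields them one at a time. The sketch below assumes Python 3, where itertools.imap and dict.viewvalues no longer exist and the built-in map() and dict.values() are already lazy, so they stand in for the Python 2 calls. The helper name same_length_set is made up for this example.

```python
def same_length_set(dct):
    """Return True if every list in dct has the same length (set-based check)."""
    # Python 3: map() is lazy like Python 2's itertools.imap(), and
    # dct.values() is a view like Python 2's dct.viewvalues().
    lengths = map(len, dct.values())
    return len(set(lengths)) == 1


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_length_set(dct))                          # False: lengths 3, 4, 2
print(same_length_set({'x': [1, 2], 'y': [3, 4]}))   # True: both have length 2
```

Note that this check returns False for an empty dict, because the set of lengths is then empty.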
First, I would stick with itervalues, which uses lazy evaluation.
Second, I would be wary of relying on set, since it looks up the value in the set on every iteration over the dictionary. Each lookup is O(1) on average and O(n) in the worst case according to the docs; in our case that means O(1) if all the lengths are the same and O(n) if all the lengths are different. But it's difficult to assess the overhead of using set.
I would use all in this case. all fails as soon as it finds the first False value, so the first length mismatch stops the iteration. With set, by contrast, the code goes through all the lists to the end and only then compares the size of the set to 1.
The code can be optimized by reusing the same iterator it, so that the first value is not traversed twice.
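The code for the all-based approach did not survive on this page, so here is a minimal sketch of what it could look like. It assumes Python 3 (dct.values() in place of Python 2's dct.itervalues()); the function name same_length_all and the choice to return True for an empty dict are assumptions made for this example, not part of the original answer.

```python
def same_length_all(dct):
    """Return True if every list in dct has the same length (short-circuiting)."""
    it = iter(dct.values())          # one shared iterator over the lists
    try:
        first = len(next(it))        # consume the first list exactly once
    except StopIteration:
        return True                  # assumption: an empty dict counts as "same"
    # all() stops at the first mismatch; `it` resumes after the first value,
    # so the first list is never measured twice.
    return all(len(value) == first for value in it)


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_length_all(dct))                          # False
print(same_length_all({'x': [1, 2], 'y': [3, 4]}))   # True
```

Because the generator passed to all() pulls from the same iterator that next() already advanced, a mismatch on the second list ends the scan without touching the rest.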
Note: ovgolovin's solution is much better. I'm leaving this answer here because there's discussion that refers to it.
Your solution is fine, but you could use a generator expression which uses less memory and is more readable:
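The generator expression itself is missing from this page; a plausible reconstruction is sketched below. It assumes Python 3 (dct.values() rather than Python 2's itervalues), and the wrapper name same_length_genexp is hypothetical.

```python
def same_length_genexp(dct):
    """Return True if every list in dct has the same length."""
    # The generator expression feeds lengths to set() one at a time,
    # so no intermediate list of lengths is built.
    return len(set(len(value) for value in dct.values())) == 1


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_length_genexp(dct))                          # False
print(same_length_genexp({'x': [1, 2], 'y': [3, 4]}))   # True
```

Like the set-based original, this still visits every list before answering, which is the drawback the note above alludes to.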
As Michael J. Barber suggested in the comments to the answer, here is an approach that uses groupby and imap from the itertools module.
imap just applies the len function to every list. groupby just groups the values into chunks of equal length.
So, if there is more than one chunk, the lengths are different. If there is only one chunk, the lists all have the same length, and the second access to the groupby iterator raises StopIteration, so next returns None (the default value passed to next).
The great benefit of this code is that imap and groupby are written in C and are pretty fast.
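The groupby code is not preserved on this page, so the following is only a sketch of the idea described above. It assumes Python 3, where the built-in map() replaces itertools.imap; the function name same_length_groupby and the handling of an empty dict are assumptions for illustration.

```python
from itertools import groupby


def same_length_groupby(dct):
    """Return True if every list in dct has the same length (groupby-based)."""
    # groupby yields one (key, group) pair per run of equal consecutive lengths.
    runs = groupby(map(len, dct.values()))
    if next(runs, None) is None:
        return True                  # assumption: an empty dict counts as "same"
    # A second run exists only if some length differs from the first one;
    # otherwise the iterator is exhausted and next() returns the default None.
    return next(runs, None) is None


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_length_groupby(dct))                          # False
print(same_length_groupby({'x': [1, 2], 'y': [3, 4]}))   # True
```

groupby only merges consecutive equal values, which is enough here: any length that differs from the first one starts a second run, and the check stops as soon as that run appears.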