Best way to check whether lists have the same length

Posted on 2024-12-10 13:54:54

I have a dict containing lists under its keys:

dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

What is the best way to check whether all the lists have the same length?

This is my solution:

import itertools
len(set(itertools.imap(len, dct.viewvalues()))) == 1

This evaluates to True if all the lengths are equal and False otherwise.

UPD: Following @RaymondHettinger's advice, I replaced map with itertools.imap.


Comments (4)

不爱素颜 2024-12-17 13:54:54

Your solution looks fine.

If you want to tweak it a bit, use itertools.imap() instead of map(). That avoids building the intermediate list of lengths, so the extra memory for the mapping step drops from O(n) to O(1).
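
For readers on Python 3, here is a minimal sketch of the same set-based check. On Python 3 the built-in map is already lazy, and itertools.imap and dict.viewvalues no longer exist, so only built-ins are needed. The helper name same_length_set is made up for illustration, and treating an empty dict as "all equal" (the <= 1 comparison) is an assumption that differs from the == 1 test in the question.

```python
def same_length_set(d):
    """Return True if every value in d has the same length (set-based check)."""
    # map() is lazy on Python 3, so no intermediate list of lengths is built.
    # Assumption: an empty dict yields an empty set and is treated as "all equal".
    return len(set(map(len, d.values()))) <= 1


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_length_set(dct))  # False: the lengths are 3, 4 and 2
```

The set still holds one entry per distinct length, so only the mapping step is constant in memory.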

一曲爱恨情仇 2024-12-17 13:54:54

First, I would stick with itervalues, which uses lazy evaluation.

Second, I would be wary of relying on set, since it looks up a value in the set on every iteration over the dictionary. According to the docs, that lookup is O(1) on average and O(n) in the worst case (in our case the worst case is O(1) if all the lengths are the same and O(n) if they are all different). Still, it is difficult to assess the overhead of using set.

I would use all in this case. all stops as soon as it finds the first False value, so the first length mismatch ends the iteration. With set, by contrast, every list is visited before the size of the set is compared to 1.

>>> dct = {'a': [1, 2, 3],
...        'b': [1, 2, 3, 4],
...        'c': [1, 2]}
>>> length_1 = len(dct.itervalues().next())
>>> all(len(value) == length_1 for value in dct.itervalues())
False

>>> dct = {'a': [1, 2, 3],
...        'b': [1, 2, 4],
...        'c': [1, 2, 5]}
>>> length_1 = len(dct.itervalues().next())
>>> all(len(value) == length_1 for value in dct.itervalues())
True

The code can be optimized by reusing a single iterator it, so that the first value is not visited twice:

>>> it = dct.itervalues()
>>> length_1 = len(next(it))
>>> all(len(value) == length_1 for value in it)
True

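
The shared-iterator idea can also be wrapped in a function. The following is a minimal sketch for Python 3, where dict.itervalues is gone and d.values() is used instead. The name same_length_all is hypothetical, and returning True for an empty dict is an assumption rather than something the answer specifies.

```python
def same_length_all(d):
    """Return True if every value in d has the same length, stopping at the first mismatch."""
    it = iter(d.values())
    try:
        first_length = len(next(it))
    except StopIteration:
        # Assumption: an empty dict counts as "all lengths equal".
        return True
    # The same iterator is reused, so the first value is not measured twice,
    # and all() stops at the first length that differs.
    return all(len(value) == first_length for value in it)


print(same_length_all({'a': [1, 2, 3], 'b': [1, 2, 3, 4], 'c': [1, 2]}))  # False
print(same_length_all({'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2, 5]}))  # True
```
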
━╋う一瞬間旳綻放 2024-12-17 13:54:54

Note: ovgolovin's solution is much better. I'm leaving this answer here because there's discussion that refers to it.

Your solution is fine, but you could use a generator expression which uses less memory and is more readable:

len(set(len(x) for x in dct.viewvalues())) == 1

老子叫无熙 2024-12-17 13:54:54

As Michael J. Barber suggested in the comments to the answer, here is code that uses groupby and imap from the itertools module.

imap just applies the len function to every list.

groupby just groups consecutive equal lengths into chunks.

So, if there is more than one chunk, the lengths differ. If there is only one chunk, the lists all have the same length, and the second call to next on the groupby iterator hits StopIteration and therefore returns None (the default value passed to next).

The great benefit of this code is that imap and groupby are written in C and are pretty fast.

from itertools import imap,groupby

dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

dct2 = {'a': [1, 2, 3],
       'b': [1, 2, 34],
       'c': [1, 2, 5]}

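def check_lengths(iterable):
    # groupby yields one (key, group) pair per run of equal lengths
    it = groupby(imap(len, iterable.itervalues()))
    next(it, None)                 # skip the first run, if any
    return next(it, None) is None  # no second run means all lengths match

print(check_lengths(dct))   # False
print(check_lengths(dct2))  # True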
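
For comparison, a minimal sketch of the same groupby technique on Python 3, where itertools.imap and dict.itervalues are no longer available and the built-in map is lazy. The function name same_length_groupby is hypothetical. As in the code above, an empty dict yields no runs at all and therefore returns True.

```python
from itertools import groupby


def same_length_groupby(d):
    """Return True if the lengths of all values in d form at most one run of equal values."""
    runs = groupby(map(len, d.values()))
    next(runs, None)                 # consume the first run, if there is one
    return next(runs, None) is None  # a second run would mean a different length


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

dct2 = {'a': [1, 2, 3],
        'b': [1, 2, 34],
        'c': [1, 2, 5]}

print(same_length_groupby(dct))   # False
print(same_length_groupby(dct2))  # True
```
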