Lambda vs. list comprehension performance

Posted 2024-08-08 22:15:58


I recently posted a question using a lambda function and in a reply someone had mentioned lambda is going out of favor, to use list comprehensions instead. I am relatively new to Python. I ran a simple test:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N  #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N

They all print the same N, so I commented those print statements out (except that the last, set-based way returns N unordered), but the resulting time differences were interesting over repeated tests, as seen in this one example:

time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698

So while I find list comprehensions on the whole easier to read, there seem to be some performance issues, at least in this example.

So, two questions:

  1. Why is lambda etc being pushed aside?

  2. For the list comprehension ways, is there a more efficient implementation and how would you KNOW it's more efficient without testing? I mean, lambda/map/filter was supposed to be less efficient because of the extra function calls, but it seems to be MORE efficient.

Paul

Comments (10)

鯉魚旗 2024-08-15 22:15:59

Your tests are doing very different things. With S being 1M elements and T being 300:

[x for x in S for y in T if x==y]= 54.875

This option does 300M equality comparisons.

 

filter(lambda x:x in S,T)= 0.391000032425

This option does 300 linear searches through S.

 

[val for val in S if val in T]= 12.6089999676

This option does 1M linear searches through T.

 

list(set(S) & set(T))= 0.125

This option does two set constructions and one set intersection.


The differences in performance between these options are much more related to the algorithms each one uses than to any difference between list comprehensions and lambda.
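
The algorithmic point above can be made concrete: converting S to a set once turns every membership test into an (amortized) O(1) hash lookup. A minimal Python 3 sketch, with sizes scaled down from the question's 1M/300 so it runs instantly (the name `S_set` is mine):

```python
# Python 3 sketch: build the hash set once, then each 'in' test is ~O(1).
# Sizes are scaled down from the question for a quick run.
S = list(range(1000))
T = [y**2 for y in range(30)]

S_set = set(S)                    # one O(len(S)) pass
N = [x for x in T if x in S_set]  # 30 constant-time lookups

print(len(N))                     # 30: every square below 1000 is in S
```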

我们只是彼此的过ke 2024-08-15 22:15:59

When I fix your code so that the list comprehension and the call to filter are actually doing the same work things change a whole lot:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in T if x in S]
time2 = time.time()
print 'time diff [x for x in T if x in S]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N

Then the output is more like:

time diff [x for x in T if x in S]= 0.414485931396
time diff filter(lambda x:x in S,T)= 0.466315984726

So the list comprehension has a time that's generally pretty close to and usually less than the lambda expression.

The reason lambda expressions are being phased out is that many people think they are a lot less readable than list comprehensions. I sort of reluctantly agree.
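
One Python 3 caveat worth noting when rerunning this comparison today (the thread predates it): `filter()` now returns a lazy iterator, so a like-for-like comparison against the comprehension needs `list()` around it. A small sketch under that assumption, with sizes scaled down:

```python
# Python 3 sketch: filter() is lazy now, so wrap it in list() to compare
# against the list comprehension like for like.
S = list(range(1000))
T = [y**2 for y in range(30)]

comp = [x for x in T if x in S]
filt = list(filter(lambda x: x in S, T))

print(comp == filt)   # True: same elements in the same order
```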

二手情话 2024-08-15 22:15:59

Q: Why is lambda etc being pushed aside?

A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().

Suppose you wanted to sum a list. Here are two ways of doing it.

lst = range(10)
print reduce(lambda x, y: x + y, lst)

print sum(lst)

Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:

lst = range(10)
print reduce(lambda x, y: bool(x or y), lst)

print any(lst)

Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, such that it will stop evaluating as soon as it has found any true value. The reduce() has to crank through the entire list. This performance difference would be stark if the list was a million items long, and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:

def any(iterable):
    for x in iterable:
        if x:
            return True
    return False
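
The short-circuiting claim is easy to observe directly. A Python 3 sketch using a hypothetical instrumented predicate `noisy()` (my own name) to count how many elements actually get evaluated; note that `reduce` lives in `functools` on Python 3:

```python
from functools import reduce

calls = []

def noisy(x):
    # hypothetical instrumented identity: records every evaluation
    calls.append(x)
    return x

lst = [1, 0, 0, 0]

any(noisy(x) for x in lst)                            # stops at first truthy
n_any = len(calls)

calls.clear()
reduce(lambda a, b: a or b, (noisy(x) for x in lst))  # consumes everything
n_reduce = len(calls)

print(n_any, n_reduce)   # 1 4
```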

Suppose you wanted to make a list of squares of even numbers from some list.

lst = range(10)
print map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))

print [x**2 for x in lst if x % 2 == 0]

Now suppose you wanted to sum that list of squares.

lst = range(10)
print sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))

# list comprehension version of the above
print sum([x**2 for x in lst if x % 2 == 0])

# generator expression version; note the lack of '[' and ']'
print sum(x**2 for x in lst if x % 2 == 0)

The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.) (EDIT: But in Python 3.0 and newer, map() and filter() are now both "lazy" and produce an iterator instead of a list; so this point is less true than it used to be. Also, in Python 2.x you were able to use itertools.imap() and itertools.ifilter() for iterator-based map and filter. But I continue to prefer the generator expression solutions over any map/filter solutions.)

By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.

初相遇 2024-08-15 22:15:59

Many people have already pointed out that you're comparing apples with oranges, etc. But I think nobody has shown a really simple comparison -- list comprehension vs. map plus lambda, with little else to get in the way -- and that might be:

$ python -mtimeit -s'L=range(1000)' 'map(lambda x: x+1, L)'
1000 loops, best of 3: 328 usec per loop
$ python -mtimeit -s'L=range(1000)' '[x+1 for x in L]'
10000 loops, best of 3: 129 usec per loop

Here, you can see very sharply the cost of lambda -- about 200 microseconds, which in the case of a sufficiently simple operation such as this one swamps the operation itself.

Numbers are very similar with filter of course, since the problem is not filter or map, but rather the lambda itself:

$ python -mtimeit -s'L=range(1000)' '[x for x in L if not x%7]'
10000 loops, best of 3: 162 usec per loop
$ python -mtimeit -s'L=range(1000)' 'filter(lambda x: not x%7, L)'
1000 loops, best of 3: 334 usec per loop

No doubt the fact that lambda can be less clear, or its weird connection with Sparta (Spartans had a Lambda, for "Lakedaimon", painted on their shields -- this suggests that lambda is pretty dictatorial and bloody;-) have at least as much to do with its slowly falling out of fashion, as its performance costs. But the latter are quite real.
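
The same micro-benchmark can be run from code rather than the shell; a Python 3 sketch (absolute numbers depend entirely on the machine, so nothing beyond positivity is asserted):

```python
import timeit

# Python 3 sketch of the shell commands above, run programmatically.
setup = "L = list(range(1000))"

t_map  = timeit.timeit("list(map(lambda x: x + 1, L))", setup=setup, number=200)
t_comp = timeit.timeit("[x + 1 for x in L]", setup=setup, number=200)

# the per-element Python-level call usually makes t_map the larger one
print(t_map, t_comp)
```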

记忆消瘦 2024-08-15 22:15:59

First of all, test like this:

import timeit

S=[x for x in range(10000)]
T=[y**2 for y in range(30)]

print "v1", timeit.Timer('[x for x in S for y in T if x==y]',
             'from __main__ import S,T').timeit(100)
print "v2", timeit.Timer('filter(lambda x:x in S,T)',
             'from __main__ import S,T').timeit(100)
print "v3", timeit.Timer('[val for val in T if val in S]',
             'from __main__ import S,T').timeit(100)
print "v4", timeit.Timer('list(set(S) & set(T))',
             'from __main__ import S,T').timeit(100)

And basically you are doing different things in each test. If you rewrite the list comprehension, for example, as

[val for val in T if val in S]

performance would be on par with the 'lambda/filter' construct.

淡淡的优雅 2024-08-15 22:15:59

Sets are the correct solution for this. However try swapping S and T and see how long it takes!

filter(lambda x:x in T,S)

$ python -m timeit -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in S,T)'
10 loops, best of 3: 485 msec per loop
$ python -m timeit -r1 -n1 -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in T,S)'
1 loops, best of 1: 19.6 sec per loop

So you see that the order of S and T is quite important.

Changing the order of the list comprehension to match the filter gives

$ python -m timeit  -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' '[x for x in T if x in S]'
10 loops, best of 3: 441 msec per loop

So in fact the list comprehension is slightly faster than the lambda on my computer.
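
The asymmetry disappears once the collection being probed is a set; a sketch with scaled-down sizes (the names `S_set`/`T_set` are mine):

```python
# Sketch: with a set on the inside of the membership test, the order of
# S and T stops mattering for correctness or asymptotic cost.
S = list(range(10000))
T = [y**2 for y in range(100)]

S_set, T_set = set(S), set(T)

a = [x for x in S if x in T_set]   # 10000 O(1) lookups
b = [x for x in T if x in S_set]   # 100 O(1) lookups

print(sorted(a) == sorted(b))      # True: same intersection either way
```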

沫雨熙 2024-08-15 22:15:59

Your list comprehension and lambda are doing different things, the list comprehension matching the lambda would be [val for val in T if val in S].

Efficiency isn't the reason why list comprehensions are preferred (though they actually are slightly faster in almost all cases). The reason why they are preferred is readability.

Try it with a smaller loop body and larger loops: for example, make T a set and iterate over S. In that case, on my machine, the list comprehension is nearly twice as fast.

宫墨修音 2024-08-15 22:15:59

Your profiling is done wrong. Take a look at the timeit module and try again.

lambda defines anonymous functions. Their main problem is that many people don't know the whole Python library and use them to re-implement functions that already exist in modules such as operator and functools (and are much faster).

List comprehensions have nothing to do with lambda. They are equivalent to the standard filter and map functions from functional languages. LCs are preferred because they can be used as generators too, not to mention readability.
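
The point about the operator module can be sketched as follows (Python 3, where `reduce` lives in `functools`):

```python
import operator
from functools import reduce

# Sketch: ready-made operator functions instead of equivalent lambdas.
lst = list(range(10))

# operator.add replaces lambda x, y: x + y (and sum() beats both here)
assert reduce(operator.add, lst) == reduce(lambda x, y: x + y, lst) == sum(lst)

# operator.mul replaces lambda x, y: x * y
print(reduce(operator.mul, range(1, 6)))   # 120, i.e. 5!
```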

半仙 2024-08-15 22:15:59

This is pretty fast:

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1

time1 = time.time()
N = [x for x in T if binary_search(S, x) >= 0]
time2 = time.time()
print 'time diff binary search=', time2-time1

Simply: fewer comparisons, less time.
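
The standard library already ships this search in the bisect module; a Python 3 sketch (`contains_sorted` is my own name, and it assumes `a` is sorted, which S is here):

```python
import bisect

# Sketch: stdlib binary search via bisect; requires a sorted list.
S = list(range(1000000))
T = [y**2 for y in range(300)]

def contains_sorted(a, x):
    # True if x occurs in the sorted list a, found in O(log len(a))
    i = bisect.bisect_left(a, x)
    return i < len(a) and a[i] == x

N = [x for x in T if contains_sorted(S, x)]
print(len(N))   # 300: every square below 1000000 is in S
```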

沧笙踏歌 2024-08-15 22:15:59

List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:

n = [f(i) for i in S if some_condition(i)]

you would gain from LC optimization over this:

n = map(f, filter(some_condition, S))

simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence you will also notice a different impact on the memory used by each method; the LC will keep it lower.

The lambda in itself does not matter.
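
That difference can be sketched concretely (`f` and `some_condition` below are placeholder functions of my own, standing in for the answer's generic names):

```python
# Sketch: processing filtered results with and without intermediate lists.
def f(i):
    return i * 2

def some_condition(i):
    return i % 3 == 0

S = range(1000)

n_list = [f(i) for i in S if some_condition(i)]     # builds one list
n_lazy = sum(f(i) for i in S if some_condition(i))  # builds none

print(len(n_list), n_lazy)   # 334 333666
```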
