Lambda versus list comprehension performance
I recently posted a question using a lambda function, and in a reply someone mentioned that lambda is going out of favor and that I should use list comprehensions instead. I am relatively new to Python. I ran a simple test:
import time
S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N
They all print the same N, so I commented the print statements out (except for the last method, whose result is unordered), but the resulting time differences were interesting over repeated tests, as seen in this one example:
time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698
So while I find list comprehensions on the whole easier to read, there seem to be some performance issues, at least in this example.
So, two questions:
Why is lambda etc being pushed aside?
For the list comprehension ways, is there a more efficient implementation and how would you KNOW it's more efficient without testing? I mean, lambda/map/filter was supposed to be less efficient because of the extra function calls, but it seems to be MORE efficient.
Paul
Answers (10)
Your tests are doing very different things. With S being 1M elements and T being 300:

The option [x for x in S for y in T if x==y] does 300M equality comparisons.
The option filter(lambda x:x in S,T) does 300 linear searches through S.
The option [val for val in S if val in T] does 1M linear searches through T.
The option list(set(S) & set(T)) does two set constructions and one set intersection.

The differences in performance between these options are much more related to the algorithm each one uses than to any difference between list comprehensions and lambda.
When I fix your code so that the list comprehension and the call to filter are actually doing the same work, things change a whole lot.
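One way the fix might have looked, assuming the intent is to make both expressions iterate over T and test membership in S (a sketch in the question's own Python 2 style):

    time1 = time.time()
    N = [x for x in T if x in S]        # comprehension now doing the same work...
    time2 = time.time()
    print 'time diff [x for x in T if x in S]=', time2 - time1

    time1 = time.time()
    N = filter(lambda x: x in S, T)     # ...as the original filter/lambda call
    time2 = time.time()
    print 'time diff filter(lambda x:x in S,T)=', time2 - time1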
So the list comprehension's time is generally pretty close to, and usually less than, the lambda expression's.
The reason lambda expressions are being phased out is that many people think they are a lot less readable than list comprehensions. I sort of reluctantly agree.
Q: Why is lambda etc being pushed aside?

A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().

Suppose you wanted to sum a list. Here are two ways of doing it.
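The snippets were stripped from this copy of the answer; the two ways were presumably along these lines (Python 2, where reduce() is a built-in):

    lst = range(10)
    total = reduce(lambda x, y: x + y, lst)   # functional style: reduce plus a lambda
    total = sum(lst)                          # the idiomatic built-in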
Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:
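The code is again missing here; judging from the discussion that follows, the problem is testing whether any element is true, roughly:

    lst = [0, 0, 1, 0]
    result = reduce(lambda x, y: x or y, lst)   # cranks through the whole list
    result = any(lst)                           # stops at the first true value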
Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, such that it will stop evaluating as soon as it has found any true value. The reduce() version has to crank through the entire list. This performance difference would be stark if the list were a million items long and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:
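The definition itself was elided; the classic fallback is a plain loop, something like:

    def any(iterable):
        # stand-in for the built-in added in Python 2.5
        for element in iterable:
            if element:
                return True
        return False

Suppose you wanted to make a list of squares of even numbers from some list. The elided comparison was presumably along these lines:

    lst = range(10)
    squares = [x**2 for x in lst if x % 2 == 0]                        # list comprehension
    squares = map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))   # map/filter/lambda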
Now suppose you wanted to sum that list of squares.
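The variants compared were presumably roughly these, matching the discussion below:

    total = sum(x**2 for x in lst if x % 2 == 0)                          # generator expression
    total = sum([x**2 for x in lst if x % 2 == 0])                        # builds one throwaway list
    total = sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))   # builds two throwaway lists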
The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.) (EDIT: But in Python 3.0 and newer, map() and filter() are both "lazy" and produce an iterator instead of a list, so this point is less true than it used to be. Also, in Python 2.x you were able to use itertools.imap() and itertools.ifilter() for iterator-based map and filter. But I continue to prefer the generator expression solutions over any map/filter solutions.)

By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.
Many people have already pointed out that you're comparing apples with oranges, etc, etc. But I think nobody's shown how to do a really simple comparison -- list comprehension vs map plus lambda with little else to get in the way -- and that might be:
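The measurements were dropped from this copy; the comparison is easy to reproduce from a shell with timeit, presumably something like this (absolute numbers will vary by machine):

    $ python -m timeit -s'L = range(1000)' 'map(lambda x: x + 1, L)'
    $ python -m timeit -s'L = range(1000)' '[x + 1 for x in L]'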
Here, you can see very sharply the cost of lambda -- about 200 microseconds, which in the case of a sufficiently simple operation such as this one swamps the operation itself.
Numbers are very similar with filter of course, since the problem is not filter or map, but rather the lambda itself:
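Presumably the filter-based version of the same measurement, roughly:

    $ python -m timeit -s'L = range(1000)' 'filter(lambda x: x % 2, L)'
    $ python -m timeit -s'L = range(1000)' '[x for x in L if x % 2]'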
No doubt the fact that lambda can be less clear, or its weird connection with Sparta (Spartans had a Lambda, for "Lakedaimon", painted on their shields -- this suggests that lambda is pretty dictatorial and bloody;-) have at least as much to do with its slowly falling out of fashion, as its performance costs. But the latter are quite real.
First of all, test like this:
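The snippet did not survive here; presumably it showed the timeit module, which takes care of repetition and timer resolution, something like:

    import timeit

    # S and T defined at module level, as in the question
    t = timeit.Timer('[val for val in T if val in S]',
                     'from __main__ import S, T')
    print t.timeit(number=10)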
And basically you are doing different things each time you test. If you rewrote the list comprehension, for example as [val for val in T if val in S],
performance would be on par with the 'lambda/filter' construct.
Sets are the correct solution for this. However, try swapping S and T and see how long it takes!
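The timings were lost from this copy; the swap in question is presumably between these two calls:

    N = filter(lambda x: x in S, T)   # 300 searches through the million-element S
    N = filter(lambda x: x in T, S)   # 1M searches through the 300-element T -- far slower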
So you see that the order of S and T is quite important.
Changing the order of the list comprehension to match the filter gives
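what is presumably this line, iterating over T the way the filter does:

    N = [val for val in T if val in S]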
So in fact the list comprehension is slightly faster than the lambda on my computer.
Your list comprehension and lambda are doing different things; the list comprehension matching the lambda would be [val for val in T if val in S]. Efficiency isn't the reason why list comprehensions are preferred (though they actually are slightly faster in almost all cases). The reason why they are preferred is readability.
Try it with a smaller loop body and larger loops, e.g. make T a set and iterate over S. In that case, on my machine, the list comprehension is nearly twice as fast.
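A sketch of that variant:

    T = set(y**2 for y in range(300))
    N = [val for val in S if val in T]        # list comprehension
    N = filter(lambda val: val in T, S)       # the lambda version it is compared against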
Your profiling is done wrong. Take a look at the timeit module and try again.
lambda defines anonymous functions. Their main problem is that many people don't know the whole Python library and use them to re-implement functions that already exist in the operator, functools, etc. modules (and are much faster).

List comprehensions have nothing to do with lambda. They are equivalent to the standard filter and map functions from functional languages. LCs are preferred because they can be used as generators too, not to mention readability.
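To illustrate the operator-module point, a sketch (for this particular job sum() would of course be the idiomatic choice):

    from operator import add

    lst = range(100)
    total = reduce(lambda x, y: x + y, lst)   # hand-rolled lambda
    total = reduce(add, lst)                  # the ready-made, faster operator.add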
This is pretty fast:
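The code was lost from this copy; given the remark about fewer comparisons, it was presumably a set-based membership test along these lines:

    Sset = set(S)
    N = [val for val in T if val in Sset]   # 300 O(1) lookups instead of linear searches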
Simply: fewer comparisons, less time.
List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:
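The snippet is gone from this copy; presumably something of this shape, with process() standing in as a hypothetical per-item function:

    N = [process(val) for val in S if val in T]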
you would gain from LC optimization over this:
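(again a presumed shape, with the same hypothetical process()):

    N = map(process, filter(lambda val: val in T, S))   # filter builds an intermediate list that map consumes and discards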
simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence you will also notice a different impact on the memory used by each method; the LC will keep it lower.
The lambda in itself does not matter.