Python [<generator expression>] at least 3x faster than list(<generator expression>)?
It appears that using [] around a generator expression (test1) behaves substantially better than putting it inside of list() (test2). The slowdown isn't there when I simply pass a list into list() for shallow copy (test3). Why is this?
Evidence:
from timeit import Timer

t1 = Timer("test1()", "from __main__ import test1")
t2 = Timer("test2()", "from __main__ import test2")
t3 = Timer("test3()", "from __main__ import test3")

x = [34534534, 23423523, 77645645, 345346]

def test1():
    [e for e in x]

print t1.timeit()
#0.552290201187

def test2():
    list(e for e in x)

print t2.timeit()
#2.38739395142

def test3():
    list(x)

print t3.timeit()
#0.515818119049
Machine: 64 bit AMD, Ubuntu 8.04, Python 2.7 (r27:82500)
Comments (4)
Well, my first step was to set the two tests up independently to ensure that this is not a result of e.g. the order in which the functions are defined.
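A minimal sketch of what such an independent setup might look like, assuming each statement gets its own Timer and its own setup string so that neither depends on definitions made for the other. The names setup, t_comp, t_gen and measure are illustrative, and no timings are shown because they depend on the machine and interpreter.

```python
from timeit import Timer

# Each Timer builds its own copy of x in its setup code, so the two
# measurements share no state and no function definitions.
setup = "x = [34534534, 23423523, 77645645, 345346]"

t_comp = Timer("[e for e in x]", setup)
t_gen = Timer("list(e for e in x)", setup)

def measure(number=100000):
    """Return (comprehension_seconds, list_of_genexpr_seconds)."""
    return t_comp.timeit(number), t_gen.timeit(number)

if __name__ == "__main__":
    comp_s, gen_s = measure()
    print("comprehension: %.4fs" % comp_s)
    print("list(genexpr): %.4fs" % gen_s)
```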
Sure enough, I can replicate this. OK, next step is to have a look at the bytecode to see what's actually going on:
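One way to inspect the bytecode is the standard dis module. The sketch below assumes the two functions return their results (the question's versions discard them) and prints no expected output, since the exact opcodes differ between Python versions. What stays the same is that only the list(...) version has to look up the global name list.

```python
import dis

x = [34534534, 23423523, 77645645, 345346]

def test1():
    return [e for e in x]

def test2():
    return list(e for e in x)

# Global names each function refers to at run time.
test1_names = test1.__code__.co_names
test2_names = test2.__code__.co_names

if __name__ == "__main__":
    dis.dis(test1)
    dis.dis(test2)
    print(test1_names)  # does not include 'list'
    print(test2_names)  # includes 'list'
```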
Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes that to the global list. This is probably where the overhead lies.
Note also that the difference is approximately a microsecond, i.e. utterly trivial.
Other interesting data
This still holds for non-trivial lists
and for less trivial map functions:
and (though less strongly) if we filter the list:
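A sketch of how these three variations could be timed side by side. The list size of 1000, the mapping expression e * 2 + 1 and the filter e % 3 == 0 are assumptions chosen for illustration, not the inputs behind the observations above, and the globals argument to timeit requires Python 3.5 or later.

```python
from timeit import timeit

x = list(range(1000))  # assumed size for a "non-trivial" list

# label -> (list comprehension, list() around a generator expression)
cases = {
    "identity": ("[e for e in x]", "list(e for e in x)"),
    "map": ("[e * 2 + 1 for e in x]", "list(e * 2 + 1 for e in x)"),
    "filter": ("[e for e in x if e % 3 == 0]",
               "list(e for e in x if e % 3 == 0)"),
}

def compare(number=1000):
    """Time both spellings of every case; return {label: (comp_s, gen_s)}."""
    results = {}
    for label, (comp, gen) in cases.items():
        comp_s = timeit(comp, globals={"x": x}, number=number)
        gen_s = timeit(gen, globals={"x": x}, number=number)
        results[label] = (comp_s, gen_s)
    return results

if __name__ == "__main__":
    for label, (comp_s, gen_s) in compare().items():
        print("%-8s comprehension %.4fs  list(genexpr) %.4fs"
              % (label, comp_s, gen_s))
```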
list(e for e in x) isn't a list comprehension; it's a genexpr object, (e for e in x), being created and passed to the list factory function. Presumably the object creation and method calls create overhead.
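To see that (e for e in x) is a separate object that exists before list ever runs, it can be bound to a name and inspected. This is a small illustration; the variable names are arbitrary.

```python
import types

x = [34534534, 23423523, 77645645, 345346]

gen = (e for e in x)  # nothing has been iterated yet
is_generator = isinstance(gen, types.GeneratorType)

first = next(gen)     # pull one item by hand
rest = list(gen)      # list() drains whatever is left
leftover = list(gen)  # the generator is now exhausted

if __name__ == "__main__":
    print(is_generator, first, rest, leftover)
```

Each item that list receives comes from resuming the generator, which is work the bracketed form does not have to do.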
In Python, the name list must be looked up in the module and then in builtins. While you cannot change what a list comprehension means, a call to list has to be a standard lookup plus a function call, because list could be redefined to be something else. Looking at the VM code generated for a comprehension, you can see that it is inlined, while a call to list is a normal call.
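The lookup can be made visible by rebinding the name. The sketch below compiles two hypothetical functions into a private namespace (so the real builtin is left alone) and then binds list to tuple there. Only the function that calls list changes behaviour; the bracket syntax is unaffected.

```python
source = """
def with_brackets(x):
    return [e for e in x]

def with_call(x):
    return list(e for e in x)
"""

# Private module-like namespace that acts as the functions' globals.
namespace = {}
exec(source, namespace)

data = [1, 2, 3]
before = namespace["with_call"](data)         # uses the builtin list

namespace["list"] = tuple                     # shadow the name "list"
after = namespace["with_call"](data)          # now resolves to tuple
brackets = namespace["with_brackets"](data)   # syntax, not a name lookup

if __name__ == "__main__":
    print(before, after, brackets)
```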
Your test2 is roughly equivalent to:
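One way to spell out that equivalence, as a sketch rather than the exact expansion the interpreter performs. The names test2_expanded and genexpr are hypothetical, and both functions return their result here so they can be compared.

```python
x = [34534534, 23423523, 77645645, 345346]

def test2():
    return list(e for e in x)

def test2_expanded():
    # The generator expression behaves like a small nested generator
    # function that is called once and then iterated by list().
    def genexpr():
        for e in x:
            yield e
    return list(genexpr())

if __name__ == "__main__":
    print(test2() == test2_expanded())
```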
The call overhead explains the increased processing time.