Is Python's [<generator expression>] at least 3x faster than list(<generator expression>)?

Posted 2024-10-06 21:42:43

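It appears that using [] around a generator expression (test1) performs substantially better than putting it inside list() (test2). The slowdown isn't there when I simply pass a list into list() for a shallow copy (test3). Why is this?

Evidence: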

from timeit import Timer

t1 = Timer("test1()", "from __main__ import test1")
t2 = Timer("test2()", "from __main__ import test2")
t3 = Timer("test3()", "from __main__ import test3")

x = [34534534, 23423523, 77645645, 345346]

def test1():
    [e for e in x]

print t1.timeit()
#0.552290201187


def test2():
    list(e for e in x)

print t2.timeit()
#2.38739395142

def test3():
    list(x)

print t3.timeit()
#0.515818119049

Machine: 64-bit AMD, Ubuntu 8.04, Python 2.7 (r27:82500)


将军与妓 2024-10-13 21:42:43

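Well, my first step was to set the two tests up independently, to make sure this is not a result of, e.g., the order in which the functions are defined.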

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop

Sure enough, I can replicate this. OK, the next step is to look at the bytecode to see what's actually going on:

>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE

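Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes it to the global list. This is probably where the overhead lies.

Note also that the difference is approximately a microsecond, i.e. utterly trivial.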
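The same observation can be checked programmatically instead of by reading the disassembly. The following is a minimal sketch, assuming Python 3 (the thread itself used Python 2.7); the helper name nested_code_names is made up for illustration. It only looks at what each lambda's code object references, and the exact bytecode differs between interpreter versions.

```python
import types

def nested_code_names(func):
    """Return the names of code objects nested directly inside func (hypothetical helper)."""
    return [c.co_name for c in func.__code__.co_consts
            if isinstance(c, types.CodeType)]

x = [34534534, 23423523, 77645645, 345346]

comp = lambda: [e for e in x]        # list comprehension
gen = lambda: list(e for e in x)     # generator expression passed to list()

# The list() version carries a separate <genexpr> code object ...
gen_nested = nested_code_names(gen)
# ... and has to look up the global name `list` at run time.
gen_globals = gen.__code__.co_names
# The comprehension never references the name `list` at all.
comp_globals = comp.__code__.co_names

print(gen_nested, gen_globals, comp_globals)
```

Whether the comprehension itself shows up as a nested code object depends on the Python version, so the sketch deliberately does not rely on that.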

Other interesting data

This still holds for non-trivial lists:

>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop

and for less trivial map functions:

>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop

>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop

and (though less strongly) if we filter the list:

>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop
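To repeat these measurements from a script rather than the command line, something along these lines should work. It is a sketch assuming Python 3, where range is lazy, so the setup materialises it with list(range(...)) to match the Python 2 runs above. The helper bench and the chosen loop counts are illustrative, and absolute numbers will differ by machine and interpreter version.

```python
import timeit

def bench(stmt, setup, number=1000, repeat=3):
    """Best per-loop time in seconds over `repeat` runs (hypothetical helper)."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number

setup = "x = list(range(1000))"
statements = (
    "[e for e in x]",       # list comprehension
    "list(e for e in x)",   # generator expression fed to list()
    "list(x)",              # plain shallow copy
)

results = {stmt: bench(stmt, setup) for stmt in statements}

for stmt, secs in results.items():
    print("%-22s %.2f usec per loop" % (stmt, secs * 1e6))
```

Taking the minimum of several repeats mirrors the "best of 3" figure that python -mtimeit reports.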


情丝乱 2024-10-13 21:42:43

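list(e for e in x) isn't a list comprehension; it is a genexpr object (e for e in x) being created and passed to the list factory function. Presumably the object creation and the method calls create overhead.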
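A short sketch (assuming Python 3) makes the distinction visible: the parenthesised form produces a generator object that yields items only on demand, while the bracketed form is already a list.

```python
import types

x = [34534534, 23423523, 77645645, 345346]

g = (e for e in x)          # nothing has been iterated yet
is_generator = isinstance(g, types.GeneratorType)

first = next(g)             # pull one item by hand
rest = list(g)              # list() drains whatever is left

direct = [e for e in x]     # the comprehension is a list straight away
is_list = isinstance(direct, list)

print(is_generator, first, rest, is_list)
```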


不如归去 2024-10-13 21:42:43

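In Python, the name list must be looked up in the module and then in builtins. While you cannot change what a list comprehension means, a call to list has to be a standard lookup plus a function call, because list could be redefined to be something else.

Looking at the VM code generated for a comprehension, it can be seen that it is inlined, while a call to list is a normal call.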

>>> import dis
>>> def foo():
...     [x for x in xrange(4)]
... 
>>> dis.dis(foo)
  2           0 BUILD_LIST               0
              3 DUP_TOP             
              4 STORE_FAST               0 (_[1])
              7 LOAD_GLOBAL              0 (xrange)
             10 LOAD_CONST               1 (4)
             13 CALL_FUNCTION            1
             16 GET_ITER            
        >>   17 FOR_ITER                13 (to 33)
             20 STORE_FAST               1 (x)
             23 LOAD_FAST                0 (_[1])
             26 LOAD_FAST                1 (x)
             29 LIST_APPEND         
             30 JUMP_ABSOLUTE           17
        >>   33 DELETE_FAST              0 (_[1])
             36 POP_TOP             
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE        

>>> def bar():
...     list(x for x in xrange(4))
... 
>>> dis.dis(bar)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x7fd1230cf468, file "<stdin>", line 2>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (xrange)
             12 LOAD_CONST               2 (4)
             15 CALL_FUNCTION            1
             18 GET_ITER            
             19 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             25 POP_TOP             
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE
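The point about redefinition can be demonstrated directly. The sketch below assumes Python 3 and uses a made-up function build; it rebinds the global name list through the function's own globals dictionary, which has the same effect as writing list = tuple at module level, and then removes the binding again.

```python
x = [1, 2, 3]

def build():
    # `list` is resolved at call time: module globals first, then builtins.
    return list(e for e in x)

before = build()

# Shadow the builtin in the globals that build() sees.
build.__globals__["list"] = tuple
after = build()

# Remove the shadowing name so the builtin is found again.
del build.__globals__["list"]
restored = build()

# A comprehension has no name to rebind; it always produces a list.
comp_type = type([e for e in x])

print(before, after, restored, comp_type)
```

Because the lookup cannot be resolved ahead of time, the interpreter has to perform it on every call, which is part of the fixed cost the timings show.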


酒儿 2024-10-13 21:42:43

Your test2 is roughly equivalent to:

def test2():
    def local():
        for i in x:
            yield i
    return list(local())

The call overhead explains the increased processing time.
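One way to see where those calls come from is to count how often list() has to ask its argument for the next item. The following is an illustrative sketch assuming Python 3; CountingIter and test2_expanded are hypothetical names, not part of the original benchmark.

```python
class CountingIter:
    """Hypothetical wrapper that counts how often the next item is requested."""

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        return next(self.it)

x = [34534534, 23423523, 77645645, 345346]

def test2():
    return list(e for e in x)

def test2_expanded():
    # The generator expression written out as a nested generator function.
    def local():
        for i in x:
            yield i
    return list(local())

counter = CountingIter(e for e in x)
copied = list(counter)

print(test2() == test2_expanded(), counter.calls)
```

The generator is resumed once per element plus one final time to signal exhaustion, so the per-item cost grows with the input, on top of the fixed cost of creating the generator and calling list.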
