Python: generator expressions vs. yield

Posted on 2024-08-16 15:17:53

In Python, is there any difference between creating a generator object through a generator expression versus using the yield statement?

Using yield:

def Generator(x, y):
    for i in xrange(x):
        for j in xrange(y):
            yield(i, j)

Using generator expression:

def Generator(x, y):
    return ((i, j) for i in xrange(x) for j in xrange(y))

Both functions return generator objects, which produce tuples, e.g. (0,0), (0,1) etc.

Any advantages of one or the other? Thoughts?
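Here is a possible Python 3 port of the two functions. In Python 3, xrange became range, and the names gen_yield and gen_expr are hypothetical renames. The port makes one structural difference visible through the standard inspect module. Only the yield version is a generator function; the other is an ordinary function that happens to return a generator object.

```python
import inspect

def gen_yield(x, y):
    # Generator function: calling it creates a generator object.
    for i in range(x):
        for j in range(y):
            yield (i, j)

def gen_expr(x, y):
    # Ordinary function that returns a generator expression.
    return ((i, j) for i in range(x) for j in range(y))

# Both calls produce generator objects that yield the same tuples.
print(list(gen_yield(2, 3)))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(list(gen_expr(2, 3)))   # same list

# Only the first is classified as a generator function.
print(inspect.isgeneratorfunction(gen_yield))  # True
print(inspect.isgeneratorfunction(gen_expr))   # False
print(inspect.isgenerator(gen_expr(2, 3)))     # True
```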

Comments (8)

做个ˇ局外人 2024-08-23 15:17:53

There are only slight differences between the two. You can use the dis module to examine this sort of thing for yourself.

Edit: My first version decompiled the generator expression created at module-scope in the interactive prompt. That's slightly different from the OP's version with it used inside a function. I've modified this to match the actual case in the question.

As you can see below, the "yield" generator (first case) has three extra instructions in the setup, because the generator expression receives the already-created outer iterator as its hidden argument .0. From the first FOR_ITER onward they differ in only one respect: inside the loop, the "yield" approach uses a LOAD_FAST where the generator expression uses a LOAD_DEREF. This happens because y is a free variable of the generator expression's nested code object and has to be read from a closure cell. LOAD_DEREF is "rather slower" than LOAD_FAST. For large enough values of x (the outer loop), this makes the "yield" version slightly faster than the generator expression, because y is loaded slightly faster on each pass. For smaller values of x it would be slightly slower because of the extra setup code.

It might also be worth pointing out that the generator expression would usually be used inline in the code, rather than wrapping it with the function like that. That would remove a bit of the setup overhead and keep the generator expression slightly faster for smaller loop values even if LOAD_FAST gave the "yield" version an advantage otherwise.

In neither case would the performance difference be enough to justify deciding between one or the other. Readability counts far more, so use whichever feels most readable for the situation at hand.
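To check on your own interpreter that the difference is negligible, a rough timeit sketch could look like the one below. The Python 3 bytecode differs from the Python 2 listing, so the LOAD_DEREF analysis may not carry over exactly, and timings depend on version and machine. gen_yield, gen_expr and bench are hypothetical names.

```python
import timeit
from collections import deque

def gen_yield(x, y):
    for i in range(x):
        for j in range(y):
            yield (i, j)

def gen_expr(x, y):
    return ((i, j) for i in range(x) for j in range(y))

def bench(func, x, y, number=50):
    """Seconds taken to fully consume func(x, y), `number` times."""
    # deque(..., maxlen=0) exhausts an iterator without storing its items.
    return timeit.timeit(lambda: deque(func(x, y), maxlen=0), number=number)

# A large outer loop versus a large inner loop, as discussed above.
for x, y in [(1000, 10), (10, 1000)]:
    t_yield = bench(gen_yield, x, y)
    t_expr = bench(gen_expr, x, y)
    print(f"x={x:>5} y={y:>5}  yield: {t_yield:.4f}s  genexpr: {t_expr:.4f}s")
```

Exhausting both generators through the same C-level consumer keeps the per-item overhead of the measuring code identical for the two variants.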

>>> import dis
>>> def Generator(x, y):
...     for i in xrange(x):
...         for j in xrange(y):
...             yield(i, j)
...
>>> dis.dis(Generator)
  2           0 SETUP_LOOP              54 (to 57)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_FAST                0 (x)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                40 (to 56)
             16 STORE_FAST               2 (i)

  3          19 SETUP_LOOP              31 (to 53)
             22 LOAD_GLOBAL              0 (xrange)
             25 LOAD_FAST                1 (y)
             28 CALL_FUNCTION            1
             31 GET_ITER
        >>   32 FOR_ITER                17 (to 52)
             35 STORE_FAST               3 (j)

  4          38 LOAD_FAST                2 (i)
             41 LOAD_FAST                3 (j)
             44 BUILD_TUPLE              2
             47 YIELD_VALUE
             48 POP_TOP
             49 JUMP_ABSOLUTE           32
        >>   52 POP_BLOCK
        >>   53 JUMP_ABSOLUTE           13
        >>   56 POP_BLOCK
        >>   57 LOAD_CONST               0 (None)
             60 RETURN_VALUE
>>> def Generator_expr(x, y):
...    return ((i, j) for i in xrange(x) for j in xrange(y))
...
>>> dis.dis(Generator_expr.func_code.co_consts[1])
  2           0 SETUP_LOOP              47 (to 50)
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                40 (to 49)
              9 STORE_FAST               1 (i)
             12 SETUP_LOOP              31 (to 46)
             15 LOAD_GLOBAL              0 (xrange)
             18 LOAD_DEREF               0 (y)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                17 (to 45)
             28 STORE_FAST               2 (j)
             31 LOAD_FAST                1 (i)
             34 LOAD_FAST                2 (j)
             37 BUILD_TUPLE              2
             40 YIELD_VALUE
             41 POP_TOP
             42 JUMP_ABSOLUTE           25
        >>   45 POP_BLOCK
        >>   46 JUMP_ABSOLUTE            6
        >>   49 POP_BLOCK
        >>   50 LOAD_CONST               0 (None)
             53 RETURN_VALUE
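On Python 3 the opcodes differ from the Python 2 listing above. The root cause of the LOAD_FAST/LOAD_DEREF difference can still be confirmed through code-object attributes. The sketch below assumes the generator expression's code object is the only code object in the wrapper's co_consts, which holds for this simple function; nested_code is a hypothetical helper.

```python
import dis
import types

def gen_yield(x, y):
    for i in range(x):
        for j in range(y):
            yield (i, j)

def gen_expr(x, y):
    return ((i, j) for i in range(x) for j in range(y))

def nested_code(func):
    """Return the first code object stored among func's constants
    (for gen_expr, that is the generator expression's body)."""
    for const in func.__code__.co_consts:
        if isinstance(const, types.CodeType):
            return const
    raise LookupError(f"{func.__name__} has no nested code object")

genexpr_code = nested_code(gen_expr)

# In the yield version, y is an ordinary (fast) local variable.
print(gen_yield.__code__.co_varnames)  # ('x', 'y', 'i', 'j')
# In the generator expression, y is a free variable read from a closure cell.
print(genexpr_code.co_freevars)        # ('y',)

# Disassemble the generator expression body; opcode names vary by version.
dis.dis(genexpr_code)
```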
看海 2024-08-23 15:17:53

In this example, not really. But yield can be used for more complex constructs. For example, a generator can also receive values from the caller (through its send() method) and change its flow as a result. Read PEP 342 for more details (it's an interesting technique worth knowing).

Anyway, the best advice is use whatever is clearer for your needs.

P.S. Here's a simple coroutine example from Dave Beazley:

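def grep(pattern):
    print("Looking for %s" % pattern)
    while True:
        line = (yield)  # wait for the next value passed in with send()
        if pattern in line:
            print(line)

# Example use
if __name__ == '__main__':
    g = grep("python")
    next(g)  # prime the coroutine: run it up to the first yield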
    g.send("Yeah, but no, but yeah, but no")
    g.send("A series of tubes")
    g.send("python generators rock!")
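The PEP 342 send() protocol can also carry results back to the caller. Below is a minimal running-average coroutine in Python 3 syntax; averager is a hypothetical name. Nothing equivalent can be written as a generator expression, because a generator expression has no way to use a sent value.

```python
def averager():
    """Coroutine sketch: receives numbers via send() and yields the running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the current mean, wait for the next value
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)            # prime: advance to the first yield (returns None)
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(60))  # 30.0
```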
苍景流年 2024-08-23 15:17:53

There is no difference for the kind of simple loops that you can fit into a generator expression. However, yield can be used to create generators that do much more complex processing. Here is a simple example that generates the Fibonacci sequence:

>>> import itertools
>>> def fibgen():
...     a = b = 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> list(itertools.takewhile(lambda x: x < 100, fibgen()))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
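Because fibgen() never terminates on its own, the consumer decides when to stop. As an alternative to a value bound, itertools.islice takes a fixed number of terms:

```python
from itertools import islice

def fibgen():
    a = b = 1
    while True:
        yield a
        a, b = b, a + b  # state carried from one yield to the next

# takewhile stops on a value bound; islice stops after a fixed count instead.
print(list(islice(fibgen(), 10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```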
A君 2024-08-23 15:17:53

When using them, note the distinction between a generator object and a generator function.

A generator object can be consumed only once, whereas a generator function can be reused: each call returns a fresh generator object.

In practice, generator expressions are usually used "raw", without being wrapped in a function, and each one evaluates to a single generator object.

E.g.:

def range_10_gen_func():
    x = 0
    while x < 10:
        yield x
        x = x + 1

print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))

which outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Compare with a slightly different usage:

range_10_gen = range_10_gen_func()
print(list(range_10_gen))
print(list(range_10_gen))
print(list(range_10_gen))

which outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]

And compare with a generator expression:

range_10_gen_expr = (x for x in range(10))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))

which also outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
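If a generator expression needs to be consumed more than once, one common approach is to wrap it in a function. Another is a class whose __iter__ builds a new generator each time. make_range_10 and Range10 are hypothetical names for this sketch:

```python
def make_range_10():
    """Each call builds a fresh generator expression."""
    return (x for x in range(10))

print(list(make_range_10()))
print(list(make_range_10()))  # full list again: a new generator object

class Range10:
    """Reusable iterable: __iter__ returns a new generator on every loop."""
    def __iter__(self):
        return (x for x in range(10))

r = Range10()
print(list(r))
print(list(r))  # also the full list again
```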
甜心 2024-08-23 15:17:53

Yes there is a difference.

For the generator expression (x for var in expr), iter(expr) is called as soon as the expression is created (this applies to the outermost iterable only).

When using def and yield to create a generator, as in:

def my_generator():
    for var in expr:
        yield x

g = my_generator()

iter(expr) is not yet called. It will be called only when iterating on g (and might not be called at all).

Taking this iterator as an example:

from __future__ import print_function


class CountDown(object):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        print("ITER")
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration()
        self.n -= 1
        return self.n

    next = __next__  # for python2

This code:

g1 = (i ** 2 for i in CountDown(3))  # immediately prints "ITER"
print("Go!")
for x in g1:
    print(x)

whereas with a generator function the call is deferred:

def my_generator():
    for i in CountDown(3):
        yield i ** 2


g2 = my_generator()
print("Go!")
for x in g2:  # "ITER" is only printed here
    print(x)

Since most iterables do little work in __iter__, this behavior is easy to miss. A real-world example is Django's QuerySet, which fetches its data in __iter__. As a result, data = (f(x) for x in qs) might take a long time, while def g(): for x in qs: yield f(x) followed by data = g() returns immediately.

For more info and the formal definition refer to PEP 289 -- Generator Expressions.
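PEP 289 specifies that only the outermost for clause is evaluated immediately; inner for clauses are deferred until the generator runs. The Python 3 sketch below makes both halves of that rule visible. It also shows the practical consequence for when a TypeError appears. traced and bad_gen are hypothetical helpers.

```python
calls = []

def traced(name, n):
    """Hypothetical helper: log when an iterable expression is evaluated."""
    calls.append(name)
    return range(n)

g = ((i, j) for i in traced("outer", 2) for j in traced("inner", 2))
print(calls)   # ['outer']: only the outermost iterable is evaluated now
result = list(g)
print(calls)   # ['outer', 'inner', 'inner']: inner ones run lazily

# The same rule decides when errors surface.
try:
    bad = (x for x in 5)  # iter(5) fails immediately
except TypeError as exc:
    print("genexpr:", exc)

def bad_gen():
    for x in 5:
        yield x

lazy = bad_gen()  # no error yet
try:
    next(lazy)    # iter(5) fails only now
except TypeError as exc:
    print("generator function:", exc)
```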

猥琐帝 2024-08-23 15:17:53

Using yield is nice if the expression is more complicated than just nested loops. Among other things, you can yield a special first or last value. Consider:

def Generator(x):
  for i in xrange(x):
    yield(i)
  yield(None)
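Here is the same idea in Python 3 syntax, using range instead of xrange; with_sentinel and with_sentinel_chain are hypothetical names. The same sequence can also be produced without yield, by chaining a generator expression with a one-element list. The yield version arguably reads more directly.

```python
from itertools import chain

def with_sentinel(x):
    """yield version: numbers 0..x-1, then a trailing None sentinel."""
    for i in range(x):
        yield i
    yield None

def with_sentinel_chain(x):
    # Without yield: assemble the same sequence from iterator building blocks.
    return chain((i for i in range(x)), [None])

print(list(with_sentinel(3)))        # [0, 1, 2, None]
print(list(with_sentinel_chain(3)))  # [0, 1, 2, None]
```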
迷途知返 2024-08-23 15:17:53

When thinking about iterators, the itertools module:

... standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.

For performance, consider itertools.product(*iterables[, repeat])

Cartesian product of input iterables.

Equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).

>>> import itertools
>>> def gen(x,y):
...     return itertools.product(xrange(x),xrange(y))
... 
>>> [t for t in gen(3,2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
>>> 
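The following Python 3 snippet checks the documented equivalence and also shows the repeat keyword. In current Python the signature is written product(*iterables, repeat=1).

```python
from itertools import product

x, y = 3, 2
nested = list((i, j) for i in range(x) for j in range(y))
via_product = list(product(range(x), range(y)))
print(via_product == nested)  # True

# repeat=n is shorthand for passing the same iterable n times.
print(list(product(range(2), repeat=2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

One design difference worth knowing: product() consumes its input iterables up front and keeps them in memory, whereas the nested generator expression re-evaluates range(y) for every value of i.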
得不到的就毁灭 2024-08-23 15:17:53

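There is a difference that could be important in some contexts and that hasn't been pointed out yet. Using yield anywhere in a function turns it into a generator function. That prevents you from using return to hand an arbitrary object back to the caller: return can only end the iteration by implicitly raising StopIteration (plus coroutine-related uses).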

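This means this code is ill-formed. In Python 3, feeding it to an interpreter gives you an AttributeError, because the mere presence of yield makes mary_poppins_purse(True) return a generator object rather than a Tea. Python 2 rejects a return with a value inside a generator outright, with a SyntaxError: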

class Tea:

    """With a cloud of milk, please"""

    def __init__(self, temperature):
        self.temperature = temperature

def mary_poppins_purse(tea_time=False):
    """I would like to make one thing clear: I never explain anything."""
    if tea_time:
        return Tea(355)
    else:
        for item in ['lamp', 'mirror', 'coat rack', 'tape measure', 'ficus']:
            yield item

print(mary_poppins_purse(True).temperature)

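On the other hand, this code works like a charm. The function contains no yield, so it is an ordinary function that returns either a Tea or a generator object: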

class Tea:

    """With a cloud of milk, please"""

    def __init__(self, temperature):
        self.temperature = temperature

def mary_poppins_purse(tea_time=False):
    """I would like to make one thing clear: I never explain anything."""
    if tea_time:
        return Tea(355)
    else:
        return (item for item in ['lamp', 'mirror', 'coat rack',
                                  'tape measure', 'ficus'])

print(mary_poppins_purse(True).temperature)
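In Python 3.3 and later, a return value inside a generator is not lost, but it is not handed to the caller directly either. It rides on the StopIteration exception, and it is what a yield from expression evaluates to. A minimal sketch (purse and delegate are hypothetical):

```python
def purse():
    """Generator function whose return value travels inside StopIteration."""
    yield "lamp"
    yield "mirror"
    return "empty now"

g = purse()
print(next(g))  # lamp
print(next(g))  # mirror
try:
    next(g)
except StopIteration as stop:
    print(stop.value)  # empty now

def delegate():
    # yield from re-yields the inner items and evaluates to the inner return value.
    result = yield from purse()
    yield f"purse said: {result}"

print(list(delegate()))  # ['lamp', 'mirror', 'purse said: empty now']
```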