Generator expressions vs. list comprehensions

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

残龙傲雪 2024-07-12 00:37:56

Some notes for built-in Python functions:

Use a generator expression if you need to exploit the short-circuiting behaviour of any or all. These functions are designed to stop iterating when the answer is known, but a list comprehension must evaluate every element before the function can be called.

For example, if we have

from time import sleep
def long_calculation(value):
    sleep(1) # for simulation purposes
    return value == 1

then any([long_calculation(x) for x in range(10)]) takes about ten seconds, as long_calculation will be called for every x. any(long_calculation(x) for x in range(10)) takes only about two seconds, since long_calculation will only be called with 0 and 1 inputs.
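To see the difference on your own machine, here is a minimal sketch; it repeats the definition above so it runs standalone, and exact numbers will vary:

from time import sleep, perf_counter

def long_calculation(value):
    sleep(1)  # simulate an expensive check (same definition as above)
    return value == 1

start = perf_counter()
any([long_calculation(x) for x in range(10)])       # the whole list is built first: ~10 s
print(f"list comprehension:   {perf_counter() - start:.1f} s")

start = perf_counter()
any(long_calculation(x) for x in range(10))         # stops as soon as x == 1 returns True: ~2 s
print(f"generator expression: {perf_counter() - start:.1f} s")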

When any and all iterate over the list comprehension, they will still stop checking elements for truthiness once an answer is known (as soon as any finds a true result, or all finds a false one); however, this is usually trivial compared to the actual work done by the comprehension.

Generator expressions are of course more memory efficient, when it's possible to use them. List comprehensions will be slightly faster with the non-short-circuiting min, max and sum (timings for max shown here):

$ python -m timeit "max(_ for _ in range(1))"
500000 loops, best of 5: 476 nsec per loop
$ python -m timeit "max([_ for _ in range(1)])"
500000 loops, best of 5: 425 nsec per loop
$ python -m timeit "max(_ for _ in range(100))"
50000 loops, best of 5: 4.42 usec per loop
$ python -m timeit "max([_ for _ in range(100)])"
100000 loops, best of 5: 3.79 usec per loop
$ python -m timeit "max(_ for _ in range(10000))"
500 loops, best of 5: 468 usec per loop
$ python -m timeit "max([_ for _ in range(10000)])"
500 loops, best of 5: 442 usec per loop
谁把谁当真 2024-07-12 00:37:56

Sometimes you can get away with the tee function from itertools; it returns multiple iterators over the same generator that can be used independently.
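
A minimal sketch of how tee behaves (the values in the comments follow from these particular inputs):

from itertools import tee

squares = (x * x for x in range(5))
a, b = tee(squares)      # two independent iterators backed by the same generator

print(sum(a))            # 30
print(max(b))            # 16 -- b still sees every value even though a is exhausted

One caveat: tee has to buffer every value that one copy has produced but another has not yet consumed, so if the copies are read far apart it can end up holding as much data as a plain list would.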

我做我的改变 2024-07-12 00:37:56

I'm using the Hadoop Mincemeat module. I think this is a great example to take a note of:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k, v):
    r1 = sum(v)                      # sum of the values
    r2 = len(v)                      # count of the values
    print(r2)
    m = r1 / r2                      # mean
    std = 0
    for i in range(r2):
        std += pow(abs(v[i] - m), 2)
    res = pow(std / r2, 0.5)         # population standard deviation
    return r1, r2, res

Here the generator gets numbers out of a text file (as big as 15 GB) and applies simple math to those numbers using Hadoop's map-reduce. If I had not used yield but a list comprehension instead, calculating the sums and the average would have taken much longer (not to mention the space complexity).

Hadoop is a great example of putting all the advantages of generators to use.
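
Outside of Mincemeat, the same contrast can be sketched with plain file handling; the path and helper names here are hypothetical, purely for illustration:

def read_numbers_lazy(path):
    """Yield one number at a time: memory stays constant, even for a 15 GB file."""
    with open(path) as f:
        for line in f:
            yield float(line)

def read_numbers_eager(path):
    """Build the whole list up front: memory grows with the file size."""
    with open(path) as f:
        return [float(line) for line in f]

# total = sum(read_numbers_lazy("numbers.txt"))   # streams through the file once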

留一抹残留的笑 2024-07-12 00:37:56

List comprehensions are eager, but generators are lazy.

In a list comprehension all objects are created right away, so it takes longer to create and return the list. In a generator expression, creating the values is delayed until they are requested with next(); each next() call creates one value and returns it immediately.

Iteration over a list comprehension is faster because the objects have already been created.

If you iterate over all the elements, the time performance of a list comprehension and a generator expression is about the same. Even though the generator expression returns a generator object right away, it does not create all the elements up front; every time you advance to a new element, it creates and returns it.

But if you do not iterate through all the elements, a generator is more efficient. Say you need a collection of millions of items but only use 10 of them: with a list comprehension you still have to create all the millions of items, wasting millions of calculations to use only 10. The same applies if you would be making millions of API requests but only end up using 10 of them. Since generator expressions are lazy, they do not perform the calculations or API calls unless requested, so a generator expression is more efficient in that case.

With a list comprehension the entire collection is loaded into memory. With a generator expression, once a value has been returned from a next() call, the generator is done with it and does not need to keep it in memory; only a single item is in memory at a time. If you are iterating over a huge file on disk that is too big to fit in memory, a generator expression is the more efficient choice.
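
A small sketch of the memory difference (exact sizes vary by platform, but the generator object stays tiny no matter how many values it could produce):

import sys

squares_list = [x * x for x in range(1_000_000)]   # all one million values exist now
squares_gen  = (x * x for x in range(1_000_000))   # nothing has been computed yet

print(sys.getsizeof(squares_list))   # several megabytes
print(sys.getsizeof(squares_gen))    # a couple of hundred bytes

first_ten = [next(squares_gen) for _ in range(10)]  # only 10 values ever computed
print(first_ten)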

初心未许 2024-07-12 00:37:56

There is something that I think most of the answers have missed. A list comprehension builds the entire list in memory at once; in cases where the resulting object is extremely large, your script's process can be killed. A generator is preferable here because its values are not all stored in memory; it is, in effect, a stateful function that produces them on demand. Creation speed also differs: a list comprehension is slower to create than a generator expression.

In short: use a list comprehension when the size of the result is not excessively large; otherwise use a generator expression.
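
A rough way to compare the creation cost (a sketch; creating the generator expression performs no iteration at all, while the list comprehension does all of it immediately):

import timeit

# Creating the generator object is O(1); building the list is O(n).
print(timeit.timeit("(x for x in range(10_000))", number=1_000))
print(timeit.timeit("[x for x in range(10_000)]", number=1_000))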

独自唱情﹋歌 2024-07-12 00:37:56

For functional programming, we want to use as little indexing as possible. For this reason, if we want to keep using the elements after taking the first slice, islice() is a better choice, since the iterator state is preserved.

from itertools import islice

def slice_and_continue(sequence):
    seq_i = iter(sequence)        # create an iterator from the list

    seq_slice = islice(seq_i, 3)  # take the first 3 elements and print them
    for x in seq_slice:
        print(x, end=' ')

    for x in seq_i:               # square and print the rest of the numbers
        print(x**2, end=' ')

slice_and_continue([1, 2, 3, 4, 5])

output: 1 2 3 16 25

鹿港巷口少年归 2024-07-12 00:37:56

John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:

def gen():
    return (something for something in get_some_stuff())

print(gen()[:2])      # generators don't support indexing or slicing
print([5, 6] + gen()) # generators can't be added to lists

Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
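
If you do end up needing list behaviour, one option is simply to materialise the generator with list() once; a small sketch with a stand-in data source:

def gen():
    return (n * n for n in range(6))

squares = list(gen())          # materialise once, then treat it as a normal list
print(squares[:2])             # slicing works now: [0, 1]
print([5, 6] + squares)        # so does concatenation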

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.

飘过的浮云 2024-07-12 00:37:56

Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.
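
For example, a generator expression over an infinite source is fine as long as you only take what you need; a sketch using itertools.count:

from itertools import count, islice

evens = (n * 2 for n in count())        # an infinite generator expression
print(list(islice(evens, 5)))           # [0, 2, 4, 6, 8]
# [n * 2 for n in count()] would never finish: a list has to be built in full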

万劫不复 2024-07-12 00:37:56

Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.
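
The multiple-iteration point is easy to demonstrate; a small sketch:

gen = (x * 2 for x in range(3))
print(sum(gen))   # 6
print(sum(gen))   # 0 -- the generator is already exhausted

lst = [x * 2 for x in range(3)]
print(sum(lst))   # 6
print(sum(lst))   # 6 -- a list can be iterated as many times as you like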

See Generator expressions and list comprehensions for more info.

猫腻 2024-07-12 00:37:56

The important point is that the list comprehension creates a new list. The generator creates an iterable object that will "filter" the source material on the fly as you consume the bits.

Imagine you have a 2TB log file called "hugefile.txt", and you want the content and length for all the lines that start with the word "ENTRY".

So you try starting out by writing a list comprehension:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

This slurps up the whole file, processes each line, and stores the matching lines in your array. This array could therefore contain up to 2TB of content. That's a lot of RAM, and probably not practical for your purposes.

So instead we can use a generator to apply a "filter" to our content. No data is actually read until we start iterating over the result.

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

Still nothing has been read, but we have now specified two generators that will act on our data as we wish.

Let's write out our filtered lines to another file:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

So instead of "pushing" data to your output function in the form of a fully populated list, you're giving the output function a way to "pull" data only when it's needed. In our case this is much more efficient, but not quite as flexible. Generators are one-way, one-pass; the data from the log file we've read gets immediately discarded, so we can't go back to a previous line. On the other hand, we don't have to worry about keeping data around once we're done with it.

蓝梦月影 2024-07-12 00:37:56

The benefit of a generator expression is that it uses less memory since it doesn't build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.

For example:

sum(x*2 for x in range(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

The advantage there is that the list isn't completely generated, and thus little memory is used (and should also be faster)
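
As a side note (modern spelling, not part of the original answer): the dict example is usually written as a dict comprehension nowadays, which builds the same dict; the data and function here are stand-ins:

some_list_of_keys = ["a", "b", "c"]   # stand-in data
some_func = str.upper                 # stand-in function

d = {k: some_func(k) for k in some_list_of_keys}
print(d)                              # {'a': 'A', 'b': 'B', 'c': 'C'}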

You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memory using generator expressions, since you want the generated list. You also get the benefit of being able to use any of the list functions, like sorted or reversed.

For example:

reversed( [x*2 for x in range(256)] )

×眷恋的温暖 2024-07-12 00:37:56

Python 3.7:

List comprehensions are faster.

Generators are more memory efficient.

As all the others have said, if you need to scale to infinite (or simply very large) data, you'll eventually need a generator. For relatively static, small and medium-sized jobs where speed matters, a list comprehension is best.

梦年海沫深 2024-07-12 00:37:56

When creating a generator from a mutable object (like a list), be aware that the generator is evaluated against the state of the object at the time the generator is consumed, not at the time it is created:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

If there is any chance of your list being modified (or of a mutable object inside that list changing), but you need the state as of the moment the generator was created, you need to use a list comprehension instead.
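
The list-comprehension counterpart captures the state immediately, which is exactly the contrast the example relies on; a small sketch:

mylist = ["a", "b", "c"]
snapshot = [elem + "1" for elem in mylist]   # evaluated immediately
mylist.clear()
for x in snapshot:
    print(x)                                 # a1, b1, c1 -- the old state was captured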
