Does * unpacking use memory when applied to a generator expression?

Published 2025-01-12 07:26:50


Example for context:

Does calling * to unpack input put everything into memory? I'm hoping not but just want to confirm my understanding.

input = (x for x in ((1, 'abc'), (2, 'def'))) # generator expression
unzipped = zip(*input) # Does *input get completely unpacked or stay memory efficient?
first_items = next(unzipped)
print(first_items)
# >> (1, 2)


Comments (1)

落日海湾 2025-01-19 07:26:50


Unpacking eagerly unpacks the top level of the iterable in question, so in your case, yes, it will run the generator expression to completion before zip is actually invoked, then perform the equivalent of zip((1, 'abc'), (2, 'def')). If the iterables inside the generator are themselves lazy iterators, though, zip won't preread them at all, which is usually the more important savings. For example, if input is defined with:

input = (open(name) for name in ('file1', 'file2'))

then while:

unzipped = zip(*input)

does eagerly open both files (so you may as well have used a listcomp; the genexpr didn't really save anything), it doesn't read a single line from either of them. When you then do:

first_items = next(unzipped)

it will read exactly one line from each, but it doesn't read the rest of the file until you ask for more items (technically, under the hood, file objects do block reads, so it will read more than just the line it returns, but that's an implementation detail; it won't slurp the whole of a 10 GB file just to give you the first line).
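Here is a minimal sketch of both halves of that behavior; make_iter and its trace prints are hypothetical, added only to show when each step actually runs:

def make_iter(i, seq):
    print(f"creating iterator {i}")  # runs while the genexpr is drained
    return iter(seq)

inputs = (make_iter(i, seq) for i, seq in enumerate([(1, 'abc'), (2, 'def')]))

unzipped = zip(*inputs)   # prints "creating iterator 0" and "creating iterator 1"
                          # right here: the top level is realized eagerly
print("zip constructed; no elements read yet")
print(next(unzipped))     # (1, 2) -- only now is one element pulled from each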

This is the nature of * unpacking; the receiving function needs to populate its arguments at the moment it is called. If you define:

def foo(a, b):
    print(b)
    print(a)

it would be very strange if a caller could do foo(*iterator), the iterator raised an exception while producing the value for a, but you only saw the exception when print(b) ran (at which point foo would have had to advance the iterator twice to lazily populate b). No one would have the foggiest idea what went wrong. And literally every function would have to deal with the fact that simply loading its arguments (without doing anything with them) might raise an exception. Not pretty.
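A minimal sketch of the eager behavior Python actually has (boom and foo are toy names): the exception escapes at the call site, before foo's body ever runs:

def boom():
    yield 1
    raise ValueError("failed while producing the second value")

def foo(a, b):
    print(b)
    print(a)

try:
    foo(*boom())                        # unpacking drains the generator now,
except ValueError as exc:               # so the error surfaces at the call site,
    print("raised at the call:", exc)   # never from inside foo's body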

When it's reasonable to handle lazy iterators (it isn't for zip; the very first output would need to read from all the arguments anyway, so at best you'd delay the realization of the arguments from the moment of construction to the first time you extract a value from it, saving nothing unless you build a zip object and discard it unused), just accept the iterator directly. Or do both; itertools' chain allows both an eager:

for item in chain(iter1, iter2):

and a lazy:

for item in chain.from_iterable(iter_of_iters):

call technique, precisely because it didn't want to force people with an iter_of_iters to realize all of the iterators in memory before it chained a single value from the first one (which is what for item in chain(*iter_of_iters): would require).
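A minimal sketch contrasting the two calling styles; make and its trace print are hypothetical, added only to show when each inner iterator is built:

from itertools import chain

def make(i):
    print(f"building iterator {i}")
    return iter((i * 10, i * 10 + 1))

# Eager: * drains the outer generator up front, building all three.
eager = chain(*(make(i) for i in range(3)))   # prints building 0, 1, 2
print(next(eager))                            # 0

# Lazy: from_iterable builds each inner iterator only when it is reached.
lazy = chain.from_iterable(make(i) for i in range(3))
print(next(lazy))                             # prints "building iterator 0", then 0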
