izip_longest in itertools: what's going on here?
I'm struggling to understand how the code below works. It's from http://docs.python.org/library/itertools.html#itertools.izip_longest and is the pure-Python equivalent of the izip_longest iterator. I'm especially mystified by the sentinel function. How does it work?
def izip_longest(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        yield counter()         # yields the fillvalue, or raises IndexError
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass
Comments (3)
Ok, we can do this. About the sentinel: the expression ([fillvalue]*(len(args)-1)) creates a list that contains one fill value for each iterable in args, minus one. So, for the example above, that is ['-']. counter is then assigned the pop method of that list. sentinel itself is a generator that pops one item from that list when it is iterated. You can iterate over each iterator returned by sentinel exactly once, and it will always yield fillvalue. The total number of items yielded by all iterators returned by sentinel is len(args) - 1 (thanks to Sven Marnach for clarifying that, I misunderstood it).

Now check out this line: iters = [chain(it, sentinel(), fillers) for it in args]. That is the trick. iters is a list that contains an iterator for each iterable in args. Each of these iterators does the following:

1. Iterate over all items in the corresponding iterable in args.
2. Iterate over sentinel once, yielding fillvalue.
3. Repeat fillvalue for all eternity.

Now, as established earlier, the sentinels together can yield only len(args)-1 items before the next one raises an IndexError. This is fine, because one of the iterables is the longest. So, when we come to the point that the IndexError is raised, that means we have finished iterating over the longest iterable in args.

You are welcome.

P.S.: I hope this is understandable.
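The two ideas in this answer (the shared budget of fill values and the three stages of each chained iterator) can be tried in isolation. The snippet below is only a sketch, written for Python 3; the names first, second, stages and preview and the '<sentinel>' marker are made up for illustration and do not appear in the documentation code.

```python
from itertools import chain, islice, repeat

fillvalue = '-'
args = ('ABCD', 'xy')  # two iterables, so the shared budget is len(args) - 1 == 1

# The default argument is evaluated once, when this def statement runs, so
# every generator created by sentinel() shares the same bound pop method
# and therefore the same underlying list.
def sentinel(counter=([fillvalue] * (len(args) - 1)).pop):
    yield counter()  # yields the fillvalue, or raises IndexError

first = list(sentinel())  # uses up the only fill value in the shared list
try:
    list(sentinel())      # the list is empty now, so pop() fails
    second = 'no error'
except IndexError:
    second = 'IndexError'

# One chained iterator: real items, then one sentinel item, then endless fillers.
# '<sentinel>' is a hypothetical marker used here to make the middle stage visible.
stages = chain('xy', ['<sentinel>'], repeat(fillvalue))
preview = list(islice(stages, 5))

print(first, second, preview)
```

With two input iterables the budget is a single fill value, which is why the second sentinel already fails.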
The function sentinel() returns iterators that yield fillvalue exactly once. The total number of fillvalues yielded by all iterators returned by sentinel() is limited to n-1, where n is the number of iterators passed to izip_longest(). After this number of fillvalues has been exhausted, further iteration over an iterator returned by sentinel() will raise an IndexError.

This function is used to detect whether all iterators have been exhausted: each iterator is chain()ed with an iterator returned by sentinel(). If all iterators are exhausted, an iterator returned by sentinel() will get iterated over for the nth time, resulting in an IndexError, which in turn triggers the end of izip_longest().

So far I explained what sentinel() does, not how it works. When izip_longest() is called, the definition of sentinel() is evaluated. While evaluating the definition, the default argument to sentinel() is also evaluated, once per call to izip_longest(). The code is equivalent to keeping the list [fillvalue] * (len(args)-1) in a variable in the enclosing scope and calling its pop() method inside sentinel(). Storing this in a default argument instead of in a variable in an enclosing scope is just an optimisation, as is the inclusion of .pop in the default argument, since it saves looking it up every time an iterator returned by sentinel() is iterated over.
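As a rough illustration of the enclosing-scope formulation described in this answer, here is a sketch ported to Python 3. It assumes zip in place of izip and a keyword-only fillvalue parameter; the names zip_longest_sketch and fillvalue_list are invented for this example and are not part of itertools.

```python
from itertools import chain, repeat, zip_longest

def zip_longest_sketch(*args, fillvalue=None):
    # One fill value per iterable, minus one: the shared sentinel budget.
    fillvalue_list = [fillvalue] * (len(args) - 1)

    def sentinel():
        # The list is looked up in the enclosing scope on each iteration.
        yield fillvalue_list.pop()  # raises IndexError once the budget is spent

    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in zip(*iters):
            yield tup
    except IndexError:
        # The last remaining (longest) iterable has just run out.
        pass

result = list(zip_longest_sketch('ABCD', 'xy', fillvalue='-'))
print(result)
# Compare against the real itertools.zip_longest on the same input.
print(result == list(zip_longest('ABCD', 'xy', fillvalue='-')))
```

The behaviour should match the default-argument version; the only difference is where the list lives and when the pop attribute is looked up.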
The definition of sentinel is almost equivalent to a generator that builds the list [fillvalue] * (len(args) - 1) in its body and yields the result of popping from it, except that it gets the pop bound method (a function object) as a default argument. A default argument is evaluated at the time of function definition, so once per call to izip_longest instead of once per call to sentinel. Therefore, the function object "remembers" the list [fillvalue] * (len(args) - 1) instead of constructing it anew in every call.
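The difference between rebuilding the list in every call and remembering it through a default argument can be seen side by side. This is a minimal Python 3 sketch; make_generators, rebuilt and remembered are hypothetical names chosen for the comparison, and n stands in for len(args) - 1.

```python
def make_generators(n, fillvalue='-'):
    # Variant A: the list is rebuilt on every call, so it never runs out.
    def rebuilt():
        yield ([fillvalue] * n).pop()

    # Variant B: the default is evaluated once, when this def statement runs,
    # so every call shares one list through the bound pop method.
    def remembered(counter=([fillvalue] * n).pop):
        yield counter()

    return rebuilt, remembered

rebuilt, remembered = make_generators(1)

# Each call to rebuilt() gets a fresh one-element list.
rebuilt_results = [list(rebuilt()), list(rebuilt())]

# The first call to remembered() empties the shared list; the second fails.
remembered_results = [list(remembered())]
try:
    list(remembered())
except IndexError:
    remembered_results.append('IndexError')

# A new call to the outer function re-runs the def and builds a new list.
_, fresh = make_generators(1)
fresh_result = list(fresh())

print(rebuilt_results, remembered_results, fresh_result)
```

The last step mirrors the "once per call to izip_longest" point: the budget is reset only when the enclosing function is called again.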