How to write a pager for Python iterators?
I'm looking for a way to "page through" a Python iterator. That is, I would like to wrap a given iterator iter, together with a page_size, in another iterator that would return the items from iter as a series of "pages". Each page would itself be an iterator with up to page_size iterations.
I looked through itertools and the closest thing I saw is itertools.islice. In some ways, what I'd like is the opposite of itertools.chain -- instead of chaining a series of iterators together into one iterator, I'd like to break an iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but couldn't locate one.
I came up with the following pager class and demonstration.
import itertools


class pager(object):
    """
    Takes the iterable iter and page_size to create an iterator that "pages through" iter.
    That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self, iter, page_size):
        self.iter = iter
        self.page_size = page_size

    def __iter__(self):
        return self

    def __next__(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from
        # https://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration:  # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            for i in itertools.islice(self.iter, self.page_size):
                pass
            return itertools.islice(iter_for_return, self.page_size)


iterator_size = 10
page_size = 3
my_pager = pager(range(iterator_size), page_size)

# skip a page, then print out rest, and then show the first page
page1 = next(my_pager)
for page in my_pager:
    for i in page:
        print(i)
    print("----")
print("skipped first page: ", list(page1))
I'm looking for some feedback and have the following questions:
- Is there already a pager in itertools that I'm overlooking?
- Cloning self.iter 3 times seems kludgy to me. One clone is to check whether self.iter has any more items. I decided to go with a technique Alex Martelli suggested (aware that he wrote of a wrapping technique). The second clone was to enable the returned page to be independent of the internal iterator (self.iter). Is there a way to avoid making 3 clones?
- Is there a better way to deal with the StopIteration exception besides catching it and then raising it again? I am tempted to not catch it at all and let it bubble up.
Thanks!
-Raymond
Look at grouper(), from the itertools recipes.
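For reference, a sketch along the lines of the grouper() recipe from the itertools documentation, written for Python 3. The exact signature and argument order have varied between versions of the documentation, so treat this as an illustration rather than the canonical text:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one with fillvalue."""
    # The list holds n references to the SAME iterator, so zip_longest pulls
    # n consecutive items into each output tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


pages = list(grouper(range(10), 3))
print(pages)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```

Note that the last tuple is padded with the fill value, which a pager would usually want to strip.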
Why aren't you using this?
"Each page would itself be an iterator with up to page_size" items. Each page is a simple list of items, which is iterable. You could use yield iter(page) to yield the iterator instead of the object, but I don't see how that improves anything.
It throws a standard StopIteration at the end.
What more would you want?
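A minimal sketch of the kind of generator this answer describes, assuming each page is built as a plain list. The function name `pages` and the choice to skip an empty final page are assumptions made for illustration, not taken from the answer:

```python
def pages(iterable, page_size):
    """Yield successive lists of up to page_size items from iterable."""
    page = []
    for item in iterable:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # emit the final, shorter page only if it holds any items
        yield page


for page in pages(range(10), 3):
    print(page)
```

Because it is a generator, the standard StopIteration at the end comes for free; replacing `yield page` with `yield iter(page)` would hand out iterators instead of lists.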
I'd do it like this:
That way, None can be a legitimate value that the iterator spits out. Only the single object fillvalue is filtered out, and it cannot possibly be an element of the iterable.
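The sentinel idea described here could look roughly like the following in Python 3. This is a sketch under the assumption that the padding comes from itertools.zip_longest, as in the grouper() recipe; the function body is illustrative, not the answer's original code:

```python
from itertools import zip_longest


def pager(iterable, page_size):
    """Yield one iterator per page; None values in the input are preserved."""
    fillvalue = object()  # fresh object, so no input item can be this same object
    args = [iter(iterable)] * page_size
    for group in zip_longest(*args, fillvalue=fillvalue):
        # Drop only the padding that zip_longest added to the last group.
        yield (item for item in group if item is not fillvalue)


result = [list(page) for page in pager([1, None, 2, None, 3], 2)]
print(result)  # [[1, None], [2, None], [3]]
```

The identity check (`is not`) is what makes this safe: even an element that compares equal to everything would not be mistaken for the sentinel.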
Based on the pointer to the itertools recipe for grouper(), I came up with the following adaptation of grouper() to mimic Pager. I wanted to filter out any None results and wanted to return an iterator rather than a tuple (though I suspect that there might be little advantage in doing this conversion).
I'd welcome feedback on what I can do to improve this code.
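A sketch of what such an adaptation might look like in Python 3, assuming the padding is None and each page is returned as a lazy generator. The name `grouper_pages` is hypothetical:

```python
from itertools import zip_longest


def grouper_pages(iterable, page_size):
    """Yield one iterator per page, built on the grouper() idea."""
    args = [iter(iterable)] * page_size
    for group in zip_longest(*args, fillvalue=None):
        # Filtering on None removes the padding, but also any real None items.
        yield (item for item in group if item is not None)


print([list(page) for page in grouper_pages(range(10), 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The trade-off is visible in the comment: an input that legitimately contains None loses those items, so this only suits data where None never occurs.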
more_itertools.chunked will do exactly what you're looking for:
If you want the chunking without creating temporary lists, you can use more_itertools.ichunked. That library also has lots of other nice options for efficiently grouping, windowing, slicing, etc.
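A short usage sketch of more_itertools.chunked. The library is third-party (installed with `pip install more-itertools`), so the example falls back to a small stand-in with the same basic behaviour when it is not available; that stand-in is an assumption added here for illustration and is not part of the library:

```python
from itertools import islice

try:
    from more_itertools import chunked  # third-party package
except ImportError:
    def chunked(iterable, n):
        """Stand-in: yield lists of up to n items, like more_itertools.chunked."""
        it = iter(iterable)
        while True:
            chunk = list(islice(it, n))
            if not chunk:
                return
            yield chunk


pages = list(chunked(range(10), 3))
print(pages)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Each page is a list and the last one is simply shorter, with no fill value to strip.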