How to write a pager for Python iterators?

Posted on 2024-08-22 19:55:06

I'm looking for a way to "page through" a Python iterator. That is, I would like to wrap a given iterator iter and page_size with another iterator that would return the items from iter as a series of "pages". Each page would itself be an iterator with up to page_size iterations.

I looked through itertools and the closest thing I saw is itertools.islice. In some ways, what I'd like is the opposite of itertools.chain -- instead of chaining a series of iterators together into one iterator, I'd like to break an iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but couldn't locate one.

I came up with the following pager class and demonstration.

import itertools

class pager(object):
    """
    Takes the iterable iter and page_size to create an iterator that "pages through" iter.
    That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self, iter, page_size):
        self.iter = iter
        self.page_size = page_size
    def __iter__(self):
        return self
    def __next__(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from
        # https://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration: # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            for i in itertools.islice(self.iter, self.page_size): pass
            return itertools.islice(iter_for_return, self.page_size)
    next = __next__  # keep the Python 2 method name working


iterator_size = 10
page_size = 3

my_pager = pager(range(iterator_size), page_size)

# skip a page, then print out rest, and then show the first page
page1 = next(my_pager)

for page in my_pager:
    for i in page:
        print(i)
    print("----")

print("skipped first page: ", list(page1))

I'm looking for some feedback and have the following questions:

1. Is there already something in itertools that serves as a pager and that I'm overlooking?
2. Cloning self.iter 3 times seems kludgy to me. One clone is to check whether self.iter has any more items. I decided to go with a technique Alex Martelli suggested (aware that he wrote of a wrapping technique). The second clone was to enable the returned page to be independent of the internal iterator (self.iter). Is there a way to avoid making 3 clones?
3. Is there a better way to deal with the StopIteration exception besides catching it and then raising it again? I am tempted to not catch it at all and let it bubble up.

Thanks!
-Raymond

别忘他 2024-08-29 19:55:06

Look at grouper(), from the itertools recipes.

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
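To see what the recipe does when the input length is not a multiple of n, here is a small usage sketch. The recipe is repeated so the snippet runs on its own; the sample inputs are made up for illustration.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to the *same* iterator, so zip_longest pulls
    # n consecutive items into each tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

pages = list(grouper(range(10), 3))
# The last page is padded with the fill value (None by default).
print(pages)
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

letters = ["".join(chunk) for chunk in grouper("ABCDEFG", 3, "x")]
print(letters)
# ['ABC', 'DEF', 'Gxx']
```

Note that each page is a full-length tuple, so a caller who wants "up to page_size items" still has to strip the padding.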

如此安好 2024-08-29 19:55:06

Why aren't you using this?

def grouper(page_size, iterable):
    page = []
    for item in iterable:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    # yield the final, shorter page, but never an empty one
    if page:
        yield page

"Each page would itself be an iterator with up to page_size" items. Each page is a simple list of items, which is iterable. You could use yield iter(page) to yield the iterator instead of the object, but I don't see how that improves anything.

It throws a standard StopIteration at the end.

What more would you want?
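As a quick check that plain lists cover the scenario from the question (skip a page, read the rest, then come back to the first page), here is a small sketch. It uses the same idea as above, with a guard so that an empty trailing page is never yielded; the name `paged` and the sample sizes are illustrative only.

```python
def paged(page_size, iterable):
    """Yield successive lists of at most page_size items."""
    page = []
    for item in iterable:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # leftover items that did not fill a whole page
        yield page

pager = paged(3, range(10))
page1 = next(pager)   # take the first page and set it aside
rest = list(pager)    # consume the remaining pages
print(rest)           # [[3, 4, 5], [6, 7, 8], [9]]
print("skipped first page:", page1)  # skipped first page: [0, 1, 2]
```

Because every page is its own list, no tee() clones are needed to keep an earlier page usable after the pager has moved on.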

倦话 2024-08-29 19:55:06

I'd do it like this:

from itertools import zip_longest  # izip_longest on Python 2

def pager(iterable, page_size):
    args = [iter(iterable)] * page_size
    fillvalue = object()  # unique sentinel used only as padding
    for group in zip_longest(*args, fillvalue=fillvalue):
        yield (elem for elem in group if elem is not fillvalue)

That way, None can be a legitimate value that the iterator spits out. Only the single object fillvalue is filtered out, and it cannot possibly be an element of the iterable.
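To make the point about None concrete, here is a short sketch that runs the sentinel approach over data containing None and other falsy values. The sample list is made up for illustration.

```python
from itertools import zip_longest

def pager(iterable, page_size):
    args = [iter(iterable)] * page_size
    fillvalue = object()  # unique object, so it can never equal real data
    for group in zip_longest(*args, fillvalue=fillvalue):
        # identity test: only the padding is removed
        yield (elem for elem in group if elem is not fillvalue)

data = [1, None, 0, "", None]
pages = [list(page) for page in pager(data, 2)]
print(pages)  # [[1, None], [0, ''], [None]]
```

Every original item survives, including None, 0 and the empty string; only the padding in the last page disappears.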

醉生梦死 2024-08-29 19:55:06

Based on the pointer to the itertools recipe for grouper(), I came up with the following adaptation of grouper() to mimic Pager. I wanted to filter out any None results and wanted to return an iterator rather than a tuple (though I suspect that there might be little advantage in doing this conversion).

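# based on http://docs.python.org/library/itertools.html#recipes
from itertools import zip_longest  # izip_longest on Python 2

def grouper2(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    for item in zip_longest(*args, fillvalue=fillvalue):
        # filter(None, item) would also drop falsy items such as 0 or '',
        # so compare against the fill value (None by default) instead
        yield (x for x in item if x is not fillvalue)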

I'd welcome feedback on what I can do to improve this code.
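One thing worth checking when stripping the padding: `filter(None, ...)` removes every falsy item, not only None, whereas an identity test against the padding object removes just the padding. The sketch below compares the two on range(5); both helper names are hypothetical and exist only for this comparison.

```python
from itertools import zip_longest

def pages_filter_none(n, iterable):
    """Strip padding with filter(None, ...); also drops 0, '', False..."""
    args = [iter(iterable)] * n
    for item in zip_longest(*args):
        yield list(filter(None, item))

def pages_identity(n, iterable):
    """Strip padding with an identity test against a private sentinel."""
    pad = object()
    args = [iter(iterable)] * n
    for item in zip_longest(*args, fillvalue=pad):
        yield [x for x in item if x is not pad]

falsy_dropped = list(pages_filter_none(3, range(5)))
kept = list(pages_identity(3, range(5)))
print(falsy_dropped)  # [[1, 2], [3, 4]]
print(kept)           # [[0, 1, 2], [3, 4]]
```

The first page loses its 0 under `filter(None, ...)`, which matters for any numeric data.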

等待我真够勒 2024-08-29 19:55:06

def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist
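For comparison, the same grouping can be sketched with itertools.islice, which the question mentions. This is only one possible variant, not the code above; the name `islice_pages` is illustrative.

```python
from itertools import islice

def islice_pages(iterable, size):
    """Yield lists of at most `size` items, built with islice."""
    it = iter(iterable)
    # Two-argument iter(): call the lambda repeatedly until it returns
    # the sentinel [], which happens once `it` is exhausted.
    return iter(lambda: list(islice(it, size)), [])

result = list(islice_pages([1, 2, 3, 4, 5], 2))
print(result)  # [[1, 2], [3, 4], [5]]
```

On Python 3.12 and later, itertools.batched offers similar grouping in the standard library, yielding tuples instead of lists.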

森罗 2024-08-29 19:55:06

more_itertools.chunked will do exactly what you're looking for:

>>> from more_itertools import chunked
>>> list(chunked([1, 2, 3, 4, 5, 6], 3))
[[1, 2, 3], [4, 5, 6]]

If you want the chunking without creating temporary lists, you can use more_itertools.ichunked.

That library also has lots of other nice options for efficiently grouping, windowing, slicing, etc.
