Python 生成器将另一个可迭代分为 N 组

发布于 2024-09-28 13:05:45 字数 346 浏览 1 评论 0原文

我正在寻找一个函数,它接受可迭代的 i 和大小 n 并生成长度为 n 的元组,这些元组是来自 的顺序值>i:

x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]

给出

[(1,2,3),(4,5,6),(7,8,9),(0)]

标准库中是否存在这样的函数?

如果它作为标准库的一部分存在,我似乎无法找到它,并且我已经没有要搜索的术语了。我可以自己写,但我不愿意。

I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:

x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]

gives

[(1,2,3),(4,5,6),(7,8,9),(0)]

Does such a function exist in the standard library?

If it exists as part of the standard library, I can't seem to find it and I've run out of terms to search for. I could write my own, but I'd rather not.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

听,心雨的声音 2024-10-05 13:05:45

当您想要将迭代器分成 n 块进行分组时,不使用填充值填充最终组,请使用 iter(lambda: list(IT. islice(iterable, n)), []):

import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))

yields

[[1, 2, 3], [4, 5, 6], [7]]

这个答案


当您想要将迭代器分成 n并用填充值填充最终组时,请使用 石斑鱼食谱 zip_longest(*[iterator]*n)

例如,在Python2:

>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

在 Python3 中,izip_longest 现已重命名为 zip_longest

>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

当您想要将序列分组为 块时n你可以使用chunks配方

def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in xrange(0, len(seq), n):
        yield seq[i:i + n]

请注意,与一般的迭代器不同,根据定义的序列具有长度(即定义了__len__)。

When you want to group an iterator in chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):

import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))

yields

[[1, 2, 3], [4, 5, 6], [7]]

There is an explanation of how it works in the second half of this answer.


When you want to group an iterator in chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n):

For example, in Python2:

>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

In Python3, what was izip_longest is now renamed zip_longest:

>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

When you want to group a sequence in chunks of n you can use the chunks recipe:

def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in xrange(0, len(seq), n):
        yield seq[i:i + n]

Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).

小嗷兮 2024-10-05 13:05:45

请参阅 itertoolsgrouper 配方> package

def grouper(n, iterable, fillvalue=None):
  "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
  args = [iter(iterable)] * n
  return izip_longest(fillvalue=fillvalue, *args)

(但是,这是相当多的问题的重复。)

See the grouper recipe in the docs for the itertools package

def grouper(n, iterable, fillvalue=None):
  "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
  args = [iter(iterable)] * n
  return izip_longest(fillvalue=fillvalue, *args)

(However, this is a duplicate of quite a few questions.)

猫性小仙女 2024-10-05 13:05:45

这个怎么样?但它没有填充值。

>>> def partition(itr, n):
...     i = iter(itr)
...     res = None
...     while True:
...             res = list(itertools.islice(i, 0, n))
...             if res == []:
...                     break
...             yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>

它使用原始可迭代对象的副本,并为每个连续的拼接耗尽该副本。我疲惫的大脑能想到的唯一其他方法是生成具有范围的拼接端点。

也许我应该将 list() 更改为 tuple() ,以便它更好地对应您的输出。

How about this one? It doesn't have a fill value though.

>>> def partition(itr, n):
...     i = iter(itr)
...     res = None
...     while True:
...             res = list(itertools.islice(i, 0, n))
...             if res == []:
...                     break
...             yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>

It utilizes a copy of the original iterable, which it exhausts for each successive splice. The only other way my tired brain could come up with was generating splice end-points with range.

Maybe I should change list() to tuple() so it better corresponds to your output.

雪化雨蝶 2024-10-05 13:05:45

这是 Python 中非常常见的请求。它非常常见,以至于它被纳入 boltons 统一实用程序包中。首先,这里有大量文档。此外,模块的设计和测试仅依赖于标准库(Python 2 和 3 兼容),这意味着您可以 只需下载该文件直接进入您的项目

# if you downloaded/embedded, try:
# from iterutils import chunked

# with `pip install boltons` use:

from boltons.iterutils import chunked 

print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

还有用于不定/长序列的迭代器/生成器形式:

print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

如您所见,您也可以用您选择的值填充序列。最后,作为维护者,我可以向您保证,虽然代码已被数千名开发人员下载/测试,但如果您遇到任何问题,您将通过 boltons GitHub 问题页面。希望这个(和/或任何其他 150 多个 Bolton 食谱)有所帮助!

This is a very common request in Python. Common enough that it made it into the boltons unified utility package. First off, there are extensive docs here. Furthermore, the module is designed and tested to only rely on the standard library (Python 2 and 3 compatible), meaning you can just download the file directly into your project.

# if you downloaded/embedded, try:
# from iterutils import chunked

# with `pip install boltons` use:

from boltons.iterutils import chunked 

print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

There's an iterator/generator form for indefinite/long sequences as well:

print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

As you can see, you can also fill the sequence with a value of your choosing, as well. Finally, as the maintainer, I can assure you that, while the code has been downloaded/tested by thousands of developers, if you encounter any issues, you'll get the fastest support possible through the boltons GitHub Issues page. Hope this (and/or any of the other 150+ boltons recipes) helped!

漫雪独思 2024-10-05 13:05:45

我使用 more_itertools 包中的分块函数

$ pip install more_itertools
$ python
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.more.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]

I use the chunked function from the more_itertools package.

$ pip install more_itertools
$ python
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.more.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
蓝海似她心 2024-10-05 13:05:45

这是一个非常古老的问题,但我认为针对一般情况提及以下方法是有用的。它的主要优点是它只需要迭代数据一次,因此它可以与数据库游标或其他只能使用一次的序列一起使用。我还发现它更具可读性。

def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out

This is a very old quesiton, but I think it is useful to mention the following approach for the general case. Its main merit is that it only needs to iterate over the data once, so it will work with database cursors or other sequences that can only be used once. I also find it more readable.

def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out
晨曦÷微暖 2024-10-05 13:05:45

我知道这个问题已经被回答过好几次了,但我添加了我的解决方案,与石斑鱼配方相比,它应该在序列和迭代器的一般适用性、可读性(没有 StopIteration 异常导致的不可见循环退出条件)和性能方面得到改进。它与 Svein 的最后一个答案最相似。

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)

I know this has been answered several times but I'm adding my solution which should improve in both, general applicability to sequences and iterators, readability (no invisible loop exit condition by StopIteration exception) and performance when compared to the grouper recipe. It is most similar to the last answer by Svein.

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)
一花一树开 2024-10-05 13:05:45

这是一个不同的解决方案,它不使用 itertools,尽管它多了几行,但当块比可迭代长度短得多时,它显然优于给定的答案。
然而,对于大块,其他答案要快得多。

def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop

In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop

In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop

In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop

In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop

In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop

Here is a different solution which makes no use of itertools and, even though it has a couple more lines, it apparently outperforms the given answers when chunks are a lot shorter than the iterable lenght.
However, for big chunks the other answers are much faster.

def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop

In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop

In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop

In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop

In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop

In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop
嗼ふ静 2024-10-05 13:05:45
    def grouper(iterable, n):
        while True:
            yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))
    def grouper(iterable, n):
        while True:
            yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文