如何将列表拆分为大小相等的块?

发布于 2024-07-22 05:26:05 字数 214 浏览 7 评论 0 原文

如何将任意长度的列表拆分为大小相等的块?


另请参阅:如何以块的形式迭代列表
要对字符串进行分块,请参阅每隔第 n 个字符拆分字符串?

How do I split a list of arbitrary length into equal sized chunks?


See also: How to iterate over a list in chunks.
To chunk strings, see Split string every nth character?.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(30

岛徒 2024-07-29 05:26:05

这是一个生成大小均匀的块的生成器:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

对于 Python 2,使用 xrange 而不是 range

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

下面是一个列表理解单行代码。 不过,上面的方法更可取,因为使用命名函数使代码更容易理解。 对于 Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

对于 Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

Here's a generator that yields evenly-sized chunks:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

For Python 2, using xrange instead of range:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]
嗫嚅 2024-07-29 05:26:05

超级简单:

def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))

对于 Python 2,使用 xrange() 而不是 range()

Something super simple:

def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))

For Python 2, use xrange() instead of range().

风向决定发型 2024-07-29 05:26:05

我知道这有点旧,但还没有人提到 numpy .array_split

import numpy as np

lst = range(50)
np.array_split(lst, 5)

结果:

[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

I know this is kind of old but nobody yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)

Result:

[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
如若梦似彩虹 2024-07-29 05:26:05

直接来自(旧)Python 文档(itertools 的食谱):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

当前版本,如 JFSebastian 所建议的:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

我猜 Guido 的时间机器可以工作——工作了——将工作——将工作——再次工作。

这些解决方案之所以有效,是因为 [iter(iterable)]*n (或早期版本中的等效项)创建一个迭代器,并在列表。 然后,izip_longest 有效地执行“每个”迭代器的循环; 因为这是同一个迭代器,所以每次这样的调用都会使其前进,从而导致每个这样的 zip-roundrobin 生成一个由 n 项组成的元组。

Python ≥3.12

itertools.batched 可用。

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.

Python ≥3.12

itertools.batched is available.

静谧幽蓝 2024-07-29 05:26:05

我很惊讶没有人想到使用 iter双参数形式

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

这适用于任何可迭代对象并延迟生成输出。 它返回元组而不是迭代器,但我认为它仍然具有一定的优雅性。 它也不垫; 如果您想要填充,上面的简单变体就足够了:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

演示:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

与基于izip_longest的解决方案一样,上面总是填充。 据我所知,对于可选填充的函数,没有一行或两行的 itertools 配方。 通过结合上述两种方法,这个方法非常接近:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

我相信这是提供可选填充的最短分块器。

正如 Tomasz Gandor 观察到的,如果两个填充分块器遇到一长串填充值,它们将意外停止。 这是以合理的方式解决该问题的最终变体:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

演示:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
我不会写诗 2024-07-29 05:26:05

不要重新发明轮子。

更新:在 Python 3.12+ 中找到了完整的解决方案 itertools.batched

给定

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

代码

itertools .batched++

list(it.batched(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

详细信息

在 Python 3.12 之前建议使用以下非本机方法:

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

(或DIY,如果你愿意的话)

标准库

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)
    

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)
    

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

参考

+ 实现 itertools 食谱 等等。 <代码>> pip install more_itertools

++包含在 Python 标准库 3.12+ 中。 batched 类似于 more_itertools.chunked

Don't reinvent the wheel.

UPDATE: A complete solution is found in Python 3.12+ itertools.batched.

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

itertools.batched++

list(it.batched(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

Details

The following non-native approaches were suggested prior to Python 3.12:

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

(or DIY, if you want)

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)
    

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)
    

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

References

+ A third-party library that implements itertools recipes and more. > pip install more_itertools

++Included in Python Standard Library 3.12+. batched is similar to more_itertools.chunked.

诗酒趁年少 2024-07-29 05:26:05

这是一个适用于任意迭代的生成器:

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

示例:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

Here is a generator that work on arbitrary iterables:

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
回忆躺在深渊里 2024-07-29 05:26:05

简单而优雅

L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]

,或者如果您愿意:

def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)

Simple yet elegant

L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]

or if you prefer:

def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)
后知后觉 2024-07-29 05:26:05

如何将列表分成大小均匀的块?

对我来说,“大小均匀的块”意味着它们的长度相同,或者除非该选项,否则长度的差异最小。 例如,21 件物品的 5 个篮子可能会产生以下结果:

>>> import statistics
>>> statistics.variance([5,5,5,5,1]) 
3.2
>>> statistics.variance([5,4,4,4,4]) 
0.19999999999999998

更喜欢后一种结果的实际原因:如果您使用这些功能来分配工作,您已经内置了其中一个可能比其他项目完成得更好的前景,因此它当其他人继续努力工作时,他会无所事事地坐着。

对这里其他答案的批评

当我最初写这个答案时,其他答案都不是大小均匀的块 - 它们都在最后留下了一个矮块,所以它们没有很好地平衡,并且长度的方差高于必要的长度。

例如,当前的最佳答案以:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

其他,例如 list(grouper(3, range(7)))chunk(range(7), 3)两者都返回:[(0, 1, 2), (3, 4, 5), (6, None, None)]None 只是填充,在我看来相当不优雅。 他们没有均匀地对可迭代对象进行分块。

为什么我们不能更好地划分这些?

循环解决方案

使用 itertools.cycle 的高级平衡解决方案,这就是我今天可能采用的方式。 设置如下:

from itertools import cycle
items = range(10, 75)
number_of_baskets = 10

现在我们需要在其中填充元素的列表:

baskets = [[] for _ in range(number_of_baskets)]

最后,我们将要分配的元素与篮子的循环一起压缩,直到用完元素,从语义上讲,这正是我们所要的想要:

for element, basket in zip(items, cycle(baskets)):
    basket.append(element)

结果如下:

>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
 [11, 21, 31, 41, 51, 61, 71],
 [12, 22, 32, 42, 52, 62, 72],
 [13, 23, 33, 43, 53, 63, 73],
 [14, 24, 34, 44, 54, 64, 74],
 [15, 25, 35, 45, 55, 65],
 [16, 26, 36, 46, 56, 66],
 [17, 27, 37, 47, 57, 67],
 [18, 28, 38, 48, 58, 68],
 [19, 29, 39, 49, 59, 69]]

为了生产该解决方案,我们编写一个函数,并提供类型注释:

from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets

在上面,我们获取项目列表和篮子的最大数量。 我们创建一个空列表列表,以循环方式在其中附加每个元素。

切片

另一个优雅的解决方案是使用切片 - 特别是不太常用的切片step参数。 即:

start = 0
stop = None
step = number_of_baskets

first_basket = items[start:stop:step]

这是特别优雅的,因为切片不关心数据有多长 - 结果,我们的第一个篮子,只有它需要的长度。 我们只需要增加每个篮子的起点。

事实上,这可能是单行代码,但为了可读性并避免代码行过长,我们将采用多行代码:

from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]

并且来自 itertools 模块的 islice 将提供一种惰性迭代方法,就像最初是在问题中提出的。

我不认为大多数用例会受益匪浅,因为原始数据已经完全具体化在列表中,但对于大型数据集,它可以节省近一半的内存使用量。

from itertools import islice
from typing import List, Any, Generator
    
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)

查看结果:

from pprint import pprint

items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])

更新了先前的解决方案

这是另一个平衡的解决方案,改编自我过去在生产中使用的函数,它使用模运算符:

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets) 

并且我创建了一个生成器,如果将其放入列表中,它会执行相同的操作:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
    

并且最后,因为我看到上述所有函数都以连续顺序返回元素(如给定的那样):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in range(length)]

输出

测试它们:

print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))

打印出:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

请注意,连续生成器提供与其他两个相同长度模式的块,但这些项目都是按顺序排列的,并且它们的划分就像划分离散元素列表一样均匀。

How do you split a list into evenly sized chunks?

"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:

>>> import statistics
>>> statistics.variance([5,5,5,5,1]) 
3.2
>>> statistics.variance([5,4,4,4,4]) 
0.19999999999999998

A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.

Critique of other answers here

When I originally wrote this answer, none of the other answers were evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and have a higher than necessary variance of lengths.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

Others, like list(grouper(3, range(7))), and chunk(range(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can't we divide these better?

Cycle Solution

A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:

from itertools import cycle
items = range(10, 75)
number_of_baskets = 10

Now we need our lists into which to populate the elements:

baskets = [[] for _ in range(number_of_baskets)]

Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, it exactly what we want:

for element, basket in zip(items, cycle(baskets)):
    basket.append(element)

Here's the result:

>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
 [11, 21, 31, 41, 51, 61, 71],
 [12, 22, 32, 42, 52, 62, 72],
 [13, 23, 33, 43, 53, 63, 73],
 [14, 24, 34, 44, 54, 64, 74],
 [15, 25, 35, 45, 55, 65],
 [16, 26, 36, 46, 56, 66],
 [17, 27, 37, 47, 57, 67],
 [18, 28, 38, 48, 58, 68],
 [19, 29, 39, 49, 59, 69]]

To productionize this solution, we write a function, and provide the type annotations:

from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets

In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.

Slices

Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:

start = 0
stop = None
step = number_of_baskets

first_basket = items[start:stop:step]

This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.

In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:

from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]

And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.

I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.

from itertools import islice
from typing import List, Any, Generator
    
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)

View results with:

from pprint import pprint

items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])

Updated prior solutions

Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets) 

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
    

And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in range(length)]

Output

To test them out:

print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provide chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.

〆一缕阳光ご 2024-07-29 05:26:05
def chunk(input, size):
    return map(None, *([iter(input)] * size))
def chunk(input, size):
    return map(None, *([iter(input)] * size))
绿萝 2024-07-29 05:26:05

如果您知道列表大小:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

如果您不知道(迭代器):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

在后一种情况下,如果您可以确定序列始终包含给定大小的整数块(即没有不完整的最后一块)。

If you know list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).

九歌凝 2024-07-29 05:26:05

我在重复中看到了最棒的Python答案这个问题的答案:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

你可以为任何n创建n元组。 如果a = range(1, 15),那么结果将是:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

如果列表被均匀划分,那么你可以将zip_longest替换为zip,否则三元组 (13, 14, None) 将丢失。 上面使用的是Python 3。 对于 Python 2,请使用 izip_longest

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuple for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.

初雪 2024-07-29 05:26:05

这是其中的一条:

[AA[i:i+SS] for i in range(len(AA))[::SS]]

细节。 AA 是数组,SS 是块大小。 例如:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

要扩展 py3 中的范围,请执行以下操作

(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]

Here's the one liner:

[AA[i:i+SS] for i in range(len(AA))[::SS]]

Details. AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

To expand the ranges in py3 do

(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
月下伊人醉 2024-07-29 05:26:05

使用Python 3.8中的赋值表达式,它变得非常好:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

这适用于任意可迭代,而不仅仅是一个列表。

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

更新

从Python 3.12开始,这个确切的实现可以作为itertools.batched

With Assignment Expressions in Python 3.8 it becomes quite nice:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

This works on an arbitrary iterable, not just a list.

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

UPDATE

Starting with Python 3.12, this exact implementation is available as itertools.batched

无悔心 2024-07-29 05:26:05

例如,如果您的块大小为 3,您可以执行以下

zip(*[iterable[i::3] for i in range(3)]) 

操作:
http://code.activestate.com/recipes/303060-group -a-list-into-sequential-n-tuples/

当我的块大小是我可以输入的固定数字(例如“3”)并且永远不会改变时,我会使用它。

If you had a chunk size of 3 for example, you could do:

zip(*[iterable[i::3] for i in range(3)]) 

source:
http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is fixed number I can type, e.g. '3', and would never change.

不乱于心 2024-07-29 05:26:05

toolz 库具有用于此目的的 partition 函数:

from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

The toolz library has the partition function for this:

from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
断爱 2024-07-29 05:26:05

我很好奇不同方法的性能,如下:

在 Python 3.5.1 上测试

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

结果:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
橘香 2024-07-29 05:26:05

您还可以使用 get_chunks 函数href="http://utilspie.readthedocs.io" rel="noreferrer">utilspie 库如下:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

您可以安装 utilspie 通过 pip:

sudo pip install utilspie

免责声明:我是 utilspie

You may also use get_chunks function of utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of utilspie library.

明明#如月 2024-07-29 05:26:05

我非常喜欢 tzot 和 JFSebastian 提出的 Python 文档版本,
但它有两个缺点:

  • 它不是很明确
  • 我通常不希望在最后一个块中填充值

我在代码中经常使用这个值:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

更新:惰性块版本:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))

I like the Python doc's version proposed by tzot and J.F.Sebastian a lot,
but it has two shortcomings:

  • it is not very explicit
  • I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

UPDATE: A lazy chunks version:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))
盗琴音 2024-07-29 05:26:05

代码:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

结果:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
梦里南柯 2024-07-29 05:26:05

呵呵,单行版

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
爱冒险 2024-07-29 05:26:05

另一个更明确的版本。

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList

Another more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
过去的过去 2024-07-29 05:26:05

在这一点上,我认为我们需要一个递归生成器,以防万一...

在 python 2 中:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

在 python 3 中:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

另外,在大规模外星人入侵的情况下,一个装饰递归生成器< /strong> 可能会变得很方便:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

At this point, I think we need a recursive generator, just in case...

In python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
水溶 2024-07-29 05:26:05

不调用 len() 这对于大型列表很有用:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

这是针对可迭代的:

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

上面的功能风格:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))

Without calling len() which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))
甜是你 2024-07-29 05:26:05
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

用法:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
温柔少女心 2024-07-29 05:26:05

请参阅此参考

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python3

See this reference

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python3

慕巷 2024-07-29 05:26:05
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
姐不稀罕 2024-07-29 05:26:05

既然这里每个人都在谈论迭代器。 boltons 有完美的方法,称为 iterutils.chunked_iter

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

输出:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

但是如果你不想浪费内存,你可以使用旧方法并首先使用 iterutils.chunked

Since everybody here talking about iterators. boltons has perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if you don't want to be mercy on memory, you can use old-way and store the full list in the first place with iterutils.chunked.

丿*梦醉红颜 2024-07-29 05:26:05

考虑使用 matplotlib.cbook 片段,

例如:

import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s

Consider using matplotlib.cbook pieces

for example:

import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s
樱花落人离去 2024-07-29 05:26:05
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文