Here's a generator that yields evenly sized chunks:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

For Python 2, use xrange instead of range.

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:
The current version, as suggested by J.F.Sebastian:
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
I guess Guido's time machine works—worked—will work—will have worked—was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. zip_longest (izip_longest in Python 2) then effectively performs a round-robin over "each" iterator; because they are all the same iterator, each call advances it, so each zip round-robin produces one tuple of n items.
I'm surprised nobody has thought of using iter's two-argument form, iter(callable, sentinel). This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:
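The two-argument form of iter described here can be sketched as follows (chunk_unpadded is a name chosen for this sketch, not from the original answer):

```python
from itertools import islice

def chunk_unpadded(it, size):
    """iter(callable, sentinel): call the lambda until it returns the empty tuple."""
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

print(list(chunk_unpadded(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```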
Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:
I believe this is the shortest chunker proposed that offers optional padding.
As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:
from itertools import islice

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval is _no_padding:  # identity check against the sentinel, so any padval works
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
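A quick check of both modes, restated here so the example runs on its own (the sentinel comparison uses `is`, so even None is a valid pad value):

```python
from itertools import islice

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval is _no_padding:  # identity check against the sentinel
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

print(list(chunk(range(7), 3)))        # [(0, 1, 2), (3, 4, 5), (6,)]
print(list(chunk(range(7), 3, None)))  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```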
"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:
A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.
Critique of other answers here
When I originally wrote this answer, none of the other answers were evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and have a higher than necessary variance of lengths.
Others, like list(grouper(3, range(7))) and chunk(range(7), 3), both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The Nones are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.
Why can't we divide these better?
Cycle Solution
A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:
Now we need our lists into which to populate the elements:
baskets = [[] for _ in range(number_of_baskets)]
Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, is exactly what we want:
for element, basket in zip(items, cycle(baskets)):
    basket.append(element)
To productionize this solution, we write a function, and provide the type annotations:
from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets
In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.
Slices
Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:
This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.
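A tiny illustration of the step argument (not from the original answer): each basket is simply every n-th element, starting from a different offset.

```python
items = list(range(10))

# every 3rd element, starting at offsets 0, 1, 2
print(items[0::3])  # [0, 3, 6, 9]
print(items[1::3])  # [1, 4, 7]
print(items[2::3])  # [2, 5, 8]
```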
In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:
from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]
And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.
I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.
from itertools import islice
from typing import List, Any, Generator

def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)
View results with:
from pprint import pprint
items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])
Updated prior solutions
Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:
def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets)
And I created a generator that does the same if you put it into a list:
def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing an iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]  # next(items); .next() is Python 2 only
Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one can divide a list of discrete elements.
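To illustrate the "same length patterns, items in order" claim (the function is restated here, with next(items) for Python 3, so the example is self-contained):

```python
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]

result = list(iter_baskets_contiguous(list(range(21)), 5))
print([len(b) for b in result])  # [5, 4, 4, 4, 4] -- lengths differ by at most one
print(result[0])                 # [0, 1, 2, 3, 4] -- contents stay contiguous
```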
If you know list size:
def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
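That "more beautiful" rephrasing presumably relies on the repeated-iterator zip idiom; a sketch, assuming the sequence length is an exact multiple of chunk_size (zip silently drops any incomplete tail):

```python
def exact_chunks(sequence, chunk_size):
    # zip stops at the shortest input, so an incomplete final chunk is dropped
    return zip(*[iter(sequence)] * chunk_size)

print(list(exact_chunks(range(6), 2)))  # [(0, 1), (2, 3), (4, 5)]
```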
I was curious about the performance of different approaches and here it is:
Tested on Python 3.5.1
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break
    tmp = arr[0:batch_size]
    arr = arr[batch_size:]  # advance past the consumed batch; [batch_size:-1] would silently drop the last element
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, -(-len(arr) // batch_size)):  # ceiling division gives the exact batch count
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            first = next(batchiter)
        except StopIteration:  # PEP 479: a bare next() here would raise RuntimeError on Python 3.7+
            return
        yield chain([first], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
    "grouper('abcdefg', 3, 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
tmp = x
print(time.time() - start)
Another more explicit version.
def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equal to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
At this point, I think we need a recursive generator, just in case...
In Python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In Python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)
Also, in case of massive Alien invasion, a decorated recursive generator might become handy:
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
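Restated with a quick check (the decorator filters out the empty list that the undecorated recursion would otherwise yield forever once the input is exhausted):

```python
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```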
def chunks(iterable, n):
    """assumes n is an integer > 0
    """
    iterable = iter(iterable)
    while True:
        result = []
        for i in range(n):
            try:
                a = next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1 = (i*i for i in range(10))
g2 = chunks(g1, 3)
print(g2)
# <generator object chunks at 0x0337B9B8>
print(list(g2))
# [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]
Something super simple:

For Python 2, use xrange() instead of range().
I know this is kind of old but nobody yet mentioned numpy.array_split:

Result:
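A sketch of what that looks like (assuming numpy is installed; note that array_split balances the chunk sizes instead of leaving one short tail):

```python
import numpy as np

parts = np.array_split(np.arange(7), 3)
print([list(p) for p in parts])  # [[0, 1, 2], [3, 4], [5, 6]]
```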
Directly from the (old) Python documentation (recipes for itertools); see the grouper recipe above.

Python ≥ 3.12: itertools.batched is available.
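A quick sketch; batched is in the standard library only from 3.12 on, so a rough backport guard is included here so the example also runs on older interpreters:

```python
import sys

if sys.version_info >= (3, 12):
    from itertools import batched
else:
    from itertools import islice

    def batched(iterable, n):
        # rough backport of itertools.batched for Python < 3.12
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk

print(list(batched('abcdefg', 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g',)]
```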
Don't reinvent the wheel.

UPDATE: A complete solution is found in Python 3.12+: itertools.batched.++

The following non-native approaches were suggested prior to Python 3.12:

more_itertools+ (or DIY, if you want)
The Standard Library

References:
more_itertools.chunked (related post)
more_itertools.sliced
more_itertools.grouper (related post)
more_itertools.windowed (see also stagger, zip_offset)
more_itertools.chunked_even
zip_longest (related post, related post)
setdefault (ordered results require Python 3.6+)
collections.defaultdict (ordered results require Python 3.6+)

+ A third-party library that implements itertools recipes and more: pip install more_itertools
++ Included in the Python Standard Library 3.12+. batched is similar to more_itertools.chunked.
Here is a generator that works on arbitrary iterables:
Example:
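The original code block did not survive extraction; one generator matching the description (works on any iterable, including other generators; the name chunks_of is chosen for this sketch):

```python
from itertools import islice

def chunks_of(iterable, size):
    it = iter(iterable)
    while True:
        piece = list(islice(it, size))
        if not piece:
            return
        yield piece

# works on a generator expression, not just a list
print(list(chunks_of((x * x for x in range(7)), 3)))  # [[0, 1, 4], [9, 16, 25], [36]]
```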
Simple yet elegant
or if you prefer:
I saw the most awesome Python-ish answer in a duplicate of this question:

You can create an n-tuple for any n. If a = range(1, 15), then the result will be:

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
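The one-liner referred to is presumably the repeated-iterator zip_longest idiom; a sketch reproducing the (13, 14, None) example:

```python
from itertools import zip_longest

a = range(1, 15)
result = list(zip_longest(*[iter(a)] * 3))
print(result)
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
```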
Here's the one liner:
Details. AA is array, SS is chunk size. For example:
To expand the ranges in py3 do
With Assignment Expressions in Python 3.8 it becomes quite nice:
This works on an arbitrary iterable, not just a list.
UPDATE
Starting with Python 3.12, this exact implementation is available as itertools.batched
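The answer's code block did not survive extraction; the assignment-expression chunker it describes is commonly written like this (a sketch, using a list per chunk):

```python
from itertools import islice

def chunker(iterable, n):
    it = iter(iterable)
    while chunk := list(islice(it, n)):  # Python 3.8+ assignment expression
        yield chunk

print(list(chunker(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```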
If you had a chunk size of 3 for example, you could do:
source:
http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is a fixed number I can type, e.g. '3', and would never change.
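The linked recipe groups a list into sequential n-tuples via offset slices; a sketch for the fixed size of 3 mentioned above:

```python
# group a list into sequential 3-tuples using offset slices
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(zip(lst[::3], lst[1::3], lst[2::3])))  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```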
The toolz library has the partition function for this:
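The toolz snippet itself is not shown above; toolz.partition(n, seq) groups seq into n-tuples and drops an incomplete final group. A standard-library sketch of the same behavior (with the real toolz call in a comment):

```python
# With toolz installed: from toolz import partition; list(partition(2, range(5)))
def partition(n, seq):
    # equivalent idiom: zip stops at the shortest input, dropping the tail
    return zip(*[iter(seq)] * n)

print(list(partition(2, range(5))))  # [(0, 1), (2, 3)]
```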
You may also use the get_chunks function of the utilspie library as:

You can install utilspie via pip:

Disclaimer: I am the creator of the utilspie library.
I like the Python doc's version proposed by tzot and J.F.Sebastian a lot,
but it has two shortcomings:
I'm using this one a lot in my code:
UPDATE: A lazy chunks version:
code:
result:
Heh, one-line version:
Without calling len() which is good for large lists:
And this is for iterables:
The functional flavour of the above:
OR:
OR:
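None of the variants survived extraction; one way to chunk without calling len(), as described above (a sketch, not the original code):

```python
def chunks_no_len(lst, n):
    # slice forward until the slice comes back empty; len() is never called
    i = 0
    while True:
        piece = lst[i:i + n]
        if not piece:
            return
        yield piece
        i += n

print(list(chunks_no_len(list(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```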
usage:
See this reference
Python3
Since everybody here is talking about iterators: boltons has the perfect method for that, called iterutils.chunked_iter.

Output:

But if memory isn't a concern, you can use the old way and store the full list in the first place with iterutils.chunked.
Consider using matplotlib.cbook pieces
for example: