How to split an iterable into constant-size chunks

I am surprised I could not find a "batch" function that would take as input an iterable and return an iterable of iterables.

For example:

for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]

or:

for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]

Now, I wrote what I thought was a pretty simple generator:

def batch(iterable, n = 1):
   current_batch = []
   for item in iterable:
       current_batch.append(item)
       if len(current_batch) == n:
           yield current_batch
           current_batch = []
   if current_batch:
       yield current_batch

But the above does not give me what I would have expected:

for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]

So I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?

[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]

Answers (20)

戒ㄋ 2024-12-25 02:30:45

This is probably more efficient (faster), as long as the input is a sequence that supports len() and slicing:

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)

Example using a list:

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data 

for x in batch(data, 3):
    print(x)

# Output

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]

Each batch is produced as a single slice of the input rather than being built up item by item.
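
Note that, unlike the generator in the question, this slice-based batch() only works on sequences; a plain generator has no len() or slicing. A minimal workaround, at the cost of materialising the data, is to build a list first (a sketch reusing the batch() defined above):

gen = (i * i for i in range(10))   # a generator: no len(), no slicing
for x in batch(list(gen), 3):      # materialise it, then slice as usual
    print(x)

# [0, 1, 4]
# [9, 16, 25]
# [36, 49, 64]
# [81]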

深陷 2024-12-25 02:30:45

The recipes in the itertools module provide two ways to do this depending on how you want to handle a final odd-sized lot (keep it, pad it with a fillvalue, ignore it, or raise an exception):

from itertools import islice, zip_longest

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
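
Worth noting: since Python 3.12 the batched() recipe above ships in the standard library as itertools.batched, which yields tuples rather than lists:

from itertools import batched  # Python 3.12+

print(list(batched(range(10), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
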
赠佳期 2024-12-25 02:30:45

More-itertools includes two functions that do what you need: chunked() and ichunked().
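
A minimal sketch of both, assuming the more-itertools package is installed (pip install more-itertools): chunked() yields lists, while ichunked() yields lazy sub-iterables.

from more_itertools import chunked, ichunked

print(list(chunked(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

for sub in ichunked(range(10), 3):
    print(list(sub))   # each sub-batch is itself a lazy iterable
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]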

云胡 2024-12-25 02:30:45

As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, see the following recipe:

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # Pull the first item eagerly so we can tell when the source is exhausted.
            first = next(batchiter)
        except StopIteration:
            # PEP 479: return explicitly instead of letting StopIteration leak out.
            return
        yield chain([first], batchiter)
Hello爱情风 2024-12-25 02:30:45

A solution for Python 3.8+, for when you are working with iterables that don't define a len() function and that get exhausted as you iterate:

from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch

Example usage:

def my_gen():
    yield from range(10)
 
for batch in batcher(my_gen(), 3):
    print(batch)

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Could of course be implemented without the walrus operator as well.

深者入戏 2024-12-25 02:30:45

This is a very short code snippet that does not use len() and works under both Python 2 and 3 (not my creation):

def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
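
For example:

print(list(chunks(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
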
陪你到最终 2024-12-25 02:30:45

Weird, seems to work fine for me in Python 2.x

>>> def batch(iterable, n = 1):
...    current_batch = []
...    for item in iterable:
...        current_batch.append(item)
...        if len(current_batch) == n:
...            yield current_batch
...            current_batch = []
...    if current_batch:
...        yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
柠檬色的秋千 2024-12-25 02:30:45

A workable version that avoids the Python 3.8-only walrus operator, adapted from @Atra Azami's answer.

import itertools    

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)

    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)

Output:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
枫林﹌晚霞¤ 2024-12-25 02:30:45

I like this one,

def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]

This returns a list of batches of size bs. You can of course make it lazy by turning the list comprehension into a generator expression.
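
For instance, a sketch of what the lazy variant might look like (batch_gen is just an illustrative name):

def batch_gen(x, bs):
    # Same slicing logic, but yields one batch at a time instead of building a list.
    return (x[i:i+bs] for i in range(0, len(x), bs))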

乱了心跳 2024-12-25 02:30:45
def batch(iterable, n):
    iterable = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                if chunk:  # don't emit an empty trailing chunk when the length is a multiple of n
                    yield chunk
                return
        yield chunk

list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
緦唸λ蓇 2024-12-25 02:30:45

Moving as much of the work as possible into CPython, by leveraging islice and the two-argument iter(callable, sentinel) behavior:

from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])

Inspired by more-itertools, and shortened to the essence of that code.
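
For example:

print(list(chunked(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]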

允世 2024-12-25 02:30:45

This is what I use in my project. It handles iterables or lists as efficiently as possible.

from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return

    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]


def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:

        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more

        yield [k for k in chunk()]
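
For example, with both a list and a generator:

print(list(chunker([0, 1, 2, 3, 4, 5, 6], 3)))
# [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunker((i for i in range(7)), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]
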
十六岁半 2024-12-25 02:30:45

Here is an approach using the reduce function.

One-liner:

from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])

Or a more readable version:

from functools import reduce
def batch(input_list, batch_size):
  def reducer(cumulator, item):
    if len(cumulator[-1]) < batch_size:
      cumulator[-1].append(item)
      return cumulator
    else:
      cumulator.append([item])
    return cumulator
  return reduce(reducer, input_list, [[]])

Test:

>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]
本王不退位尔等都是臣 2024-12-25 02:30:45

This would work for any iterable.

from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2): 
    args = [iter(iterable)] * batch_size 
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))

It would work like this:

>>> list(batch_iterable(range(0,5), 2))
[(0, 1), (2, 3), (4,)]

PS: It will not work if the iterable contains None values.

反目相谮 2024-12-25 02:30:45

You can just group iterable items by their batch index.

import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

Often you will want to collect the inner iterables into a concrete type, so here is a more advanced version.

def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

Examples:

print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]
街角卖回忆 2024-12-25 02:30:45

Related functionality you may need:

def batch(size, i):
    """ Get the i'th batch of the given size """
    return slice(size * i, size * i + size)

Usage:

>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
[4, 5, 6]

It gets the i'th batch from the sequence, and it also works with other data structures such as pandas DataFrames (df.iloc[batch(100,0)]) or numpy arrays (array[batch(100,0)]).

残花月 2024-12-25 02:30:45
from itertools import *

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
你是我的挚爱i 2024-12-25 02:30:45

I use

import math

def batchify(arr, batch_size):
  num_batches = math.ceil(len(arr) / batch_size)
  return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
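
For example:

print(batchify(list(range(10)), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]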
  
夏有森光若流苏 2024-12-25 02:30:45

Keep taking (at most) n elements until it runs out.

def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk


def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
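
For example (Python 3.8+ because of the walrus operator):

print(list(chop(3, range(10))))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
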
最美的太阳 2024-12-25 02:30:45

This code has the following features:

  • Can take lists or generators (no len()) as input
  • Does not require imports of other packages
  • No padding added to last batch

def batch_generator(items, batch_size):
    batch = []  # the batch currently being filled
    for item in items:
        batch.append(item)  # append items to the current batch
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # yield the last, shorter batch, but skip it when it is empty
        yield batch
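
For example:

print(list(batch_generator(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(batch_generator(range(9), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   (no trailing empty batch)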