How to split an iterable into constant-size chunks
I am surprised I could not find a "batch" function that would take as input an iterable and return an iterable of iterables.
For example:
for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]
or:
for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]
Now, I wrote what I thought was a pretty simple generator:
def batch(iterable, n = 1):
    current_batch = []
    for item in iterable:
        current_batch.append(item)
        if len(current_batch) == n:
            yield current_batch
            current_batch = []
    if current_batch:
        yield current_batch
But the above does not give me what I would have expected:
for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]
So, I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?
[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]
This is probably more efficient (faster)
Example using list
It avoids building new lists.
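The code for this answer is missing here. A minimal sketch of what the description suggests, assuming the input is a sequence that supports len() and slicing (the names sequence and ndx are my own, not necessarily the answerer's):

```python
def batch(sequence, n=1):
    """Yield successive slices of at most n items from a sequence."""
    length = len(sequence)
    for ndx in range(0, length, n):
        # Slicing past the end is safe; the last batch is simply shorter.
        yield sequence[ndx:ndx + n]


data = list(range(10))
batches = list(batch(data, 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Note that this relies on len() and slicing, so it will not work on a generator or any other one-shot iterator.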
The recipes in the itertools module provide two ways to do this depending on how you want to handle a final odd-sized lot (keep it, pad it with a fillvalue, ignore it, or raise an exception):
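The recipes themselves are not reproduced here. As a rough sketch in the spirit of the grouper recipe from the itertools documentation, covering only the "fill" and "ignore" behaviours (the exact parameter names below are my assumption), plus itertools.batched, which exists in Python 3.12 and later and keeps the short final batch:

```python
import sys
from itertools import zip_longest


def grouper(iterable, n, incomplete="fill", fillvalue=None):
    """Collect data into tuples of exactly n items."""
    iterators = [iter(iterable)] * n  # n references to the same iterator
    if incomplete == "fill":
        # Pad the final group with fillvalue.
        return zip_longest(*iterators, fillvalue=fillvalue)
    if incomplete == "ignore":
        # zip stops at the shortest input, dropping the partial group.
        return zip(*iterators)
    raise ValueError("expected 'fill' or 'ignore'")


padded = list(grouper("ABCDEFG", 3, fillvalue="x"))
dropped = list(grouper("ABCDEFG", 3, incomplete="ignore"))
print(padded)   # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
print(dropped)  # [('A', 'B', 'C'), ('D', 'E', 'F')]

if sys.version_info >= (3, 12):
    from itertools import batched
    print(list(batched("ABCDEFG", 3)))  # final tuple is ('G',)
```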
More-itertools includes two functions that do what you need:
chunked(iterable, n) returns an iterable of lists, each of length n (except the last one, which may be shorter);
ichunked(iterable, n) is similar, but returns an iterable of iterables instead.
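A short usage sketch, assuming the third-party more-itertools package is installed (pip install more-itertools); the import guard is only there so the snippet does not crash without it:

```python
try:
    from more_itertools import chunked, ichunked
except ImportError:
    chunked = ichunked = None  # package not installed; nothing to demonstrate

if chunked is not None:
    # chunked yields lists.
    as_lists = list(chunked(range(10), 3))
    # ichunked yields lazy sub-iterables; materialise each one to inspect it.
    as_iters = [list(part) for part in ichunked(range(10), 3)]
    print(as_lists)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    print(as_iters)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```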
As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, you could see an example of the following recipe:
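The recipe itself did not survive here. One way such an islice-based batcher can look, as a sketch under my own assumptions (lazy inner iterators built with chain, which may differ from the recipe the answer linked to):

```python
from itertools import chain, islice


def batch(iterable, size):
    """Yield lazy iterators, each producing at most `size` items."""
    source = iter(iterable)
    for first in source:
        # Pull one item to learn that the source is not exhausted,
        # then chain it in front of the next size - 1 items.
        yield chain([first], islice(source, size - 1))


# Each inner iterator must be consumed before asking for the next one.
result = [list(part) for part in batch(range(10), 3)]
print(result)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Because the inner iterators share the underlying source, skipping one without consuming it would shift the boundaries of the following batches.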
Solution for Python 3.8 if you are working with iterables that don't define a len function, and get exhausted:
Example usage:
Could of course be implemented without the walrus operator as well.
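The original snippet is missing here; a sketch of the idea as I understand it, using the walrus operator with itertools.islice (the name batcher is an assumption):

```python
from itertools import islice


def batcher(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    # The loop ends as soon as islice returns nothing (empty list is falsy).
    while batch := list(islice(iterator, batch_size)):
        yield batch


squares = (x * x for x in range(7))  # a generator: no len(), single pass
result = list(batcher(squares, 3))
print(result)  # [[0, 1, 4], [9, 16, 25], [36]]
```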
This is a very short code snippet I know that does not use len and works under both Python 2 and 3 (not my creation):
Weird, seems to work fine for me in Python 2.x
A workable version without the new features in Python 3.8, adapted from @Atra Azami's answer.
Output:
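Neither the code nor the output survived here. Assuming the adaptation simply replaces the walrus loop with an explicit check, it might look roughly like this (the function name is hypothetical):

```python
from itertools import islice


def batch_generator(iterable, batch_size=1):
    """Yield lists of up to batch_size items without using `:=`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # source exhausted
            break
        yield batch


result = list(batch_generator(iter("abcde"), 2))
print(result)  # [['a', 'b'], ['c', 'd'], ['e']]
```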
I like this one,
This returns a list of batches of size bs. You can make it a generator by using a generator expression (i for i in iterable), of course.
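The one-liner is missing here. A guess at its shape, assuming x is a sequence and bs is the batch size as in the text (batch_lazy is a name I made up for the generator-expression variant):

```python
def batch(x, bs):
    # List comprehension: all batches are built up front.
    return [x[i:i + bs] for i in range(0, len(x), bs)]


def batch_lazy(x, bs):
    # Same expression in parentheses: batches are produced on demand.
    return (x[i:i + bs] for i in range(0, len(x), bs))


print(batch("abcdefg", 3))                   # ['abc', 'def', 'g']
print(list(batch_lazy([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```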
Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:
Inspired by more-itertools, and shortened to the essence of that code.
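The snippet is missing here; a sketch of the islice plus iter(callable, sentinel) combination the answer describes (the lambda and names are my own):

```python
from itertools import islice


def chunked(iterable, size):
    """Return an iterator of lists with up to `size` items each."""
    iterator = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns
    # a value equal to the sentinel, here an empty list.
    return iter(lambda: list(islice(iterator, size)), [])


result = list(chunked(range(7), 3))
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```

There is no Python-level loop here: both the slicing and the repeated calling are driven by built-ins.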
This is what I use in my project. It handles iterables or lists as efficiently as possible.
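The project code is not shown here. One plausible reading, sketched under my own assumptions: slice when the input is a sequence, and fall back to islice for anything else.

```python
from collections.abc import Sequence
from itertools import islice


def chunker(iterable, size):
    """Yield chunks of up to `size` items from a sequence or an iterator."""
    if isinstance(iterable, Sequence):
        # Fast path: sequences can be sliced directly.
        for start in range(0, len(iterable), size):
            yield iterable[start:start + size]
        return
    # Generic path: consume any other iterable piece by piece.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


print(list(chunker([1, 2, 3, 4, 5], 2)))       # [[1, 2], [3, 4], [5]]
print(list(chunker(iter(range(5)), 2)))        # [[0, 1], [2, 3], [4]]
```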
Here is an approach using the reduce function.
Oneliner:
Or more readable version:
Test:
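The one-liner, the readable version and the test are all missing here. A sketch of how the readable version could look with functools.reduce (function and variable names are assumptions):

```python
from functools import reduce


def split_into_batches(items, batch_size):
    """Fold items into a list of lists of at most batch_size elements."""
    def reducer(batches, item):
        if len(batches[-1]) < batch_size:
            batches[-1].append(item)   # room left in the current batch
        else:
            batches.append([item])     # start a new batch
        return batches

    # Start with one empty batch; note an empty input therefore gives [[]].
    return reduce(reducer, items, [[]])


result = split_into_batches(range(10), 3)
print(result)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```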
This would work for any iterable.
It would work like this:
PS: It would not work if iterable has None values.
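The code is missing here, but the PS suggests it pads with None and then strips the padding. As a sketch, here is the same zip_longest idea with one change that is mine, not the answerer's: a private sentinel object is used as the fill value, so real None items survive.

```python
from itertools import zip_longest

_MISSING = object()  # unique marker that cannot collide with real data


def batch_iterable(iterable, batch_size=2):
    """Yield tuples of up to batch_size items from any iterable."""
    iterators = [iter(iterable)] * batch_size  # same iterator, repeated
    for group in zip_longest(*iterators, fillvalue=_MISSING):
        # Strip the padding from the final, shorter group.
        yield tuple(item for item in group if item is not _MISSING)


result = list(batch_iterable([1, None, 3, None, 5], 2))
print(result)  # [(1, None), (3, None), (5,)]
```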
You can just group iterable items by their batch index.
It is often the case that you want to collect the inner iterables, so here is a more advanced version.
Examples:
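The code and examples are missing here. A sketch of grouping by batch index with itertools.groupby, where the optional collect parameter is my guess at what the "more advanced version" adds:

```python
from itertools import groupby


def batch(items, batch_size, collect=None):
    """Group items by index // batch_size; optionally collect each group."""
    groups = groupby(enumerate(items), key=lambda pair: pair[0] // batch_size)
    for _, group in groups:
        inner = (item for _, item in group)  # drop the enumeration index
        yield inner if collect is None else collect(inner)


# Lazy inner iterables must be consumed before moving to the next batch.
print([list(b) for b in batch(range(10), 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(batch(range(5), 2, collect=tuple)))   # [(0, 1), (2, 3), (4,)]
print(list(batch("abcdefg", 3, collect="".join)))  # ['abc', 'def', 'g']
```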
Related functionality you may need:
Usage:
It gets the i'th batch from the sequence and it can work with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy array (array[batch(100,0)]).
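The function body is missing here, but the usage batch(100,0) inside an indexing expression suggests it returns a slice object. A sketch under that assumption, shown on a plain list:

```python
def batch(size, i):
    """Return a slice selecting the i'th batch (0-based) of the given size."""
    return slice(size * i, size * (i + 1))


letters = list("abcdefghij")
second = letters[batch(3, 1)]
last = letters[batch(3, 3)]
print(second)  # ['d', 'e', 'f']
print(last)    # ['j']  (slices clamp at the end of the sequence)
```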
I use
Keep taking (at most) n elements until it runs out.
This code has the following features: