How to count the items in a generator consumed by other code
I'm creating a generator that gets consumed by another function, but I'd still like to know how many items were generated:
lines = (line.rstrip('\n') for line in sys.stdin)
process(lines)
print("Processed {} lines.".format( ? ))
The best I can come up with is to wrap the generator with a class that keeps a count, or maybe turn it inside out and send() things in. Is there an elegant and efficient way to see how many items a generator produced when you're not the one consuming it in Python 2?
Edit: Here's what I ended up with:
# Python 2 code: imap, izip and .next() do not exist under these names in Python 3.
import itertools
import operator
from collections import Iterable

class Count(Iterable):
    """Wrap an iterable (typically a generator) and provide a ``count``
    field counting the number of items.

    Accessing the ``count`` field before iteration is finished will
    invalidate the count.
    """
    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    @property
    def count(self):
        # Freeze the counter at its current value so repeated reads agree.
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()
Comments (7)
If you don't care that you are consuming the generator, you can just do:
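One common idiom for counting by consuming is to sum 1 per item. This is only a sketch, not necessarily the snippet this answer originally showed; it uses Python 3 syntax, and the sample list stands in for sys.stdin:

```python
# Sample data standing in for sys.stdin (an assumption for illustration).
raw_lines = ["first\n", "second\n", "third\n"]
lines = (line.rstrip("\n") for line in raw_lines)

# Iterating here exhausts the generator, so nothing is left for process().
line_count = sum(1 for _ in lines)
print("Counted {} lines.".format(line_count))
```

Since the generator is used up by the count, this only helps when no other code still needs the items.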
Here is another way, using itertools.count(). Example:
Hope this was helpful.
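A rough sketch of how itertools.count() can be paired with the generator, assuming Python 3 (where zip is lazy; under Python 2 itertools.izip would play that role). The process function and the sample data are hypothetical stand-ins, not the answer's original code:

```python
import itertools

def process(items):
    """Hypothetical consumer: it simply drains whatever it is given."""
    for _ in items:
        pass

raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
counter = itertools.count()

# zip() asks raw_lines for an item before it asks the counter, so the counter
# is not advanced on the final step, when raw_lines is already exhausted.
lines = (line.rstrip("\n") for line, _ in zip(raw_lines, counter))
process(lines)

# The next value the counter hands out equals the number of items consumed.
line_count = next(counter)
print("Processed {} lines.".format(line_count))
```

The argument order matters: with zip(counter, raw_lines) the counter would be advanced once more on the last step and the result would be off by one.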
Usually, I'd just turn the generator into a list and take its length. If you have reasons to assume that this will consume too much memory, your best bet indeed seems to be the wrapper class you suggested yourself. It's not too bad, though:
(The last line is for forward compatibility with Python 3.x.)
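A minimal sketch of such a wrapper class, under the assumption that it is a plain iterator keeping a running count. The name CountingIterator and the sample data are made up for illustration, and this is not presented as the answer's exact code:

```python
class CountingIterator(object):
    """Iterator wrapper that counts the items it has handed out."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        # next() raises StopIteration at exhaustion, before the count changes.
        item = next(self._it)
        self.count += 1
        return item

    next = __next__  # Python 2 looks for next() instead of __next__()


raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
lines = CountingIterator(line.rstrip("\n") for line in raw_lines)
consumed = list(lines)  # stands in for process(lines)
print("Processed {} lines.".format(lines.count))
```

Unlike converting to a list first, this keeps memory use constant, and the count is readable at any point during iteration.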
Here's another approach. The use of a list for the count output is a bit ugly, but it's pretty compact:
Used like so:
One could alternatively make counter() take a dict in which it might add a "count" key, or an object on which it could set a count member.
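As a sketch of the list-as-output idea described here: counter() is the name the answer uses, but the parameter names and the sample data below are assumptions:

```python
def counter(iterable, count_out):
    """Yield items unchanged while tallying them in count_out[0]."""
    for item in iterable:
        count_out[0] += 1
        yield item


raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
count = [0]  # one-element list used as a mutable output slot
lines = counter((line.rstrip("\n") for line in raw_lines), count)
consumed = list(lines)  # stands in for process(lines)
print("Processed {} lines.".format(count[0]))
```

The list is needed because a generator function cannot rebind an integer in the caller's scope; mutating a shared container works in both Python 2 and Python 3.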
If you don't need to return the count and just want to log it, you can use a finally block:
Which produces:
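A minimal sketch of the finally-block idea, assuming Python 3 and a hypothetical log parameter (defaulting to print) so the message can be captured. It is an illustration, not the answer's original code or output:

```python
def logged_count(iterable, log=print):
    """Yield items unchanged and report the total once iteration ends."""
    count = 0
    try:
        for item in iterable:
            count += 1
            yield item
    finally:
        # Runs when the generator is exhausted, closed, or garbage-collected.
        log("Processed {} items.".format(count))


raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
messages = []
lines = logged_count((line.rstrip("\n") for line in raw_lines), log=messages.append)
consumed = list(lines)  # stands in for process(lines)
print(messages[0])
```

Because the report sits in a finally block, it is also emitted when the consumer stops early and the generator is closed.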
方法),但如果足够,它会更快。在我的测试中,速度快了 10%。This is another solution similar to that of @sven-marnach:
I wrapped the iterator with a generator function and avoided re-defining
next
. The result is iterable (not an iterator because it lacksnext
method) but if it is enugh it is faster. In my tests this is 10% faster.此解决方案使用
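One way this could look, as a sketch: the class name CountedIterable and the sample data are assumptions, and the speed difference mentioned in the answer is not measured here:

```python
class CountedIterable(object):
    """Iterable wrapper whose __iter__ is a generator function."""

    def __init__(self, iterable):
        self._iterable = iterable
        self.count = 0

    def __iter__(self):
        # The generator does the stepping, so no next method is defined here.
        for item in self._iterable:
            self.count += 1
            yield item


raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
lines = CountedIterable(line.rstrip("\n") for line in raw_lines)
consumed = list(lines)  # stands in for process(lines)
print("Processed {} lines.".format(lines.count))
```

Consumers that call next() directly on the object will fail, since only iter() is supported; a for loop or list() works as usual.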
This solution uses side_effect from the more_itertools package.
It can be used as:
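A sketch of one possible usage, assuming the two-argument form side_effect(func, iterable), which calls func on each item before yielding it. The Tally helper and the sample data are hypothetical, and the fallback definition exists only so the sketch runs where more_itertools is not installed:

```python
try:
    from more_itertools import side_effect
except ImportError:
    # Minimal stand-in (an assumption) mirroring the two-argument behaviour.
    def side_effect(func, iterable):
        for item in iterable:
            func(item)
            yield item


class Tally(object):
    """Hypothetical callable that counts how often it has been called."""

    def __init__(self):
        self.count = 0

    def __call__(self, _item):
        self.count += 1


raw_lines = ["first\n", "second\n", "third\n"]  # stand-in for sys.stdin
tally = Tally()
lines = side_effect(tally, (line.rstrip("\n") for line in raw_lines))
consumed = list(lines)  # stands in for process(lines)
print("Processed {} lines.".format(tally.count))
```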