How to count the items in a generator that is consumed by other code

Posted on 2024-11-15 06:38:49

I'm creating a generator that gets consumed by another function, but I'd still like to know how many items were generated:

lines = (line.rstrip('\n') for line in sys.stdin)
process(lines)
print("Processed {} lines.".format( ? ))

The best I can come up with is to wrap the generator with a class that keeps a count, or maybe turn it inside out and send() things in. Is there an elegant and efficient way to see how many items a generator produced when you're not the one consuming it in Python 2?

Edit: Here's what I ended up with:

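import itertools
import operator
from collections import Iterable  # Python 2 location of the Iterable ABC

class Count(Iterable):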
    """Wrap an iterable (typically a generator) and provide a ``count``
    field counting the number of items.

    Accessing the ``count`` field before iteration is finished will
    invalidate the count.
    """
    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    @property
    def count(self):
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()

鱼窥荷 2024-11-22 06:38:49

If you don't care that you are consuming the generator, you can just do:

sum(1 for x in gen)
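
A minimal Python 3 sketch of what this means for the question's scenario: counting this way exhausts the generator, so nothing is left for another consumer afterwards. The `raw` sample data is made up for illustration and stands in for `sys.stdin`.

```python
# Hypothetical sample data standing in for sys.stdin.
raw = ["a\n", "b\n", "c\n"]
gen = (line.rstrip("\n") for line in raw)

# Counting pulls every item out of the generator.
n = sum(1 for _ in gen)

# The generator is now exhausted, so a later consumer sees nothing.
leftover = list(gen)

print(n, leftover)
```

So this fits only when counting is the sole thing you need from the generator.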

相思碎 2024-11-22 06:38:49

Here is another way, using itertools.count():

import itertools

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

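def counter_value(counter):
    # itertools.count has no public accessor for its current value,
    # so parse it out of the repr, e.g. "count(6)".
    import re
    return int(re.search(r'\d+', repr(counter)).group(0))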

counter = itertools.count()
process(i for i, v in itertools.izip(generator(), counter))

print "Element consumed by process is : %d " % counter_value(counter)
# output: Element consumed by process is : 6

Hope this was helpful.
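
For comparison, here is a sketch of the same idea in Python 3, where `zip` is already lazy. Instead of parsing the counter's repr, it reads the count with one `next(counter)` call after processing has finished. That swap is my assumption, not part of the answer above, and it advances the counter, so it is only safe to do once.

```python
import itertools

def generator():
    for i in range(10):
        yield i

def process(items):
    # Stops early, after seeing six items (0 through 5).
    for i in items:
        if i == 5:
            break

counter = itertools.count()
# zip pulls from generator() first and from counter second, once per item,
# so the counter is not advanced when generator() is exhausted.
process(v for v, _ in zip(generator(), counter))

# The next value the counter would hand out equals the number of items drawn.
consumed = next(counter)
print("Elements consumed by process: %d" % consumed)
```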

别想她 2024-11-22 06:38:49

Usually, I'd just turn the generator into a list and take its length. If you have reasons to assume that this will consume too much memory, your best bet indeed seems to be the wrapper class you suggested yourself. It's not too bad, though:

class CountingIterator(object):
    def __init__(self, it):
        self.it = iter(it)  # accept any iterable, not only iterators
        self.count = 0
    def __iter__(self):
        return self
    def next(self):
        nxt = next(self.it)
        self.count += 1
        return nxt
    __next__ = next

(The last line is for forward compatibility with Python 3.x.)
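
As a rough sketch, this is how such a wrapper could slot into the question's scenario under Python 3, where only `__next__` is needed. The `process` function and the `raw` sample data are hypothetical stand-ins for the real consumer and `sys.stdin`.

```python
class CountingIterator:
    """Iterator wrapper that counts the items handed out so far."""

    def __init__(self, it):
        self.it = iter(it)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.it)  # StopIteration propagates and is not counted
        self.count += 1
        return item


def process(lines):
    # Hypothetical consumer: simply drains whatever it is given.
    for _ in lines:
        pass


raw = ["first\n", "second\n"]  # stand-in for sys.stdin
lines = CountingIterator(line.rstrip("\n") for line in raw)
process(lines)
print("Processed {} lines.".format(lines.count))
```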

人生百味 2024-11-22 06:38:49

Here's another approach. The use of a list for the count output is a bit ugly, but it's pretty compact:

def counter(seq, count_output_list):
    for x in seq:
        count_output_list[0] += 1
        yield x

Used like so:

count = [0]
process(counter(lines, count))
print count[0]

One could alternatively make counter() take a dict in which it might add a "count" key, or an object on which it could set a count member.
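
The object-based variant might look like the following Python 3 sketch. Using `types.SimpleNamespace` as the holder is my choice for illustration; any object with a writable `count` attribute would do.

```python
from types import SimpleNamespace

def counter(seq, stats):
    # stats is any object with a writable ``count`` attribute.
    for x in seq:
        stats.count += 1
        yield x

stats = SimpleNamespace(count=0)
consumed = list(counter(["a", "b", "c"], stats))
print(stats.count)
```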

一抹微笑 2024-11-22 06:38:49

If you don't need to return the count and just want to log it, you can use a finally block:

def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        print '{} iterations'.format(i)

[ n for n in generator() ]

Which produces:

10 iterations
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
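
The finally block also runs when the consumer stops early, once the generator is closed. The Python 3 sketch below illustrates that. It records the count in a list (the hypothetical `report` parameter) instead of printing it, purely so the result can be inspected afterwards.

```python
def generator(report):
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        # Runs on exhaustion and also when the generator is closed early.
        report.append(i)

report = []
gen = generator(report)
for n in gen:
    if n == 2:
        break

# Close explicitly; otherwise the cleanup happens whenever the generator
# object is garbage-collected.
gen.close()
print(report)
```

Here three items (0, 1 and 2) were yielded before the break, so the recorded count is 3.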

冷清清 2024-11-22 06:38:49

This is another solution similar to that of @sven-marnach:

class IterCounter(object):
  def __init__(self, it):
    self._iter = it
    self.count = 0

  def _counterWrapper(self, it):
    for i in it:
      yield i
      self.count += 1

  def __iter__(self):
    return self._counterWrapper(self._iter)

I wrapped the iterator with a generator function and avoided redefining next. The result is an iterable (not an iterator, because it lacks a next method), but if that is enough, it is faster. In my tests it was about 10% faster.
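
One detail worth checking with this design: the increment sits after the `yield`, so an item is counted only when the consumer asks for the next one. The Python 3 sketch below (method renamed to `_counter_wrapper`, otherwise the same logic) shows the effect when the consumer breaks out early.

```python
class IterCounter:
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counter_wrapper(self, it):
        for i in it:
            yield i
            self.count += 1  # runs only when the consumer requests the next item

    def __iter__(self):
        return self._counter_wrapper(self._iter)


# Full consumption: every item is counted.
full = IterCounter(range(5))
for _ in full:
    pass

# Early exit: the consumer received 0, 1 and 2, but the last one is not counted.
partial = IterCounter(range(5))
for item in partial:
    if item == 2:
        break

print(full.count, partial.count)
```

If the consumer may stop early and an exact count matters, moving the increment before the `yield` avoids the off-by-one.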

浅听莫相离 2024-11-22 06:38:49

This solution uses side_effect from the more_itertools package.

from typing import TypeVar, Tuple, Iterator, Callable, Iterable
from itertools import count
from more_itertools import side_effect, peekable

T = TypeVar("T")
def counter_wrap(iterable: Iterable[T]) -> \
        Tuple[Iterator[T], Callable[[], int]]:
    """
    Returns a new iterator based on ``iterable``
    and a getter that, when called, returns the number of items
    the returned iterator has yielded up to that point.
    """
    counter = peekable(count())
    def get_count() -> int:
        return counter.peek()
    return (
        side_effect(lambda e: next(counter), iterable),
        get_count
    )

It can be used as:

>>> iterator, counter = counter_wrap((1, 2, 3, 4, 5, 6, "plast", "last"))
>>> counter()
0
>>> counter()  # Calling this has no side effect (counter not incremented)
0
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> counter()  # Updates when the iterator returns an element
3
>>> next(iterator)
4
>>> next(iterator)
5
>>> next(iterator)
6
>>> next(iterator)
'plast'
>>> counter()
7
>>> next(iterator)
'last'
>>> counter()
8
>>> next(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> counter()
8
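
If adding a third-party dependency is not an option, the same iterator-plus-getter interface can be approximated with the standard library alone. The following is a sketch under that assumption, using a closure and `nonlocal` rather than `side_effect` and `peekable`; it is not how more_itertools implements anything.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

def counter_wrap(iterable: Iterable[T]) -> Tuple[Iterator[T], Callable[[], int]]:
    count = 0

    def counted() -> Iterator[T]:
        nonlocal count
        for item in iterable:
            count += 1  # count the item just before handing it out
            yield item

    def get_count() -> int:
        # Reading the count has no side effect on the iteration.
        return count

    return counted(), get_count


iterator, get_count = counter_wrap(["a", "b", "c"])
before = get_count()
first = next(iterator)
after_one = get_count()
rest = list(iterator)
total = get_count()
print(before, after_one, total)
```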
