Detecting the cheapest way to build independent iterators

Posted on 2025-01-14 00:43:39

Suppose I'm writing a function taking in an iterable, and my function wants to be agnostic as to whether that iterable is actually an iterator yet or not.

(This is a common situation, right? I think basically all the itertools functions are written this way. Take in an iterable, return an iterator.)

If I call, for instance, itertools.tee(•, 2) on an object, and it happens to not be an iterator yet, that presumably means it would be cheaper just to call iter on it twice to get my two independent iterators. Are itertools functions smart enough to know this, and if not, what's the best way to avoid unnecessary costs in this way?

小苏打饼 2025-01-21 00:43:39

Observe:

>>> def foo(x):
...     return x.__iter__() # or return iter(x)
...
>>> l = [0, 1]
>>> it = l.__iter__()
>>> it
<list_iterator object at 0x00000190F59C3640>
>>> print(foo(l), foo(it))
<list_iterator object at 0x00000190F5980AF0> <list_iterator object at 0x00000190F59C3640>

So you do not need to worry whether the argument to your function is an iterable or already an iterator. You can call method __iter__ on something that is already an iterator and it just returns self in that case. This is not an expensive call and would be cheaper than anything you could possibly do to test to see if it is an iterator, such as whether it has a __next__ method (and then having to call __iter__ on it anyway if it doesn't).
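
As a minimal sketch of that point, the identity check iter(x) is x tells an iterator apart from a re-iterable container without probing for __next__. The helper name is_iterator is hypothetical and not part of the standard library; it also assumes the object follows the usual convention that an iterator's __iter__ returns self.

```python
from collections.abc import Iterator


def is_iterator(obj):
    """Return True if iter(obj) hands back obj itself (hypothetical helper)."""
    return iter(obj) is obj


numbers = [0, 1]
numbers_it = iter(numbers)

print(is_iterator(numbers))     # a list builds a new iterator on every call
print(is_iterator(numbers_it))  # an iterator returns itself
print(isinstance(numbers_it, Iterator))  # the ABC check agrees for list iterators
```

The check costs one iter call, which you would need anyway before iterating.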

Update

We now see that there is a big difference between passing your function an iterable and passing it an iterator (depending on how the iterator is written, of course): calling iter twice on the former gives you two distinct iterators, while calling iter twice on the latter does not. itertools.tee, as an example, expects an iterable. If you pass it an iterator whose __iter__ returns self, it will clearly still work, since tee does not need two independent iterators to do its magic.
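
One way to act on this difference, sketched under assumptions: call iter directly when the object hands out fresh iterators, and fall back to itertools.tee only when it is already an iterator. The helper independent_iterators is a hypothetical name, and it assumes that an object whose iter call returns something other than itself really does support independent iteration, which is a convention rather than a guarantee.

```python
import itertools


def independent_iterators(iterable, n=2):
    """Return a tuple of n independent iterators (hypothetical helper)."""
    if iter(iterable) is iterable:
        # Already an iterator: it can be consumed only once, so let tee
        # buffer the items that the slower consumers have not seen yet.
        return tuple(itertools.tee(iterable, n))
    # Re-iterable object: ask it for n fresh iterators, no buffering needed.
    return tuple(iter(iterable) for _ in range(n))


a, b = independent_iterators([1, 2, 3])        # takes the plain iter path
c, d = independent_iterators(iter([1, 2, 3]))  # takes the tee path
print(list(a), list(b))
print(list(c), list(d))
```

Both calls yield two iterators that each produce every item; only the second pays for tee's internal buffer.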

But if you are writing an iterator that is passed an iterable, and you implement it by internally using two or more iterators over that iterable, what you really want to test is whether the object being passed supports multiple, concurrent, independent iterations, regardless of whether it is an iterator or just a plain iterable:

def my_iterator(iterable):
     it1 = iter(iterable)
     it2 = iter(iterable)
     if it1 is it2:
          raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
     ...

class Foo:
     def __init__(self, lst):
          self.lst = lst

     def __iter__(self):
          self.idx = 0
          return self

     def __next__(self):
          if self.idx < len(self.lst):
               value = self.lst[self.idx]
               self.idx += 1
               return value
          raise StopIteration()

f = Foo("abcd")
for x in f:
     print(x)

my_iterator(f)

Prints:

a
b
c
d
Traceback (most recent call last):
  File "C:\Booboo\test\test.py", line 26, in <module>
    my_iterator(f)
  File "C:\Booboo\test\test.py", line 5, in my_iterator
    raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
ValueError: The passed iterable does not support multiple, concurrent, independent iterations.

The author of the object being passed in must write it in such a way that it supports multiple, concurrent, independent iterations.
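
For illustration, here is one possible way to rewrite the Foo class above so that it passes the check. This is a sketch rather than the only fix: it makes __iter__ a generator function, so every call returns a new generator with its own position and no iteration state lives on the instance.

```python
class Foo:
    """Re-iterable variant of Foo: each iter() call gets its own state."""

    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        # Generator function: calling it creates a fresh generator object,
        # so concurrent iterations do not share an index.
        for value in self.lst:
            yield value


f = Foo("abcd")
it1 = iter(f)
it2 = iter(f)
print(it1 is it2)                       # two distinct iterator objects
print(next(it1), next(it1), next(it2))  # it2 starts from the beginning
```

With this version, my_iterator(f) no longer raises, because iter(f) returns a different object on each call.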
