Python: The mechanism behind list comprehension

Posted on 2024-10-15 03:48:08

When using a list comprehension or the in keyword in a for loop context, i.e.:

for o in X:
    do_something_with(o)

or

l = [o for o in X]

How does the mechanism behind in work?
Which functions/methods of X does it call?
If X can comply with more than one method, what is the precedence?
How do I write an efficient X, so that the list comprehension is quick?

淡紫姑娘！ 2024-10-22 03:48:08

The, afaik, complete and correct answer.

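for, both in for loops and list comprehensions, calls iter() on X. iter() returns an iterator if X has either an __iter__ method or a __getitem__ method. If X implements both, __iter__ is used. With only __getitem__, the resulting iterator calls it with 0, 1, 2, ... until IndexError is raised. If X has neither, you get TypeError: 'Nothing' object is not iterable (where Nothing is the name of X's class).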
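To make the precedence concrete, here is a minimal sketch. The classes Both, OnlyGetItem and Nothing are hypothetical names made up for this illustration; only iter() and the two special methods come from the answer.

```python
class Both(object):
    """Hypothetical class that defines both protocols."""

    def __iter__(self):
        # iter() prefers this method; it must return an iterator.
        return iter(["from __iter__"])

    def __getitem__(self, index):
        # Never used for iteration, because __iter__ exists.
        if index >= 1:
            raise IndexError(index)
        return "from __getitem__"


class OnlyGetItem(object):
    """Hypothetical class with only the sequence-style protocol."""

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised.
        if index >= 3:
            raise IndexError(index)
        return index * 10


class Nothing(object):
    """Defines neither method, so it is not iterable."""


both_result = [o for o in Both()]
getitem_result = [o for o in OnlyGetItem()]

try:
    iter(Nothing())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(both_result)     # ['from __iter__']
print(getitem_result)  # [0, 10, 20]
print(error_message)
```

The __getitem__ fallback only works when the method accepts integer indexes starting at 0 and raises IndexError at the end, which is why __iter__ is usually the clearer choice.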

This implements a __getitem__:

class GetItem(object):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, x):
        return self.data[x]

Usage:

>>> data = range(10)
>>> print([x*x for x in GetItem(data)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

This is an example of implementing __iter__:

class TheIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = -1

    # Python 3 looks for __next__; Python 2 looks for next (aliased below).
    def __next__(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration

    next = __next__  # Python 2 compatibility

    def __iter__(self):
        return self

class Iter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Use the instance's own data, not a global variable named data.
        return TheIterator(self.data)

Usage:

>>> data = range(10)
>>> print([x*x for x in Iter(data)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

As you can see, you need to implement both an iterator and an __iter__ method that returns it.

You can combine them:

class CombinedIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = -1
        return self

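    # Python 3 looks for __next__; Python 2 looks for next (aliased below).
    def __next__(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration

    next = __next__  # Python 2 compatibility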

Usage is the same as in the previous examples.

But then you can only have one iteration going at a time, because every iter() call resets the single index stored on the object itself.
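A small sketch of that limitation, assuming Python 3 naming (__next__, with a next alias for Python 2). The nested comprehension is my own test case, not part of the original answer.

```python
class CombinedIter(object):
    """The object is its own iterator, so all loops share one index."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = -1  # every new loop resets the shared position
        return self

    def __next__(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration

    next = __next__  # Python 2 compatibility


shared = CombinedIter([1, 2, 3])

# The inner loop resets and then exhausts the shared index,
# so the outer loop stops after its first item.
pairs = [(a, b) for a in shared for b in shared]

# A plain list hands out an independent iterator per loop.
expected = [(a, b) for a in [1, 2, 3] for b in [1, 2, 3]]

print(pairs)          # [(1, 1), (1, 2), (1, 3)]
print(len(expected))  # 9
```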
OK, in this case you could just do this:

class CheatIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

But that's cheating because you are just reusing the __iter__ method of list.
An easier way is to use yield, and make __iter__ into a generator:

class Generator(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for x in self.data:
            yield x

This last approach is the one I would recommend. It is easy and efficient.
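As a quick check that the generator version does not have the "one iterator at a time" problem, here is a sketch. The nested comprehension and the first/second names are my own illustration.

```python
class Generator(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Each call creates a fresh generator with its own position.
        for x in self.data:
            yield x


g = Generator([1, 2, 3])

# Nested loops over the same object now work independently.
pairs = [(a, b) for a in g for b in g]

first = iter(g)
second = iter(g)
a1 = next(first)   # 1
a2 = next(first)   # 2
b1 = next(second)  # 1, unaffected by how far first has advanced

print(len(pairs))  # 9
print(a1, a2, b1)  # 1 2 1
```

The position lives in the generator object rather than on the container, which is what makes each loop independent.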

亚希 2024-10-22 03:48:08

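X must be iterable. It must implement __iter__(), which returns an iterator object; the iterator object must implement next() (__next__() in Python 3), which returns the next item every time it is called, or raises StopIteration if there is no next item.

Lists, tuples and generators are all iterable.

Note that the plain for statement uses the same mechanism.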
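Roughly speaking, the for loop from the question can be spelled out by hand as below. This is only a sketch of the protocol, not how the interpreter is literally written; do_something_with and results are placeholder names for the illustration.

```python
def do_something_with(o, sink):
    # Placeholder body: record the square of each item.
    sink.append(o * o)


X = [1, 2, 3]
results = []

# Approximately what "for o in X: do_something_with(o)" does:
iterator = iter(X)          # calls X.__iter__() (or falls back to __getitem__)
while True:
    try:
        o = next(iterator)  # calls iterator.__next__() (next() in Python 2)
    except StopIteration:
        break               # the loop ends quietly when the iterator is exhausted
    do_something_with(o, results)

print(results)  # [1, 4, 9]
```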

Smile简单爱 2024-10-22 03:48:08

In answer to the question's comments: reading the source is not the best idea in this case. The code responsible for executing compiled code (ceval.c) is not very self-explanatory for someone seeing the Python sources for the first time. Here is the snippet that implements iteration in for loops:

   TARGET(FOR_ITER)
        /* before: [iter]; after: [iter, iter()] *or* [] */
        v = TOP();

        /*
          Here tp_iternext corresponds to next() in Python
        */
        x = (*v->ob_type->tp_iternext)(v); 
        if (x != NULL) {
            PUSH(x);
            PREDICT(STORE_FAST);
            PREDICT(UNPACK_SEQUENCE);
            DISPATCH();
        }
        if (PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(
                            PyExc_StopIteration))
                break;
            PyErr_Clear();
        }
        /* iterator ended normally */
        x = v = POP();
        Py_DECREF(v);
        JUMPBY(oparg);
        DISPATCH();

To find out what actually happens here, you need to dive into a bunch of other files that are not much easier to read. So I think that in such cases documentation and sites like SO are the first place to go, while the source should be consulted only for implementation details that are not covered elsewhere.
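A gentler way to see the same machinery without reading C is the standard dis module. The sketch below assumes a CPython 3 interpreter; the exact instruction list differs between versions, so no output is shown, and loop is just a throwaway function name.

```python
import dis


def loop(X):
    for o in X:
        pass


# Collect the opcode names of the compiled function.
opnames = [ins.opname for ins in dis.get_instructions(loop)]
print(opnames)

# On the CPython versions I know of, the list contains GET_ITER
# (the iter(X) step) and FOR_ITER (the repeated "next item" step
# handled by the ceval.c snippet above).
has_get_iter = "GET_ITER" in opnames
has_for_iter = "FOR_ITER" in opnames
```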

风柔一江水 2024-10-22 03:48:08

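X must be iterable, meaning it needs to have an __iter__() method.

So, to start a for..in loop or a list comprehension, X's __iter__() method is called first to obtain an iterator object; then that object's next() method is called for each iteration until StopIteration is raised, at which point the iteration stops.

I'm not sure what your third question means, or how to give a meaningful answer to your fourth question, except that your iterator should not construct the entire list in memory at once.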
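To illustrate the point about not building the whole list in memory, here is a sketch with a hypothetical Squares class that computes items on demand. It assumes Python 3, where range is itself lazy.

```python
class Squares(object):
    """Hypothetical iterable that yields i*i for i in range(n) on demand."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Nothing is stored: each value is computed when the loop asks for it.
        for i in range(self.n):
            yield i * i


# Consumers that reduce the stream never need the full list.
total = sum(Squares(1000))

# Even a huge logical size is fine if the loop stops early.
first_three = []
for s in Squares(10 ** 9):
    if len(first_three) == 3:
        break
    first_three.append(s)

print(first_three)  # [0, 1, 4]
```

A list comprehension over such an object still builds its own result list, of course; the saving is that X does not need a second full copy.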

忘年祭陌 2024-10-22 03:48:08

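Maybe this helps (tutorial http://docs.python.org/tutorial/classes.html, Section 9.9):

Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next(), which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception, which tells the for loop to terminate.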

回眸一遍 2024-10-22 03:48:08

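Answering your questions:

How does the mechanism behind in work?

As others have already noted, it is exactly the same mechanism that ordinary for loops use.

Which functions/methods within X does it call?

As noted in the comments below, it calls iter(X) to get an iterator. If X defines the method __iter__(), that method is called to return an iterator; otherwise, if X defines __getitem__(), that method is called repeatedly to iterate over X. See the Python documentation for iter() here: http://docs.python.org/library/functions.html#iter

If X can comply with more than one method, what is the precedence?

I'm not sure exactly what you are asking, but Python has standard rules for how it resolves method names, and it follows them here. Here is a discussion of this:

Method Resolution Order (MRO) in new style Python classes

How do I write an efficient X, so that the list comprehension is quick?

I suggest you read up more on iterators and generators in Python. One easy way to make any class support iteration is to write __iter__() as a generator function. Here is a discussion of generators:

http://linuxgazette.net/100/pramode.html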