How do I clone a generator object?

Posted 2024-10-16 07:56:29

Consider this scenario:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

walk = os.walk('/home')

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

We need to use the same walk data more than once. I have a benchmarking scenario, and reusing the same walk data is mandatory to get meaningful results.

I tried walk2 = walk to clone it and use the copy for the second iteration, but that didn't work. How can I copy it? Is it even possible?

6 Answers

七度光 2024-10-23 07:56:29

You can use itertools.tee():

import itertools

walk, walk2 = itertools.tee(walk)

Note that this might "need significant extra storage", as the documentation points out.
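
For concreteness, a minimal sketch of how tee() could be applied to the question's example (the '/home' path and the print loops are taken from the question itself):

import itertools
import os

walk = os.walk('/home')
walk, walk2 = itertools.tee(walk)

# The first pass consumes `walk`; tee() buffers every yielded tuple
# so that `walk2` can replay the same sequence afterwards.
for root, dirs, files in walk:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))

for root, dirs, files in walk2:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))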

终止放荡 2024-10-23 07:56:29

If you know you are going to iterate through the whole generator for every usage, you will probably get the best performance by unrolling the generator to a list and using the list multiple times.

walk = list(os.walk('/home'))
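
As a small illustration of why this suits a benchmark (the path is just the question's example), both passes below see exactly the same snapshot:

import os

walk = list(os.walk('/home'))   # traverse the tree once, keep every (root, dirs, files) tuple

first = sum(len(dirs) + len(files) for root, dirs, files in walk)
second = sum(len(dirs) + len(files) for root, dirs, files in walk)
assert first == second          # the list is a fixed snapshot, so repeated passes agree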

撩起发的微风 2024-10-23 07:56:29

Define a function

def walk_home():
    for r in os.walk('/home'):
        yield r

Or even this

def walk_home():
    return os.walk('/home')

Both are used like this:

for root, dirs, files in walk_home():
    for pathname in dirs+files:
        print(os.path.join(root, pathname))
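
A short sketch of why this works: each call to walk_home() starts a brand-new os.walk(), so every loop gets its own generator (though the filesystem may of course change between calls):

import os

def walk_home():
    return os.walk('/home')

# Every call returns a fresh, independent generator object.
assert walk_home() is not walk_home()

for root, dirs, files in walk_home():   # first traversal
    pass

for root, dirs, files in walk_home():   # second, independent traversal
    pass
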
刘备忘录 2024-10-23 07:56:29

This is a good use case for functools.partial() to make a quick generator factory:

from functools import partial
import os

walk_factory = partial(os.walk, '/home')

walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()

What functools.partial() does is hard to describe in plain words, but this is exactly what it's for.

It partially fills in a function's parameters without executing the function, so it acts as a function/generator factory.
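
A small sketch of that, building on the factory above: every walk_factory() call is equivalent to a fresh os.walk('/home'), so the resulting generators advance independently:

from functools import partial
import os

walk_factory = partial(os.walk, '/home')   # binds the path but does not start walking yet

walk1 = walk_factory()                     # same as os.walk('/home')
walk2 = walk_factory()                     # a second, independent traversal

root1, dirs1, files1 = next(walk1)
root2, dirs2, files2 = next(walk2)
assert root1 == root2                      # both start at '/home', yet advance independently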

無心 2024-10-23 07:56:29

This answer aims to extend/elaborate on what the other answers have expressed. The solution will necessarily vary depending on what exactly you aim to achieve.

If you want to iterate over the exact same result of os.walk multiple times, you will need to initialize a list from the os.walk iterable's items (i.e. walk = list(os.walk(path))).

If you must guarantee the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.

  1. It will not be possible to list() an iterable if the output is of sufficient size (e.g. attempting to list() an entire filesystem may freeze your computer).
  2. It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.

In the event that list() is not suitable, you will need to run your generator on demand. Note that a generator is exhausted after each use, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass

The aforementioned design pattern will allow you to keep your code DRY.
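
To make the mechanics explicit, a brief sketch reusing the WalkMaker class above: each for loop invokes __iter__, which starts a brand-new os.walk, so the data is re-read on every pass rather than cloned:

walk = WalkMaker('/home')

it1 = iter(walk)                 # __iter__ builds a new generator
it2 = iter(walk)                 # and a second, independent one
assert it1 is not it2

first_pass = list(walk)          # walks the filesystem
second_pass = list(walk)         # walks it again, possibly seeing changes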

雨落星ぅ辰 2024-10-23 07:56:29

This "Python Generator Listeners" code allows you to have many listeners on a single generator, like os.walk, and even have someone "chime in" later.

import os

def walkme():
    return os.walk('/home')

m1 = Muxer(walkme)
m2 = Muxer(walkme)

Then m1 and m2 can even run in separate threads and process the items at their own pace.

See: https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3

import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    # Shared state for one underlying generator: the generator itself
    # (wrapped in a one-element list so it can be replaced), the list of
    # listeners, and a lock guarding access to the generator.
    Entry = namedtuple('Entry', 'genref listeners lock')

    already = {}          # maps each generator-factory function to its Entry
    top_lock = Lock()     # guards `already`

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()   # items other listeners pulled on our behalf

        with self.top_lock:
            if func not in self.already:
                # First listener for this factory: create the shared generator.
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]

        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners

        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            # Items another listener already pulled from the shared generator.
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    # Re-check: another listener may have fed our queue
                    # while we were waiting for the lock.
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        # Advance the shared generator and fan the item
                        # out to every other listener's queue.
                        e = next(self.genref[0])
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            # Optionally restart the generator for future listeners.
                            self.genref[0] = self.func()
                        raise
        return e

    def __del__(self):
        with self.top_lock:
            try:
                self.listeners.remove(self)
            except ValueError:
                pass
            if not self.listeners and self.func in self.already:
                # Last listener gone: drop the shared entry.
                del self.already[self.func]
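
As a rough sketch of the threaded usage described above (reusing the Muxer class and the walkme() factory defined in this answer; the consumer function and thread setup are illustrative only):

import threading

def consume(mux, label):
    # Each listener receives every (root, dirs, files) tuple exactly once.
    count = sum(1 for _ in mux)
    print(label, 'saw', count, 'directories')

m1 = Muxer(walkme)
m2 = Muxer(walkme)

t1 = threading.Thread(target=consume, args=(m1, 'listener-1'))
t2 = threading.Thread(target=consume, args=(m2, 'listener-2'))
t1.start(); t2.start()
t1.join(); t2.join()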