Python: Should I use a generator in this situation?

Posted 2024-11-29 14:28:36

I have a list containing almost 2k dictionaries, and I use the list several times. For instance:

c = myClass()
c.create(source)          # source is a text of approximately 50k chars;
                          # this method creates the list of approximately 2k dictionaries
item = c.get(15012)       # loops through the list to find an item; when the condition
                          # is matched, the for loop breaks and the value is returned
item2 = c.prevItem(item)  # also loops through the list, reversed, and returns the next item

Now, imagine a scenario where I have to use the same list over and over again. Since the list is large, I'd like to use a generator, but as far as I understand, a generator has to be recreated once it raises StopIteration. So, in this scenario, is a generator a good fit, or is there a more efficient approach in terms of speed?

衣神在巴黎 2024-12-06 14:28:36

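As I see it, you have to decide what you want to do:

1) Save the values so you don't have to recalculate them, at the cost of using more space.

2) Recalculate them each time, but save space because you don't have to store them.

If you think about it, one of these two things has to happen no matter what kind of generator/list/whatever you use. And I don't think there is a simple hard rule that says which is better. (Personally, I'd say pick one and don't look back. You have your whole life ahead of you.)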
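
The trade-off can be sketched in a few lines. This is only an illustration: `squares_gen` and `squares_list` are made-up names, and the squares stand in for whatever the real list holds. It shows that a stored list can be scanned repeatedly, while a generator is empty after one pass and has to be recreated.

```python
import sys

def squares_gen(n):
    # Recomputes each value on demand; stores nothing.
    for i in range(n):
        yield i * i

squares_list = [i * i for i in range(2000)]  # option 1: keep every value
gen = squares_gen(2000)                      # option 2: compute lazily

# The list can be scanned as many times as needed.
first_pass = sum(squares_list)
second_pass = sum(squares_list)

# The generator is exhausted after one pass.
gen_first = sum(gen)
gen_second = sum(gen)  # nothing left to iterate over

# The list object itself takes more memory than the generator object.
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(gen)
```

For a list that is reused many times, as in the question, option 1 avoids rebuilding the data on every pass.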

Hello爱情风 2024-12-06 14:28:36

If you frequently get an item at a known offset from a previously retrieved item, one option is to change .get to return not only the item but also its position in the list. Then you could implement prevItem as:

def prevItem(self, pos):
    # Note: pos - 1 is -1 when pos == 0, which wraps around to the last item.
    return self.itemlist[pos - 1]

item, pos = c.get(itemnum)
item2 = c.prevItem(pos)

If, instead, you are doing some sort of operation on item to get a new itemnum, you should store them in a dict instead of a list. This way, get is just a dictionary lookup (much faster than list search):

def get(self, itemnum):
    return self.premade_dict[itemnum]

So one way or the other you should be able to replace some searches with cheaper operations.
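
Both ideas can be combined in one sketch. Everything here is an assumption made for illustration: `ItemStore` is a hypothetical stand-in for the question's `myClass`, each dictionary is assumed to carry an `"id"` key, and `pos_by_key` is an invented name for a mapping from key to list position.

```python
class ItemStore:
    """Hypothetical stand-in for the question's myClass."""

    def __init__(self, items, key="id"):
        self.itemlist = list(items)
        # Map each item's key to its position; built once in O(n).
        self.pos_by_key = {item[key]: pos for pos, item in enumerate(self.itemlist)}

    def get(self, itemnum):
        # Dictionary lookup instead of a linear scan of the list.
        pos = self.pos_by_key[itemnum]
        return self.itemlist[pos], pos

    def prevItem(self, pos):
        # Guard against pos == 0, where pos - 1 would wrap to the last item.
        if pos <= 0:
            raise IndexError("no previous item")
        return self.itemlist[pos - 1]


store = ItemStore({"id": n, "name": f"item{n}"} for n in (15010, 15011, 15012))
item, pos = store.get(15012)
item2 = store.prevItem(pos)
```

The list keeps the ordering needed for `prevItem`, and the dictionary removes the search from `get`.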

茶色山野 2024-12-06 14:28:36

It depends on how you want to use a generator. Generators are good at executing code only when it is really needed. It seems your for loop with break already does this.

You could change your class interface though.

def getItems(cond):
    # find item, remember index
    yield item
    # find previous item, possibly much more efficient with the index
    yield previtem

Now upon calling getItems(), you can walk the returned generator for 1 or 2 items and only as much code as needed will be executed.
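
A runnable version of that outline might look like the following. It is a sketch under assumptions, not the answerer's code: `get_items` is written as a plain function that takes the list and a predicate `cond`, and the sample data with an `"id"` key is invented.

```python
def get_items(itemlist, cond):
    # Find the first item that satisfies cond, remembering its index.
    for idx, item in enumerate(itemlist):
        if cond(item):
            break
    else:
        return  # no match: the generator yields nothing
    yield item
    # This part runs only if the caller asks for a second value.
    if idx > 0:
        yield itemlist[idx - 1]


items = [{"id": n} for n in (15010, 15011, 15012)]
found = get_items(items, lambda d: d["id"] == 15012)
item = next(found)      # runs the search
previtem = next(found)  # runs only the index lookup
```

A caller that needs only the item stops after the first `next()`, and the code for the previous item never runs.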

眼中杀气 2024-12-06 14:28:36

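A list of 2k dictionaries is pretty normal. I would guess a typical webmaster has lots of lists like that. If you only rarely have to deal with a problem like this, you can probably make do with an ad hoc solution. It might also be worth considering a dictionary of dictionaries, so you don't have to iterate over every key every time. But as far as I know, the more conventional way to handle this kind of data structure is to use a database. Each dictionary can have some key (ideally the condition you check in your loop). The database can be told to index the data by this key, and if you look at how much work it does to retrieve the dictionary you want, you may be surprised to find the answer is hardly any: it pretty much just cuts the deck to the card you asked for, so to speak (although it does need to do some work to set up the index, which is analogous to a sorting operation).

Python offers lots of good ways to map your code to various databases. Look at the powerful but complex sqlalchemy, the built-in standard library sqlite3 module, or join me in trying out mongoengine with NoSQL databases. (There are of course many, many more, but you can easily find another post here with a general overview.) Good luck.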
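
As a rough illustration of the indexed-lookup idea using the standard library's sqlite3 module: the table name `items`, its columns, the index name `idx_items_id`, and the sample rows are all invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(n, f"item{n}") for n in range(14000, 16000)],
)
# The index lets lookups by id avoid scanning every row.
conn.execute("CREATE INDEX idx_items_id ON items (id)")

row = conn.execute(
    "SELECT id, name FROM items WHERE id = ?", (15012,)
).fetchone()
# The previous item in id order can be found with the same index.
prev_row = conn.execute(
    "SELECT id, name FROM items WHERE id < ? ORDER BY id DESC LIMIT 1", (15012,)
).fetchone()
conn.close()
```

For only 2k items held in memory, a plain dict will likely be simpler; a database starts to pay off when the data must persist or be queried on several different keys.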

木格 2024-12-06 14:28:36

You can try this subclass of OrderedDict. My earlier submission was incorrect (mentioned at the bottom):

from collections import OrderedDict

class MyOrderedDict(OrderedDict):
    def index(self, key):
        # Position of key in insertion order; O(n) because the keys are copied to a list.
        if key not in self:
            raise KeyError(key)
        return list(self.keys()).index(key)

    def prev(self, key):
        # Key inserted immediately before `key`.
        idx = self.index(key) - 1
        if idx < 0:
            raise IndexError
        return list(self.keys())[idx]

    def next(self, key):
        # Key inserted immediately after `key`.
        _list = list(self.keys())
        idx = self.index(key) + 1
        if idx >= len(_list):
            raise IndexError
        return _list[idx]

# >>> d = MyOrderedDict(((3, 'Three'), (2, 'Two'), (4, 'Four'), (1, 'One')))
# >>> d.index(3)
# 0
# >>> d.index(2)
# 1
# >>> d.prev(2)
# 3
# >>> d.prev(3)
# Traceback (most recent call last):
#   ...
# IndexError
# >>> d.next(4)
# 1
# >>> d.next(1)
# Traceback (most recent call last):
#   ...
# IndexError

Edit - as @agf commented below, this is incorrect.

You're looking for a fast way to retrieve an item from myClass, so you should use a dictionary. But at the same time you want the data to be in some sort of order, so that you can do a prevItem on it. Why not store your data in a collections.OrderedDict, which was added in Python 2.7 and 3.1?
