Python: should I use a generator in this case?

Posted 2024-11-29 14:28:36

I have a list containing almost 2k dictionaries, and I use that list many times. For instance:

c = myClass()
c.create(source) # where source is a text of approximately 50k chars
                 # this method creates the list that has approximately 2k dictionaries
item = c.get(15012) # now, this one loops thru the list to find an item
                    # whenever the condition is matched, the for loop is broken and the value is returned
item2 = c.prevItem(item) # this one also loops thru the list by reversing it and bringing the next item

Now, imagine a scenario where I have to use the same list over and over again. Since the list is large, I'd like to use a generator, but as far as I understand, a generator has to be recreated once it raises StopIteration. So, in this scenario, is a generator a good fit, or is there a more efficient approach in terms of speed?
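To make the StopIteration concern concrete, here is a minimal sketch; the `squares` function is a hypothetical stand-in, not the real `create` method. It shows that a generator can be walked only once, while a list can be reused freely:

```python
def squares(n):
    """Yield the squares 0, 1, 4, ... lazily (hypothetical example data)."""
    for i in range(n):
        yield i * i


gen = squares(3)
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # already exhausted, so nothing is left

as_list = [i * i for i in range(3)]
list_first = list(as_list)
list_second = list(as_list)  # a list can be iterated any number of times
```

So repeated lookups over the same data would mean re-running whatever builds the generator each time.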

Comments (6)

衣神在巴黎 2024-12-06 14:28:36

It sounds to me like you have to decide which you'd rather do:

1) Save the values so you don't have to recalculate them, but use more space to do so.

2) Recalculate them each time, but save on space because you don't have to store them.

If you think about it, no matter what kind of generator/list/whatever you're using, one of those two things has to happen. And I don't think there's a simple hard rule to say which is better. (Personally I'd say pick one and don't look back. You have your whole life ahead of you.)

Hello爱情风 2024-12-06 14:28:36

If you frequently get an item at a known offset from a previously retrieved item, one option is to change .get to return not only the item but also its position in the list. Then you could implement prevItem as:

def prevItem(self, pos):
    # note: pos == 0 wraps around to the last item
    return self.itemlist[pos - 1]

item, pos = c.get(itemnum)
item2 = c.prevItem(pos)

If, instead, you are doing some sort of operation on item to get a new itemnum, you should store them in a dict instead of a list. This way, get is just a dictionary lookup (much faster than list search):

def get(self, itemnum):
    return self.premade_dict[itemnum]

So one way or the other you should be able to replace some searches with cheaper operations.
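Both ideas can be combined. The following is only a sketch under assumptions: `ItemStore`, `pos_by_key`, `prev_item`, and the `"id"` key are hypothetical names, since the question does not show how myClass stores its data.

```python
class ItemStore:
    """Hypothetical stand-in for the question's myClass."""

    def __init__(self, items, key="id"):
        self.itemlist = list(items)
        # Map each item's key to its position in the list, built once.
        self.pos_by_key = {item[key]: pos for pos, item in enumerate(self.itemlist)}

    def get(self, itemnum):
        """Return (item, position); raises KeyError if itemnum is unknown."""
        pos = self.pos_by_key[itemnum]
        return self.itemlist[pos], pos

    def prev_item(self, pos):
        """Return the item before pos, or None at the start of the list."""
        if pos <= 0:
            return None  # avoid itemlist[-1] silently wrapping to the last item
        return self.itemlist[pos - 1]


store = ItemStore([{"id": 15010}, {"id": 15011}, {"id": 15012}])
item, pos = store.get(15012)
item2 = store.prev_item(pos)
```

The index costs one extra dict of about 2k entries, and in exchange both get and prev_item avoid scanning the list.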

茶色山野 2024-12-06 14:28:36

Depends how you want to use a generator. Generators are good at only executing code when it is really needed. Seems your for loop with break already does this.

You could change your class interface though.

def getItems(cond):
    # find item, remember index
    yield item
    # find previous item, possibly much more efficient with the index
    yield previtem

Now upon calling getItems(), you can walk the returned generator for 1 or 2 items and only as much code as needed will be executed.
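A runnable version of that outline might look like the sketch below. The `items` parameter, the `cond` callable, and the sample data are assumptions added for illustration.

```python
def get_items(items, cond):
    """Yield the first item matching cond, then the item before it (if any)."""
    for idx, item in enumerate(items):
        if cond(item):
            break
    else:
        return  # no match: the generator yields nothing
    yield item
    # Only runs if the caller asks for a second value.
    if idx > 0:
        yield items[idx - 1]


data = [{"id": 1}, {"id": 2}, {"id": 3}]
gen = get_items(data, lambda d: d["id"] == 3)
item = next(gen)             # runs the search
prev_item = next(gen, None)  # runs only the index lookup
```

A caller that needs only the matching item stops after the first next() and the second lookup never executes.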

眼中杀气 2024-12-06 14:28:36

A list of two thousand dictionaries is quite normal. A typical website admin has many such lists, I'd imagine. If you seldom have to deal with problems like this, you might be fine with an ad hoc solution; it may also be worth considering a dictionary of dictionaries so you don't have to iterate through every item every time. But the more routine way to handle this kind of data, from what I gather, is to use a database. Each of your dictionaries can have some key (ideally the condition you're checking for in your loop). The database can be instructed to index the data by this key, and if you look at the work it does to retrieve the dictionary you want, you may be surprised to find the answer is almost none: it pretty much just cuts the deck to the card you requested, so to speak (though it does have to do some work to set up the index, which is something like a sort operation).

Python offers many great ways to map code to databases of all kinds. Check out the powerful but complex SQLAlchemy, the built-in standard-library sqlite3 module, or join me in experimenting with mongoengine and NoSQL databases. (There are many more, of course, but you can easily find another post here with a general overview.) Good luck.
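As a rough illustration of the indexed-lookup idea with the standard-library sqlite3 module, here is a minimal sketch. The `items` table, its columns, and the sample rows are assumptions; the real dictionaries may have a different shape.

```python
import sqlite3

# In-memory database; INTEGER PRIMARY KEY is indexed by SQLite automatically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(15010, "a"), (15011, "b"), (15012, "c")],
)


def get(conn, item_id):
    """Look up one row by its key using the primary-key index."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id = ?", (item_id,)
    ).fetchone()


def prev_item(conn, item_id):
    """Return the row with the largest id below item_id, or None."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id < ? ORDER BY id DESC LIMIT 1",
        (item_id,),
    ).fetchone()


item = get(conn, 15012)
item2 = prev_item(conn, 15012)
```

Here "previous" means the next smaller id, which may differ from the insertion order the question relies on.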

木格 2024-12-06 14:28:36

You can try this subclass of OrderedDict. My earlier submission was incorrect (mentioned at the bottom):

from collections import OrderedDict

# index/prev/next each build a list of the keys, so every call is O(n)
class MyOrderedDict(OrderedDict):
    def index(self, key):
        if key not in self:
            raise KeyError(key)
        return list(self.keys()).index(key)
    def prev(self, key):
        idx = self.index(key) - 1
        if idx < 0:
            raise IndexError
        return list(self.keys())[idx]
    def next(self, key):
        _list = list(self.keys())
        idx = self.index(key)
        if idx + 1 >= len(_list):
            raise IndexError
        return _list[idx + 1]

# >>> d = MyOrderedDict(((3, 'Three'), (2, 'Two'), (4, 'Four'), (1, 'One')))
# >>> d.index(3)
# 0
# >>> d.index(2)
# 1
# >>> d.prev(2)
# 3
# >>> d.prev(3)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 9, in prev
# IndexError
# >>> d.next(4)
# 1
# >>> d.next(1)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 14, in next
# IndexError

Edit: as @agf commented below, my earlier submission (kept here for reference) is incorrect:

You're looking for a fast way to retrieve an item from myClass, so you should use a dictionary. But at the same time you want the data to be in some sort of order, so that you can do a prevItem on it. Why not store your data in a collections.OrderedDict, added in Python 2.7 and 3.1?

放肆 2024-12-06 14:28:36

You should use a list because you can do one trivial optimization with it: Sort it by the attribute you're looking for (in .get) and do a binary search.

In a list of 2000 items the average number of comparisons goes down from about 1000 to about 11 (log2 of 2000)! Getting the previous (and next) item becomes trivial too.

See the bisect module for the bisection algorithm.
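A minimal sketch of that approach, assuming each dictionary has a hypothetical `"id"` key that .get searches on; `find_pos` and the sample data are made up for illustration:

```python
from bisect import bisect_left

items = [{"id": 15012}, {"id": 15010}, {"id": 15011}]
items.sort(key=lambda d: d["id"])   # sort once, after the list is created
keys = [d["id"] for d in items]     # parallel list of sort keys for bisect


def find_pos(item_id):
    """Binary-search the sorted keys; raise KeyError if item_id is absent."""
    pos = bisect_left(keys, item_id)
    if pos == len(keys) or keys[pos] != item_id:
        raise KeyError(item_id)
    return pos


pos = find_pos(15012)
item = items[pos]
prev_item = items[pos - 1] if pos > 0 else None
```

Keeping a separate `keys` list avoids relying on the `key=` argument of bisect_left, which only exists in Python 3.10 and later.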
