How to limit the size of a dictionary?

Posted on 2024-08-24 19:25:59

I'd like to work with a dict in Python, but limit the number of key/value pairs to X. In other words, if the dict is currently storing X key/value pairs and I perform an insertion, I would like one of the existing pairs to be dropped. It would be nice if it were the least recently inserted/accessed key, but that's not strictly necessary.

If this exists in the standard library please save me some time and point it out!

欲拥i 2024-08-31 19:25:59

Python 2.7 and 3.1 have OrderedDict and there are pure-Python implementations for earlier Pythons.

from collections import OrderedDict

class LimitedSizeDict(OrderedDict):
    def __init__(self, *args, **kwds):
        self.size_limit = kwds.pop("size_limit", None)
        OrderedDict.__init__(self, *args, **kwds)
        self._check_size_limit()

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, value)
        self._check_size_limit()

    def _check_size_limit(self):
        if self.size_limit is not None:
            while len(self) > self.size_limit:
                self.popitem(last=False)

You would also have to override other methods that can insert items, such as update. The main reason to use OrderedDict is that it lets you easily control which item gets popped; otherwise a normal dict would work.
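
To illustrate how OrderedDict lets you control what gets popped, here is a small standard-library sketch (the keys are arbitrary): popitem(last=False) takes the oldest pair, which is what _check_size_limit relies on, while the default popitem() takes the newest.

```python
from collections import OrderedDict

od = OrderedDict()
for key in ["a", "b", "c"]:
    od[key] = key.upper()  # insertion order: a, b, c

# last=False pops from the front: the least recently inserted pair (FIFO eviction)
oldest = od.popitem(last=False)
# the default, last=True, pops from the back: the most recently inserted pair
newest = od.popitem()

print(oldest, newest, list(od))  # ('a', 'A') ('c', 'C') ['b']
```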

夜夜流光相皎洁 2024-08-31 19:25:59

cachetools provides nice implementations of size-limited mappings (caches such as LRUCache) that do this, and it works on Python 2 and 3.

Excerpt of the documentation:

For the purpose of this module, a cache is a mutable mapping of a fixed
maximum size. When the cache is full, i.e. by adding another item the
cache would exceed its maximum size, the cache must choose which item(s)
to discard based on a suitable cache algorithm.

我的影子我的梦 2024-08-31 19:25:59

Here's a simple, no-LRU Python 2.6+ solution (in older Pythons you could do something similar with UserDict.DictMixin, but in 2.6 and later that's not recommended, and the ABCs from collections are preferable anyway):

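try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping      # Python 2.6-3.2

class MyDict(MutableMapping):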
    def __init__(self, maxlen, *a, **k):
        self.maxlen = maxlen
        self.d = dict(*a, **k)
        while len(self) > maxlen:
            self.popitem()
    def __iter__(self):
        return iter(self.d)
    def __len__(self):
        return len(self.d)
    def __getitem__(self, k):
        return self.d[k]
    def __delitem__(self, k):
        del self.d[k]
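    def __setitem__(self, k, v):
        # Evict only when adding a new key to a full dict. MutableMapping.popitem()
        # removes the first key produced by iteration: the oldest inserted key on
        # Python 3.7+, an arbitrary one on older Pythons.
        if k not in self and len(self) == self.maxlen:
            self.popitem()
        self.d[k] = v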

d = MyDict(5)
for i in range(10):
    d[i] = i
    print(sorted(d))

As other answers mention, you probably don't want to subclass dict. The explicit delegation to self.d is unfortunately boilerplate-heavy, but it guarantees that every other method (update, setdefault, pop, and so on) is properly supplied by MutableMapping (collections.abc.MutableMapping in Python 3.3+).
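
A quick way to see that guarantee in action: the hypothetical Recorder class below (not part of the answer; Python 3.3+ assumed) implements only the five abstract methods and logs every __setitem__ call. The inherited update() and setdefault() both go through it, which is why a size check placed in __setitem__ cannot be bypassed.

```python
from collections.abc import MutableMapping


class Recorder(MutableMapping):
    """Hypothetical mapping that logs every key passed to __setitem__."""

    def __init__(self):
        self.d = {}
        self.calls = []

    def __getitem__(self, k):
        return self.d[k]

    def __setitem__(self, k, v):
        self.calls.append(k)  # a size check like MyDict's would run here
        self.d[k] = v

    def __delitem__(self, k):
        del self.d[k]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        return len(self.d)


r = Recorder()
r.update({"a": 1}, b=2)  # inherited from MutableMapping
r.setdefault("c", 3)     # inherited from MutableMapping
print(r.calls)           # ['a', 'b', 'c']
```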

过潦 2024-08-31 19:25:59

Here is a simple and efficient LRU cache written in dirt-simple Python code that runs on any Python version from 1.5.2 onward. It wraps a function and caches its results, keyed by the argument tuple:

class LRU_Cache:

    def __init__(self, original_function, maxsize=1000):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

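        # Each entry is a 4-item list [prev_link, next_link, key, value] in a doubly
        # linked list from head (oldest) to tail (newest); head and tail are sentinels.
        PREV, NEXT, KEY, VALUE = 0, 1, 2, 3         # link fields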
        self.head = [None, None, None, None]        # oldest
        self.tail = [self.head, None, None, None]   # newest
        self.head[NEXT] = self.tail

    def __call__(self, *key):
        PREV, NEXT = 0, 1
        mapping, head, tail = self.mapping, self.head, self.tail

        link = mapping.get(key, head)
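        if link is head:
            # Cache miss (head doubles as the "not found" marker): compute the value,
            # evicting the oldest entry (head[NEXT]) first if the cache is full.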
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                old_prev, old_next, old_key, old_value = head[NEXT]
                head[NEXT] = old_next
                old_next[PREV] = head
                del mapping[old_key]
            last = tail[PREV]
            link = [last, tail, key, value]
            mapping[key] = last[NEXT] = tail[PREV] = link
        else:
            # Cache hit: unlink the entry and re-insert it just before tail (newest).
            link_prev, link_next, key, value = link
            link_prev[NEXT] = link_next
            link_next[PREV] = link_prev
            last = tail[PREV]
            last[NEXT] = tail[PREV] = link
            link[PREV] = last
            link[NEXT] = tail
        return value

if __name__ == '__main__':
    p = LRU_Cache(pow, maxsize=3)
    for i in [1,2,3,4,5,3,1,5,1,1]:
        print(i, p(i, 2))
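
For memoizing a function like this, Python 3.2+ also ships an LRU cache in the standard library: the functools.lru_cache decorator. A minimal sketch that reruns the same sequence of calls as the demo above (the wrapper name cached_pow is just for illustration):

```python
from functools import lru_cache


@lru_cache(maxsize=3)
def cached_pow(base, exp):
    # Same computation as the LRU_Cache(pow, maxsize=3) demo above.
    return pow(base, exp)


for i in [1, 2, 3, 4, 5, 3, 1, 5, 1, 1]:
    print(i, cached_pow(i, 2))

# lru_cache keeps hit/miss counters for the wrapped function
print(cached_pow.cache_info())
```

With maxsize=3 this sequence should report 4 hits and 6 misses: 3, 5, 1 and 1 are found in the cache, while the first calls for 1 to 5 and the repeat of 1 (evicted when 4 arrived) are misses.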

碍人泪离人颜 2024-08-31 19:25:59

There have been many good answers, but I want to point out a simple, Pythonic implementation of an LRU cache. It's similar to Alex Martelli's answer.

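from collections import OrderedDict
from collections.abc import MutableMapping  # no longer importable from `collections` since Python 3.10

class Cache(MutableMapping):
    def __init__(self, maxlen, items=None):
        self._maxlen = maxlen
        self.d = OrderedDict()
        if items:
            for k, v in items:
                self[k] = v

    @property
    def maxlen(self):
        return self._maxlen

    def __getitem__(self, key):
        value = self.d[key]      # raises KeyError for a missing key
        self.d.move_to_end(key)  # a read marks the key as most recently used
        return value

    def __setitem__(self, key, value):
        if key in self.d:
            self.d.move_to_end(key)
        elif len(self.d) == self.maxlen:
            self.d.popitem(last=False)  # evict the least recently used entry
        self.d[key] = value

    def __delitem__(self, key):
        del self.d[key]

    def __contains__(self, key):
        # The inherited version calls __getitem__, so a plain `in` test would reorder keys.
        return key in self.d

    def __iter__(self):
        # Iterate over a snapshot: values() and items() call __getitem__, which reorders
        # self.d while it is being iterated and would otherwise raise a RuntimeError.
        return iter(list(self.d))

    def __len__(self):
        return len(self.d)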
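
The two OrderedDict calls the Cache class relies on, move_to_end() and popitem(last=False), can be tried in isolation. A small standard-library sketch with arbitrary keys:

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])  # least -> most recently used

d.move_to_end("a")  # reading "a" makes it the most recently used key
print(list(d))      # ['b', 'c', 'a']

evicted = d.popitem(last=False)  # drop the least recently used entry
print(evicted, list(d))          # ('b', 2) ['c', 'a']
```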

末蓝 2024-08-31 19:25:59

You can create a custom dictionary class by subclassing dict. In your case, you would have to override __setitem__ to check your own length and delete something if the limit is reached. The following example prints the current length after every insertion:

class mydict(dict):
    def __setitem__(self, k, v):
        dict.__setitem__(self, k, v)
        print(len(self))

d = mydict()
d['foo'] = 'bar'
d['bar'] = 'baz'
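
Building on that, here is a minimal sketch that actually enforces a limit, assuming Python 3.7+ (where dicts preserve insertion order, so the first key is the oldest). The class name LimitedDict and the max_len parameter are illustrative, not from the answer. It also shows the pitfall other answers raise: dict.update() does not call an overridden __setitem__, so it bypasses the limit.

```python
class LimitedDict(dict):
    """Hypothetical dict subclass that keeps at most max_len keys (max_len >= 1)."""

    def __init__(self, max_len):
        super().__init__()
        self.max_len = max_len

    def __setitem__(self, key, value):
        # Evict only when a new key would push the dict past the limit.
        if key not in self and len(self) >= self.max_len:
            oldest = next(iter(self))  # first key in insertion order (Python 3.7+)
            del self[oldest]
        super().__setitem__(key, value)


d = LimitedDict(2)
d["foo"] = "bar"
d["bar"] = "baz"
d["baz"] = "qux"  # evicts "foo"
print(d)  # {'bar': 'baz', 'baz': 'qux'}

# Pitfall: dict.update() skips the overridden __setitem__ entirely.
d.update({"x": 1})
print(len(d))  # 3, the limit was bypassed
```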

[旋木] 2024-08-31 19:25:59

A dict does not have this behavior. You could make your own class that does this, for example something like

class MaxSizeDict(object):
    def __init__(self, max_size):
        self.max_size = max_size
        self.dict = {}
    def __setitem__(self, key, value):
        if key in self.dict:
            self.dict[key] = value
            return

        if len(self.dict) >= self.max_size:
            ...

A few notes about this:

  • It would be tempting for some to subclass dict here. You can technically do this, but it is bug-prone because dict's methods do not depend on each other (for example, update() does not go through __setitem__). You can use UserDict.DictMixin to save having to define all methods. There are few methods you would be able to reuse if you subclass dict.
  • A dict does not know what the least recently added key is, since dicts are unordered. (That was true when this was written; since Python 3.7, dicts are guaranteed to preserve insertion order.)
    • 2.7 will introduce collections.OrderedDict, but for now keeping the keys in order separately should work fine (use a collections.deque as a queue).
    • If getting the oldest isn't all that important, you can just use the popitem method to delete one arbitrary item. (Since Python 3.7, dict.popitem() always removes the most recently inserted item.)
  • I interpreted "oldest" to mean first insertion, approximately. You would have to do something a bit different to evict the LRU items. The most obvious efficient strategy would involve keeping a doubly linked list of keys, with references to the nodes themselves stored as dict values (along with the real values). This gets more complicated, and implementing it in pure Python carries a lot of overhead.
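
The deque suggestion in the notes above could be sketched like this. It is one possible way to fill in such a class, not the elided code from the answer; it assumes max_size >= 1 and evicts in insertion (FIFO) order, not LRU order.

```python
from collections import deque


class MaxSizeDict(object):
    """Sketch: FIFO-evicting mapping that tracks insertion order in a deque."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.dict = {}
        self.order = deque()  # keys, oldest insertion on the left

    def __setitem__(self, key, value):
        if key in self.dict:
            # Overwriting an existing key keeps its original position.
            self.dict[key] = value
            return
        if len(self.dict) >= self.max_size:
            oldest = self.order.popleft()
            del self.dict[oldest]
        self.dict[key] = value
        self.order.append(key)

    def __getitem__(self, key):
        return self.dict[key]

    def __delitem__(self, key):
        del self.dict[key]
        self.order.remove(key)  # O(n) scan of the deque

    def __contains__(self, key):
        return key in self.dict

    def __len__(self):
        return len(self.dict)


m = MaxSizeDict(2)
m["a"] = 1
m["b"] = 2
m["c"] = 3   # evicts "a", the oldest insertion
m["b"] = 20  # overwrite: "b" keeps its original position
m["d"] = 4   # evicts "b"
print(sorted(m.dict.items()))  # [('c', 3), ('d', 4)]
```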

北笙凉宸 2024-08-31 19:25:59

There is a library called CircularDict that implements this behaviour. It lets you limit the maximum number of items the dict can store, and also set memory usage limits.

It can be installed with:

pip install circular-dict

And used this way:

from circular_dict import CircularDict

# Initialize a CircularDict with a maximum length of 3
my_dict = CircularDict(maxlen=3) # You could also set maxsize_bytes=8*1024 bytes

# Fill it with 4 items
my_dict['item1'] = 'value1'
my_dict['item2'] = 'value2'
my_dict['item3'] = 'value3'
# When adding this 4th item, the 1st one will be dropped
my_dict['item4'] = 'value4'
print(my_dict)

The output will look like this:

{'item2': 'value2', 'item3': 'value3', 'item4': 'value4'}