Python:构建 LRU 缓存

发布于 2024-10-07 18:40:26 字数 335 浏览 12 评论 0 原文

我在 MongoDB 中有大约 6,00,000 个条目,格式如下:

feature:category:count

其中

  • feature 可以是任何单词,
  • category 是正数或负数,
  • < strong>count 表示某个功能在该类别的文档中出现的次数。

我想缓存前 1000 个元组,这样就不会每次都查询数据库。

如何在 Python 中构建 LRU 缓存?或者有任何已知的解决方案吗?

I have around 6,00,000 entries in MongoDB in the following format:

feature:category:count

where

  • feature could be any word,
  • category is positive or negative, and
  • count tells how many times a feature occurred in a document for that category.

I want to cache the top 1000 tuples, let's say so as not to query database each time.

How does one build an LRU cache in Python? Or are there any known solutions to this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

<逆流佳人身旁 2024-10-14 18:40:26

Python3.3 中的 LRU 缓存 的复杂度为 O(1)插入、删除和搜索。

该设计使用循环双向链接条目列表(按从旧到新的顺序排列)和哈希表来定位各个链接。缓存命中使用哈希表来查找相关链接并将其移动到链表的头部。缓存未命中会删除最旧的链接并在链表的头部创建一个新链接。

这是一个由 33 行非常基本的 Python 代码组成的简化(但快速)版本(仅使用简单的字典和列表操作)。它在 Python2.0 及更高版本(或 PyPy 或 Jython 或 Python3.x)上运行:

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        # Link structure: [PREV, NEXT, KEY, VALUE]
        self.root = [None, None, None, None]
        self.root[0] = self.root[1] = self.root
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

    def __call__(self, *key):
        mapping = self.mapping
        root = self.root
        link = mapping.get(key)
        if link is not None:
            link_prev, link_next, link_key, value = link
            link_prev[1] = link_next
            link_next[0] = link_prev
            last = root[0]
            last[1] = root[0] = link
            link[0] = last
            link[1] = root
            return value
        value = self.original_function(*key)
        if len(mapping) >= self.maxsize:
            oldest = root[1]
            next_oldest = oldest[1]
            root[1] = next_oldest
            next_oldest[0] = root
            del mapping[oldest[2]]
        last = root[0]
        last[1] = root[0] = mapping[key] = [last, root, key, value]
        return value


if __name__ == '__main__':
    p = LRU_Cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))

从 Python 3.1 开始,OrderedDict 使得实现 LRU 缓存变得更加简单:

from collections import OrderedDict

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = OrderedDict()

    def __call__(self, *key):
        mapping = self.mapping
        try:
            value = mapping[key]
            mapping.move_to_end(key)
        except KeyError:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                mapping.popitem(False)
            mapping[key] = value
        return value

The LRU cache in Python3.3 has O(1) insertion, deletion, and search.

The design uses a circular doubly-linked list of entries (arranged oldest-to-newest) and a hash table to locate individual links. Cache hits use the hash table to find the relevant link and move it to the head of the list. Cache misses delete the oldest link and create a new link at the head of the linked list.

Here's a simplified (but fast) version in 33 lines of very basic Python (using only simple dictionary and list operations). It runs on Python2.0 and later (or PyPy or Jython or Python3.x):

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        # Link structure: [PREV, NEXT, KEY, VALUE]
        self.root = [None, None, None, None]
        self.root[0] = self.root[1] = self.root
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

    def __call__(self, *key):
        mapping = self.mapping
        root = self.root
        link = mapping.get(key)
        if link is not None:
            link_prev, link_next, link_key, value = link
            link_prev[1] = link_next
            link_next[0] = link_prev
            last = root[0]
            last[1] = root[0] = link
            link[0] = last
            link[1] = root
            return value
        value = self.original_function(*key)
        if len(mapping) >= self.maxsize:
            oldest = root[1]
            next_oldest = oldest[1]
            root[1] = next_oldest
            next_oldest[0] = root
            del mapping[oldest[2]]
        last = root[0]
        last[1] = root[0] = mapping[key] = [last, root, key, value]
        return value


if __name__ == '__main__':
    p = LRU_Cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))

Starting in Python 3.1, OrderedDict makes it even simpler to implement a LRU cache:

from collections import OrderedDict

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = OrderedDict()

    def __call__(self, *key):
        mapping = self.mapping
        try:
            value = mapping[key]
            mapping.move_to_end(key)
        except KeyError:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                mapping.popitem(False)
            mapping[key] = value
        return value
冷清清 2024-10-14 18:40:26

除了 Python 3.2 中包含的版本之外,Python Cookbook 中还有 LRU 缓存配方,包括这些由 Python 核心开发人员 Raymond Hettinger 编写。

Besides the version included in Python 3.2, there are LRU cache recipes in the Python Cookbook including these by Python core developer Raymond Hettinger.

三寸金莲 2024-10-14 18:40:26

Python 3.2 functools 包含一个 LRU 缓存< /a>.您可以轻松地从 repo 中挑选它,检查您是否必须调整它以与 Python 2 配合使用(应该不会太难 - 也许使用 itertools 而不是某些内置函数 - 询问您是否需要帮助)并完成。不过,您需要将查询包装成可调用的,并确保它依赖于(可哈希的)函数参数。

Python 3.2 functools includes an LRU cache. You could easily cherrypick it from repo, check if you have to adjust it to work with Python 2 (shouldn't be too hard - perhaps using itertools instead of certain builtins - ask if you need help) and be done. You need to wrap the query it into a callable and make sure it depends on the (hashable) function arguments, though.

佞臣 2024-10-14 18:40:26

还有 lru_cache 的 python 3.3 版本的向后移植,例如 this 在 python 2.7 上运行。如果您对两层缓存感兴趣(如果没有在实例中缓存,它将检查共享缓存)我创建了 lru2cache 基于 lru_cache 的向后移植。

There are also backports of the python 3.3 version of lru_cache such as this which runs on python 2.7. If you are interested in two layers of caching (if not cached in the instance it will check a shared cache) I have created lru2cache based on the backport of lru_cache.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文