Python: Building an LRU Cache

Posted 2024-10-07 18:40:26

I have around 600,000 entries in MongoDB in the following format:

feature:category:count

where

  • feature could be any word,
  • category is positive or negative, and
  • count tells how many times a feature occurred in a document for that category.

I want to cache, let's say, the top 1,000 tuples so as not to query the database each time.

How does one build an LRU cache in Python? Or are there any known solutions to this?

Comments (4)

逆流佳人身旁 2024-10-14 18:40:26

The LRU cache in Python 3.3 has O(1) insertion, deletion, and search.

The design uses a circular doubly-linked list of entries (arranged oldest-to-newest) and a hash table to locate individual links. Cache hits use the hash table to find the relevant link and move it to the head of the list. Cache misses delete the oldest link (once the cache is full) and create a new link at the head of the linked list.

Here's a simplified (but fast) version in about 40 lines of very basic Python (using only simple dictionary and list operations). It runs on Python 2.0 and later (as well as PyPy, Jython, and Python 3.x):

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        # Link structure: [PREV, NEXT, KEY, VALUE]
        # The root links to itself, forming an empty circular list.
        self.root = [None, None, None, None]
        self.root[0] = self.root[1] = self.root
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

    def __call__(self, *key):
        mapping = self.mapping
        root = self.root
        link = mapping.get(key)
        if link is not None:
            # Cache hit: unlink the entry from its current position ...
            link_prev, link_next, link_key, value = link
            link_prev[1] = link_next
            link_next[0] = link_prev
            # ... and relink it just before root, the newest position.
            last = root[0]
            last[1] = root[0] = link
            link[0] = last
            link[1] = root
            return value
        # Cache miss: compute the value with the wrapped function.
        value = self.original_function(*key)
        if len(mapping) >= self.maxsize:
            # Evict the oldest entry, the one just after root.
            oldest = root[1]
            next_oldest = oldest[1]
            root[1] = next_oldest
            next_oldest[0] = root
            del mapping[oldest[2]]
        # Insert the new entry at the newest position.
        last = root[0]
        last[1] = root[0] = mapping[key] = [last, root, key, value]
        return value


if __name__ == '__main__':
    p = LRU_Cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))
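
Tracing the demo above with maxsize=3: a, b, c, d, and e all miss (d evicts a, and e evicts b), the sixth call (c) hits, the seventh (a, which had been evicted) misses and evicts d in turn, and the final three calls (e, a, a) all hit.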

Starting in Python 3.1, OrderedDict makes it even simpler to implement an LRU cache (the move_to_end() method used below was added in Python 3.2):

from collections import OrderedDict

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = OrderedDict()  # entries ordered oldest-to-newest

    def __call__(self, *key):
        mapping = self.mapping
        try:
            value = mapping[key]
            mapping.move_to_end(key)  # cache hit: mark as most recently used
        except KeyError:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                mapping.popitem(last=False)  # evict the least recently used entry
            mapping[key] = value
        return value
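
A quick usage sketch for the OrderedDict version above; slow_square is an illustrative stand-in for any expensive function, not a name from the answer:

import time

def slow_square(n):
    time.sleep(0.1)  # stand-in for an expensive computation or query
    return n * n

cached_square = LRU_Cache(slow_square, maxsize=2)
print(cached_square(2))  # miss: computed, then cached under key (2,)
print(cached_square(2))  # hit: returned from the cache
print(cached_square(3))  # miss: cache now holds (2,) and (3,)
print(cached_square(4))  # miss: evicts (2,), the least recently used key
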
冷清清 2024-10-14 18:40:26

Besides the version included in Python 3.2, there are LRU cache recipes in the Python Cookbook, including several by Python core developer Raymond Hettinger.

三寸金莲 2024-10-14 18:40:26

Python 3.2's functools includes an LRU cache. You could easily cherry-pick it from the repo, check whether you have to adjust it to work with Python 2 (it shouldn't be too hard - perhaps using itertools instead of certain builtins - ask if you need help), and be done. You need to wrap the query into a callable and make sure it depends on the (hashable) function arguments, though.
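
For illustration, here is a minimal sketch of that approach on Python 3.2+, where functools.lru_cache is available directly; the pymongo client and the mydb.features collection schema are assumptions, not details from the question:

from functools import lru_cache
from pymongo import MongoClient

client = MongoClient()             # assumes a MongoDB instance on localhost
collection = client.mydb.features  # hypothetical database/collection names

@lru_cache(maxsize=1000)
def get_count(feature, category):
    # Arguments must be hashable for lru_cache to work; strings qualify.
    doc = collection.find_one({'feature': feature, 'category': category})
    return doc['count'] if doc else 0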

佞臣 2024-10-14 18:40:26

There are also backports of the Python 3.3 version of lru_cache, such as this one, which runs on Python 2.7. If you are interested in two layers of caching (if a result is not cached in the instance, it will check a shared cache), I have created lru2cache based on the backport of lru_cache.
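
As a rough sketch of the two-layer idea only (this illustrates the lookup order, not lru2cache's actual API, and omits eviction for brevity):

class TwoLayerCache:
    shared = {}  # class-level cache shared across all instances

    def __init__(self, original_function):
        self.original_function = original_function
        self.local = {}  # per-instance cache, checked first

    def __call__(self, *key):
        if key in self.local:                  # first layer: instance cache
            return self.local[key]
        if key in TwoLayerCache.shared:        # second layer: shared cache
            value = TwoLayerCache.shared[key]
        else:
            value = self.original_function(*key)
            TwoLayerCache.shared[key] = value
        self.local[key] = value                # promote into the instance cache
        return value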
