How to create a bounded memoization decorator in Python?

Posted on 2025-01-08 02:03:21

Obviously, a quick search yields a million implementations and flavors of the memoization decorator in Python. However, I am interested in a flavor that I haven't been able to find. I would like to have it such that the cache of stored values can be of a fixed capacity. When new elements are added, if the capacity is reached, then the oldest value is removed and is replaced with the newest value.

My concern is that, if I use memoization to store a great many elements, then the program will crash because of a lack of memory. (I don't know how well-placed this concern may be in practice.) If the cache were of a fixed size, then a memory error would not be an issue. And many problems that I work on change as the program executes so that initial cached values would look very different from later cached values (and would be much less likely to recur later). That's why I'd like the oldest stuff to be replaced by the newest stuff.
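For comparison, the standard library's functools.lru_cache (Python 3.2+) already offers a bounded cache through its maxsize parameter. Note that it evicts the least recently used entry rather than the oldest inserted one, so it is a close relative of what is asked here, not an exact match. A minimal sketch, reusing the fib example from this question:

```python
import functools

@functools.lru_cache(maxsize=200)  # keep at most 200 results
def fib(n):
    # Same base case as the question's example: fib(0) == fib(1) == 1.
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(50))
print(fib.cache_info())  # reports hits, misses, maxsize and current size
```

If "oldest inserted goes first" really matters, a custom cache like the one discussed here is still needed.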

I found the OrderedDict class and an example showing how to subclass it to specify a maximum size. I'd like to use that as my cache, rather than a normal dict. The problem is, I need the memoize decorator to take a parameter called maxlen that defaults to None. If it is None, then the cache is boundless and operates as normal. Any other value is used as the size for the cache.

I want it to work like the following:

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

and

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Below is the code that I have so far, but I don't see how to pass the parameter into the decorator while making it work both "naked" and with a parameter.

import collections
import functools

class BoundedOrderedDict(collections.OrderedDict):
    def __init__(self, *args, **kwds):
        self.maxlen = kwds.pop("maxlen", None)
        collections.OrderedDict.__init__(self, *args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        collections.OrderedDict.__setitem__(self, key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)

def memoize(function):
    cache = BoundedOrderedDict()  # I want this to take maxlen as an argument
    @functools.wraps(function)
    def memo_target(*args):
        lookup_value = args
        if lookup_value not in cache:
            cache[lookup_value] = function(*args)
        return cache[lookup_value]
    return memo_target

@memoize
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

if __name__ == '__main__':
    x = fib(50)
    print(x)
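To make the eviction behaviour concrete, here is a small demonstration of the bounded dictionary on its own. The class is repeated only so the snippet runs standalone; the keys and values are made up for illustration.

```python
import collections

class BoundedOrderedDict(collections.OrderedDict):
    """Same class as in the question, repeated so this runs on its own."""
    def __init__(self, *args, **kwds):
        self.maxlen = kwds.pop("maxlen", None)
        collections.OrderedDict.__init__(self, *args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        collections.OrderedDict.__setitem__(self, key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)  # drop the oldest inserted key

cache = BoundedOrderedDict(maxlen=3)
for key in "abcd":
    cache[key] = key.upper()

remaining = list(cache)
print(remaining)  # 'a' was inserted first, so it is the one evicted
```

Eviction is by insertion order: reading a key, or overwriting an existing one, does not make it "newer".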

Edit: Using Ben's suggestion, I created the following decorator, which I believe works the way I imagined. It's important to me to be able to use these decorated functions with multiprocessing, and that has been an issue in the past. But a quick test of this code seemed to work correctly, even when farming out the jobs to a pool of threads.

def memoize(func=None, maxlen=None):
    if func:
        cache = BoundedOrderedDict(maxlen=maxlen)
        @functools.wraps(func)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = func(*args)
            return cache[lookup_value]
        return memo_target
    else:
        def memoize_factory(func):
            return memoize(func, maxlen=maxlen)
        return memoize_factory
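A note on concurrency, offered as a sketch rather than a requirement: with separate processes each worker has its own copy of the cache, while threads in a pool share one cache, and the check-then-insert sequence is not atomic. If that matters for your workload, one option is to guard the cache with a lock. The name threadsafe_memoize and the square function are hypothetical, and for brevity this variant is a plain decorator factory.

```python
import collections
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

def threadsafe_memoize(maxlen=None):
    """Hypothetical variant: decorator factory with a lock around the cache."""
    def decorator(func):
        cache = collections.OrderedDict()
        lock = threading.Lock()

        @functools.wraps(func)
        def memo_target(*args):
            with lock:
                if args in cache:
                    return cache[args]
            # Compute outside the lock so recursive calls cannot deadlock;
            # two threads may occasionally compute the same value twice.
            result = func(*args)
            with lock:
                cache[args] = result
                while maxlen is not None and len(cache) > maxlen:
                    cache.popitem(last=False)  # drop the oldest entry
            return result
        return memo_target
    return decorator

@threadsafe_memoize(maxlen=100)
def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(10)))
print(results)
```

Returning the local result instead of re-reading the cache also avoids a KeyError if another thread evicts the entry in between.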

墨离汐 2025-01-15 02:03:21

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

Here memoize is used as a function that is called on a single function argument, and returns a function. memoize is a decorator.

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Here memoize is used as a function that is called on a single integer argument and returns a function, and that returned function is itself used as a decorator i.e. it is called on a single function argument and returns a function. memoize is a decorator factory.

So to unify these two, you're going to have to write some ugly code. The way I would probably do it is to have memoize look like this:

def memoize(func=None, maxlen=None):
    if func:
        # act as decorator: wrap func and return the wrapper
        ...
    else:
        # act as decorator factory: return a decorator that uses maxlen
        ...

This way if you want to pass parameters you always pass them as keyword arguments, leaving func (which should be a positional parameter) unset, and if you just want everything to default it will magically work as a decorator directly. This does mean @memoize(200) will give you an error; you could avoid that by instead doing some type checking to see whether func is callable, which should work well in practice but isn't really very "pythonic".
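As a sketch of the callable check mentioned above (one possible way to do it, not the only one): if the first positional argument is not callable, treat it as maxlen. A plain OrderedDict is used here as the cache so the snippet is self-contained; the functions double and triple are made up for illustration.

```python
import collections
import functools

def memoize(func=None, maxlen=None):
    # A non-callable first argument is taken to be maxlen, so @memoize(200) works.
    if func is not None and not callable(func):
        func, maxlen = None, func
    if func is None:
        # Decorator-factory mode: @memoize(200) or @memoize(maxlen=200).
        return lambda f: memoize(f, maxlen=maxlen)

    cache = collections.OrderedDict()

    @functools.wraps(func)
    def memo_target(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        while maxlen is not None and len(cache) > maxlen:
            cache.popitem(last=False)  # evict the oldest entry
        return result
    return memo_target

@memoize
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

@memoize(200)
def double(x):
    return 2 * x

@memoize(maxlen=2)
def triple(x):
    return 3 * x

print(fib(30), double(4), triple(5))
```

The obvious limitation is that a callable can never be passed as maxlen, which is harmless here.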

An alternative would be to have two different decorators, say memoize and bounded_memoize. The unbounded memoize can have a trivial implementation by just calling bounded_memoize with maxlen set to None, so it doesn't cost you anything in implementation or maintenance.
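The two-decorator alternative might look roughly like this. It is a sketch under the assumption that a plain OrderedDict is an acceptable cache; the cache attribute on the wrapper is added only so the example can be inspected.

```python
import collections
import functools

def bounded_memoize(maxlen):
    """Decorator factory; maxlen=None means the cache never evicts."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def memo_target(*args):
            if args in cache:
                return cache[args]
            result = func(*args)
            cache[args] = result
            while maxlen is not None and len(cache) > maxlen:
                cache.popitem(last=False)  # evict the oldest entry
            return result

        memo_target.cache = cache  # exposed only for inspection in this sketch
        return memo_target
    return decorator

# The unbounded decorator is just the bounded one with no limit.
memoize = bounded_memoize(None)

@memoize
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

@bounded_memoize(2)
def square(n):
    return n * n

for n in range(5):
    square(n)
print(list(square.cache))  # only the keys for the two newest calls remain
```

Each decorator keeps a single, unsurprising signature, at the cost of two names to remember.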

Normally, as a rule of thumb, I try to avoid mangling a function to implement two only tangentially related sets of functionality, especially when they have such different signatures. But in this case it does make the use of the decorator natural (requiring @memoize() would be quite error-prone, even though it's more consistent from a theoretical perspective), and you're presumably going to implement this once and use it many times, so readability at the point of use is probably the more important concern.

最终幸福 2025-01-15 02:03:21

You want to write a decorator that takes an argument (the maximum length of the BoundedOrderedDict) and returns a decorator that will memoize your function with a BoundedOrderedDict of the appropriate size:

def boundedMemoize(maxCacheLen):
    def memoize(function):
        cache = BoundedOrderedDict(maxlen = maxCacheLen)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = function(*args)
            return cache[lookup_value]
        return memo_target
    return memoize

You can use it like this:

@boundedMemoize(100)
def fib(n):
    if n < 2: return 1
    return fib(n - 1) + fib(n - 2)

Edit: Whoops, missed part of the question. If you want the maxlen argument to the decorator to be optional, you could do something like this:

def boundedMemoize(arg):
    if callable(arg):
        cache = BoundedOrderedDict()
        @functools.wraps(arg)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = arg(*args)
            return cache[lookup_value]
        return memo_target

    if isinstance(arg, int):
        def memoize(function):
            cache = BoundedOrderedDict(maxlen = arg)
            @functools.wraps(function)
            def memo_target(*args):
                lookup_value = args
                if lookup_value not in cache:
                    cache[lookup_value] = function(*args)
                return cache[lookup_value]
            return memo_target
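        return memoize

    # Anything else would otherwise fall through and return None.
    raise TypeError("boundedMemoize expects a function or an int")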

苍景流年 2025-01-15 02:03:21

From http://www.python.org/dev/peps/pep-0318/

The current syntax also allows decorator declarations to call a function that returns a decorator:

@decomaker(argA, argB, ...)
def func(arg1, arg2, ...):
    pass

This is equivalent to:

func = decomaker(argA, argB, ...)(func)
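A tiny illustration of that equivalence. The decomaker body and the greet functions here are made up for the example; PEP 318 only specifies the syntax.

```python
def decomaker(prefix):
    # Hypothetical decorator factory: returns a decorator bound to prefix.
    def decorator(func):
        def wrapper(*args):
            return prefix + func(*args)
        return wrapper
    return decorator

# Decorator syntax with arguments...
@decomaker("Hello, ")
def greet(name):
    return name

# ...does the same thing as calling the factory, then the decorator, by hand.
def greet_plain(name):
    return name

greet_manual = decomaker("Hello, ")(greet_plain)

print(greet("world") == greet_manual("world"))
```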

Also, I'm not sure I would use OrderedDict for this; I would use a ring buffer, which is very easy to implement.
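One way the ring buffer idea could be sketched: a plain dict holds the values, and a fixed-size list of keys records insertion order, with the write position wrapping around. The name ring_memoize, the cache attribute and the square function are hypothetical, and this version assumes maxlen is at least 1.

```python
import functools

def ring_memoize(maxlen):
    """Hypothetical name: memoize using a fixed-size ring of cached keys."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")

    def decorator(func):
        values = {}              # key -> cached result
        ring = [None] * maxlen   # fixed-size buffer of cached keys
        pos = 0                  # next slot to write; the oldest key once full
        count = 0                # number of slots in use

        @functools.wraps(func)
        def memo_target(*args):
            nonlocal pos, count
            if args in values:
                return values[args]
            result = func(*args)
            if args not in values:  # guard against a duplicate slot for one key
                if count == maxlen:
                    del values[ring[pos]]  # overwrite the oldest key
                else:
                    count += 1
                ring[pos] = args
                values[args] = result
                pos = (pos + 1) % maxlen
            return result

        memo_target.cache = values  # exposed only for inspection in this sketch
        return memo_target
    return decorator

@ring_memoize(2)
def square(n):
    return n * n

for n in range(5):
    square(n)
print(sorted(square.cache))  # keys for the two most recent calls
```

Compared with the OrderedDict approach this avoids subclassing, though the bound has to be a positive integer.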

About the author

恋你朝朝暮暮
