How do I create a bounded memoization decorator in Python?
Obviously, a quick search yields a million implementations and flavors of the memoization decorator in Python. However, I am interested in a flavor that I haven't been able to find. I would like to have it such that the cache of stored values can be of a fixed capacity. When new elements are added, if the capacity is reached, then the oldest value is removed and is replaced with the newest value.
My concern is that, if I use memoization to store a great many elements, then the program will crash because of a lack of memory. (I don't know how well-placed this concern may be in practice.) If the cache were of a fixed size, then a memory error would not be an issue. And many problems that I work on change as the program executes so that initial cached values would look very different from later cached values (and would be much less likely to recur later). That's why I'd like the oldest stuff to be replaced by the newest stuff.
I found the OrderedDict class and an example showing how to subclass it to specify a maximum size. I'd like to use that as my cache, rather than a normal dict. The problem is, I need the memoize decorator to take a parameter called maxlen that defaults to None. If it is None, then the cache is boundless and operates as normal. Any other value is used as the size for the cache.
I want it to work like the following:

    @memoize
    def some_function(spam, eggs):
        # This would use the boundless cache.
        pass

and

    @memoize(200)  # or @memoize(maxlen=200)
    def some_function(spam, eggs):
        # This would use the bounded cache of size 200.
        pass

Below is the code that I have so far, but I don't see how to pass the parameter into the decorator while making it work both "naked" and with a parameter.

    import collections
    import functools

    class BoundedOrderedDict(collections.OrderedDict):
        def __init__(self, *args, **kwds):
            self.maxlen = kwds.pop("maxlen", None)
            collections.OrderedDict.__init__(self, *args, **kwds)
            self._checklen()

        def __setitem__(self, key, value):
            collections.OrderedDict.__setitem__(self, key, value)
            self._checklen()

        def _checklen(self):
            if self.maxlen is not None:
                while len(self) > self.maxlen:
                    self.popitem(last=False)

    def memoize(function):
        cache = BoundedOrderedDict()  # I want this to take maxlen as an argument
        @functools.wraps(function)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = function(*args)
            return cache[lookup_value]
        return memo_target

    @memoize
    def fib(n):
        if n < 2: return 1
        return fib(n-1) + fib(n-2)

    if __name__ == '__main__':
        x = fib(50)
        print(x)

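To make the eviction order of the BoundedOrderedDict above concrete, here is a minimal sketch. It repeats the class (using super() instead of the explicit base-class calls) so that it runs on its own; the keys and values are made up for illustration.

```python
import collections


class BoundedOrderedDict(collections.OrderedDict):
    """OrderedDict that drops its oldest entries once maxlen is exceeded."""

    def __init__(self, *args, **kwds):
        self.maxlen = kwds.pop("maxlen", None)
        super().__init__(*args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)  # last=False pops the oldest entry


cache = BoundedOrderedDict(maxlen=2)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # exceeds maxlen, so "a" (the oldest key) is evicted
remaining = list(cache)
print(remaining)

cache["b"] = 20  # overwriting an existing key keeps its original position
cache["d"] = 4   # so "b" is still the oldest key and is evicted next
after_update = list(cache)
print(after_update)
```

Note that this is first-in, first-out eviction: reading or overwriting an entry does not make it "newer", which matches the "replace the oldest with the newest" behaviour asked for, but is not the same as a least-recently-used policy.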
Edit: Using Ben's suggestion, I created the following decorator, which I believe works the way I imagined. It's important to me to be able to use these decorated functions with multiprocessing, and that has been an issue in the past. But a quick test of this code seemed to work correctly, even when farming out the jobs to a pool of threads.

    def memoize(func=None, maxlen=None):
        if func:
            cache = BoundedOrderedDict(maxlen=maxlen)
            @functools.wraps(func)
            def memo_target(*args):
                lookup_value = args
                if lookup_value not in cache:
                    cache[lookup_value] = func(*args)
                return cache[lookup_value]
            return memo_target
        else:
            def memoize_factory(func):
                return memoize(func, maxlen=maxlen)
            return memoize_factory

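As a quick check of the decorator from the edit, the sketch below applies it both bare and with maxlen as a keyword. The square function and the calls list are hypothetical helpers added only to show which arguments were actually recomputed; the class and decorator are repeated so the example runs on its own.

```python
import collections
import functools


class BoundedOrderedDict(collections.OrderedDict):
    def __init__(self, *args, **kwds):
        self.maxlen = kwds.pop("maxlen", None)
        super().__init__(*args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)


def memoize(func=None, maxlen=None):
    if func:
        cache = BoundedOrderedDict(maxlen=maxlen)

        @functools.wraps(func)
        def memo_target(*args):
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]

        return memo_target

    def memoize_factory(func):
        return memoize(func, maxlen=maxlen)

    return memoize_factory


calls = []  # hypothetical log of the arguments that were really computed


@memoize(maxlen=2)
def square(n):
    calls.append(n)
    return n * n


# 1 is evicted when 3 arrives, so the last call has to recompute it.
results = [square(1), square(2), square(3), square(1)]
print(results)
print(calls)


@memoize
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


fib_30 = fib(30)
print(fib_30)
```

One caveat with this version: maxlen=0 would raise a KeyError, because the entry is evicted immediately after being stored and memo_target then reads it back from the cache.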
Comments (3)
With a bare @memoize, memoize is used as a function that is called on a single function argument and returns a function. There, memoize is a decorator.

With @memoize(200), memoize is used as a function that is called on a single integer argument and returns a function, and that returned function is itself used as a decorator, i.e. it is called on a single function argument and returns a function. There, memoize is a decorator factory.

So to unify these two, you're going to have to write some ugly code. The way I would probably do it is to give memoize the signature memoize(func=None, maxlen=None): when func is set it acts as a decorator, and when func is unset it acts as a decorator factory.

This way, if you want to pass parameters you always pass them as keyword arguments, leaving func (which should be a positional parameter) unset, and if you just want everything to default it will magically work as a decorator directly. This does mean @memoize(200) will give you an error; you could avoid that by instead doing some type checking to see whether func is callable, which should work well in practice but isn't really very "pythonic".

An alternative would be to have two different decorators, say memoize and bounded_memoize. The unbounded memoize can have a trivial implementation by just calling bounded_memoize with maxlen set to None, so it doesn't cost you anything in implementation or maintenance.

Normally, as a rule of thumb, I try to avoid mangling a function to implement two only-tangentially related sets of functionality, especially when they have such different signatures. But in this case it does make the use of the decorator natural (requiring @memoize() would be quite error prone, even though it's more consistent from a theoretical perspective), and you're presumably going to implement this once and use it many times, so readability at point of use is probably the more important concern.

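The callable check suggested in the first answer might look something like the sketch below. This is one possible reading, not the answerer's code: it uses a plain collections.OrderedDict with explicit eviction instead of BoundedOrderedDict to stay self-contained, and double, square and calls are hypothetical names used only for the demonstration.

```python
import collections
import functools


def memoize(func=None, maxlen=None):
    """Usable as @memoize, @memoize(200) or @memoize(maxlen=200)."""
    if func is not None and not callable(func):
        # A non-callable first argument is taken to be the size: @memoize(200)
        func, maxlen = None, func

    if func is None:
        # Called with parentheses: return the actual decorator
        return functools.partial(memoize, maxlen=maxlen)

    cache = collections.OrderedDict()

    @functools.wraps(func)
    def memo_target(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        if maxlen is not None:
            while len(cache) > maxlen:
                cache.popitem(last=False)  # evict the oldest entry
        return result

    return memo_target


calls = []  # hypothetical log of the arguments that were really computed


@memoize
def double(n):
    return 2 * n


@memoize(2)
def square(n):
    calls.append(n)
    return n * n


@memoize(maxlen=2)
def cube(n):
    return n ** 3


results = [square(1), square(2), square(3), square(1)]
print(results, calls)
print(double(4), cube(2))
```

The price of the convenience is the ambiguity the answer warns about: a callable passed as the first argument is always treated as the function to decorate, never as a size.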
You want to write a decorator factory: a function that takes an argument (the maximum length of the BoundedOrderedDict) and returns a decorator that will memoize your function with a BoundedOrderedDict of the appropriate size. You then use it by calling it with the size, and the return value of that call does the decorating.

Edit: Whoops, missed part of the question. The maxlen argument to the decorator can be made optional as well.

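A decorator factory of the kind described in this answer could be sketched as follows. The original answer's code is not reproduced here, so treat this as an assumption about its shape: the name bounded_memoize is borrowed from the first answer's two-decorator suggestion, and square, fib and calls are hypothetical demo names.

```python
import collections
import functools


def bounded_memoize(maxlen=None):
    """Decorator factory: always called, as in @bounded_memoize(200)."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def memo_target(*args):
            if args in cache:
                return cache[args]
            result = func(*args)
            cache[args] = result
            if maxlen is not None and len(cache) > maxlen:
                cache.popitem(last=False)  # drop the oldest entry
            return result

        return memo_target
    return decorator


def memoize(func):
    """Plain decorator: the unbounded case, used as a bare @memoize."""
    return bounded_memoize(maxlen=None)(func)


calls = []  # hypothetical log of the arguments that were really computed


@bounded_memoize(2)
def square(n):
    calls.append(n)
    return n * n


@memoize
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


results = [square(1), square(2), square(3), square(1)]
fib_30 = fib(30)
print(results, calls, fib_30)
```

Because maxlen has a default, @bounded_memoize() also works, but the parentheses are still required; the separate memoize wrapper is what lets the unbounded case be written without them.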
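From PEP 318 (http://www.python.org/dev/peps/pep-0318/):

The current syntax also allows decorator declarations to call a function that returns a decorator. Writing @decomaker(argA, argB, ...) above def func(arg1, arg2, ...) is equivalent to defining func normally and then assigning func = decomaker(argA, argB, ...)(func).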
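The equivalence PEP 318 describes can be checked with a small sketch. The decomaker name follows the PEP; its body here, along with greet and greet_plain, is a made-up example.

```python
def decomaker(prefix):
    """Hypothetical decorator factory: prepends prefix to the result."""
    def decorator(func):
        def wrapper(*args):
            return prefix + func(*args)
        return wrapper
    return decorator


# Decorator syntax with a call: decomaker("hello, ") runs first,
# and whatever it returns is then applied to greet.
@decomaker("hello, ")
def greet(name):
    return name


def greet_plain(name):
    return name


# The same thing spelled out by hand
greet_manual = decomaker("hello, ")(greet_plain)

print(greet("world"))
print(greet_manual("world"))
```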
Also, I'm not sure I would use OrderedDict for this. I would use a ring buffer, which is very easy to implement.
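For what it's worth, here is a rough sketch of what a ring-buffer-backed cache could look like. It is only one interpretation of the suggestion: a plain dict holds the values, and a fixed-size list of keys with a wrapping write position decides which entry to overwrite. The names ring_memoize, square and calls are all hypothetical.

```python
import functools


def ring_memoize(maxlen):
    """Memoize with a fixed-size ring buffer of keys (maxlen must be >= 1)."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")

    def decorator(func):
        cache = {}
        slots = [None] * maxlen  # ring buffer of keys; None marks an empty slot
        position = 0             # index of the slot to overwrite next

        @functools.wraps(func)
        def wrapper(*args):
            nonlocal position
            if args in cache:
                return cache[args]
            result = func(*args)
            oldest = slots[position]
            if oldest is not None:
                cache.pop(oldest, None)  # forget the entry being overwritten
            slots[position] = args
            cache[args] = result
            position = (position + 1) % maxlen  # wrap around at the end
            return result

        return wrapper
    return decorator


calls = []  # hypothetical log of the arguments that were really computed


@ring_memoize(2)
def square(n):
    calls.append(n)
    return n * n


results = [square(1), square(2), square(3), square(1), square(3)]
print(results)
print(calls)
```

The eviction order is the same first-in, first-out order as with the bounded OrderedDict; the difference is only in the bookkeeping.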