如何记忆**kwargs？

发布于 2024-11-16 08:36:21 字数 926 浏览 4 评论 0原文

我还没有看到一种既定的方法来记忆采用关键字参数（即某种类型）的函数，

def f(*args, **kwargs)

因为记忆器通常有一个 dict 来缓存给定输入参数集的结果，并且 < code>kwargs 是一个 dict，因此不可散列。我尝试过，按照此处的讨论，使用

(args, frozenset(kwargs.items()))

as缓存 dict 的键，但这仅在 kwargs 中的值可散列的情况下才有效。此外，正如下面的答案所指出的那样，frozenset 不是一个有序的数据结构。因此，这个解决方案可能更安全：

(args, tuple(sorted(kwargs.items())))

但它仍然无法应对不可散列的元素。我看到的另一种方法是在缓存键中使用 kwargs 的 string 表示形式：

(args, str(sorted(kwargs.items())))

我看到的唯一缺点是散列可能很长的字符串的开销。据我所知，结果应该是正确的。谁能发现后一种方法有什么问题吗？下面的答案之一指出，这假定关键字参数值的 __str__ 或 __repr__ 函数具有某些行为。这看起来像是一场表演。

是否有另一种更成熟的方法来实现记忆化，可以处理 **kwargs 和不可散列的参数值？

原文

I haven't seen an established way to memoize a function that takes key-word arguments, i.e. something of type

def f(*args, **kwargs)

since typically a memoizer has a dict to cache results for a given set of input parameters, and kwargs is a dict and hence unhashable. I have tried, following discussions here, using

(args, frozenset(kwargs.items()))

as key to the cache dict, but this only works if the values in kwargs are hashable. Furthermore, as pointed out in answers below is that frozenset is not an ordered data structure. Therefore this solution might be safer:

(args, tuple(sorted(kwargs.items())))

But it still cannot cope with un-hashable elements. Another approach I have seen is to use a string representation of the kwargs in the cache key:

(args, str(sorted(kwargs.items())))

The only drawback I see with this is the overhead of hashing a potentially very long string. As far as I can see the results should be correct. Can anyone spot any problems with the latter approach? One of the answers below points out that this assumes certain behaviour of the __str__ or __repr__ functions for the values of the key-word arguments. This seems like a show-stopper.

Is there another, more established way of achieving memoization that can cope with **kwargs and un-hashable argument values?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

如歌彻婉言 2024-11-23 08:36:21

key = (args, frozenset(kwargs.items()))

这是您在不对数据做出假设的情况下可以做到的“最佳”结果。

然而，想要对字典执行记忆似乎是可以想象的（虽然有点不寻常），如果你愿意的话，你可以特殊情况。例如，您可以在复制字典时递归应用frozenset(---.items())。

如果您进行排序，您可能会遇到一个糟糕的情况，即您的键无法排序。例如，“子集和相等比较不能推广到完整的排序函数。例如，任何两个不相交的集合不相等且不是彼此的子集，因此以下所有结果都返回 False：ab。因此，集合不会实现 cmp() 方法。”

>>> sorted([frozenset({1,2}), frozenset({1,3})])
[frozenset({1, 2}), frozenset({1, 3})]

>>> sorted([frozenset({1,3}), frozenset({1,2})]) # THE SAME
[frozenset({1, 3}), frozenset({1, 2})] # DIFFERENT SORT RESULT

# sorted(stuff) != sorted(reversed(stuff)), if not strictly totally ordered

编辑： Ignacio 说“虽然你可以不使用在任意字典上进行排序（），kwargs 将具有 str 键。”这是完全正确的。因此，这对于键来说不是问题，但如果您（或不太可能的代表）以某种方式依赖于排序，则可能需要记住值的问题。

关于使用str：

在这种情况下，大多数数据都可以很好地工作，但对手（例如在安全漏洞上下文中）可能会制造冲突。请注意，这并不容易，因为大多数默认的repr都使用了大量良好的分组和转义。事实上我无法找到这样的碰撞。但是，草率的第三方或不完整的 repr 实现也是可能的。

另请考虑以下事项：如果您要存储诸如 ((

,), freezeset(...))

和 ((,) 这样的键， ,), freezeset(...))，你的缓存将无限增长通过调用相同的项目。（您也许可以使用正则表达式来解决这个问题......）并且尝试使用生成器会弄乱您正在使用的函数的语义。不过，如果您希望记住 is 样式的相等性而不是 == 样式的相等性，这可能是理想的行为。

另外，在解释器中执行类似 str({1:object()}) 的操作将每次在内存中的同一位置返回一个对象！我认为这就是垃圾收集器在工作。这将是灾难性的，因为如果您碰巧对进行散列，并且您碰巧在同一内存位置创建了相同类型的对象稍后（由于垃圾收集），您将从记忆函数中得到不正确的结果。如前所述，一种可能非常黑客的解决方法是使用正则表达式检测此类对象。

key = (args, frozenset(kwargs.items()))

This is the "best" you can do without making assumptions about your data.

However it seems conceivable to want to perform memoization on dictionaries (a bit unusual though), you could special-case that if you desired it. For example you could recursively apply frozenset(---.items()) while copying dictionaries.

If you do sorted, you could be in a bad situation where you have unorderable keys. For example, "The subset and equality comparisons do not generalize to a complete ordering function. For example, any two disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b. Accordingly, sets do not implement the cmp() method."

>>> sorted([frozenset({1,2}), frozenset({1,3})])
[frozenset({1, 2}), frozenset({1, 3})]

>>> sorted([frozenset({1,3}), frozenset({1,2})]) # THE SAME
[frozenset({1, 3}), frozenset({1, 2})] # DIFFERENT SORT RESULT

# sorted(stuff) != sorted(reversed(stuff)), if not strictly totally ordered

edit: Ignacio says "While you can't use sorted() on arbitrary dicts, kwargs will have str keys." This is entirely correct. Thus this is not an issue for keys, though possibly something to keep in mind for values if you (or unlikely repr) are relying on sorting somehow.

Regarding using str:

It is the case most data will work nicely, but it is possible for an adversary (e.g. in a security-vulnerability context) to craft a collision. It's not easy mind you because most default reprs use lots of good grouping and escape. In fact I was not able to find such a collision. But it is possible with sloppy third-party or incomplete repr implementations.

Also consider the following: If you are storing keys like ((<map object at 0x1377d50>,), frozenset(...)) and ((<list_iterator object at 0x1377dd0>,<list_iterator object at 0x1377dd0>), frozenset(...)), your cache will grow unboundedly just by calling the same items. (You could perhaps work around this issue with a regex...) And attempting to consume the generators will mess up the semantics of the function you're using. This may be desired behavior though if you wish to memoize on is-style equality rather than ==-style equality.

Also doing something like str({1:object()}) in the interpreter will return an object at the same location in memory each time! I think this is the garbage collector at work. This would be disastrous, because if you happen to be hashing <some object at 0x???????> and you happen to create an object of the same type at the same memory location later on (due to garbage collection), you will get incorrect results from the memoized function. As mentioned, one possibly really hackish workaround is to detect such objects with a regex.

回复收藏 0 原文

み格子的夏天 2024-11-23 08:36:21

字典可以是任意顺序，因此不能保证后者能够工作。使用 sorted(kwargs.items()) 首先按键排序。

回复收藏 0 原文

白首有我共你 2024-11-23 08:36:21

与EMS所说的类似，但最好的方法是：

key = cPickle.dumps((*args, **kwargs))

我已经对装饰器的记忆进行了大量的研究和测试，这是迄今为止我发现的最好的方法。

It's similar to what EMS said, but the best way would be:

key = cPickle.dumps((*args, **kwargs))

I've been doing a lot of research and testing for memorization with decorators, and this is the best method I've found so far.

回复收藏 0 原文

椵侞 2024-11-23 08:36:21

这里：

from functools import wraps

def memoize(fun):
    """A simple memoize decorator for functions supporting positional args."""
    @wraps(fun)
    def wrapper(*args, **kwargs):
        key = (args, frozenset(sorted(kwargs.items())))
        try:
            return cache[key]
        except KeyError:
            ret = cache[key] = fun(*args, **kwargs)
        return ret
    cache = {}
    return wrapper

测试：

import unittest

class TestMemoize(unittest.TestCase):
    def test_it(self):
        @memoize
        def foo(*args, **kwargs):
            "foo docstring"
            calls.append(None)
            return (args, kwargs)

        calls = []
        # no args
        for x in range(2):
            ret = foo()
            expected = ((), {})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 1)
        # with args
        for x in range(2):
            ret = foo(1)
            expected = ((1, ), {})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 2)
        # with args + kwargs
        for x in range(2):
            ret = foo(1, bar=2)
            expected = ((1, ), {'bar': 2})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 3)
        self.assertEqual(foo.__doc__, "foo docstring")

unittest.main()

只要您不传递不可散列的类型（例如 dict）作为参数，这就有效。
我没有解决方案，但 collections.lru_cache() 实现可能有。
请参阅此处的 _make_key() 函数：
http://code.activestate.com/recipes/578078/

Here:

from functools import wraps

def memoize(fun):
    """A simple memoize decorator for functions supporting positional args."""
    @wraps(fun)
    def wrapper(*args, **kwargs):
        key = (args, frozenset(sorted(kwargs.items())))
        try:
            return cache[key]
        except KeyError:
            ret = cache[key] = fun(*args, **kwargs)
        return ret
    cache = {}
    return wrapper

Tests:

import unittest

class TestMemoize(unittest.TestCase):
    def test_it(self):
        @memoize
        def foo(*args, **kwargs):
            "foo docstring"
            calls.append(None)
            return (args, kwargs)

        calls = []
        # no args
        for x in range(2):
            ret = foo()
            expected = ((), {})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 1)
        # with args
        for x in range(2):
            ret = foo(1)
            expected = ((1, ), {})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 2)
        # with args + kwargs
        for x in range(2):
            ret = foo(1, bar=2)
            expected = ((1, ), {'bar': 2})
            self.assertEqual(ret, expected)
            self.assertEqual(len(calls), 3)
        self.assertEqual(foo.__doc__, "foo docstring")

unittest.main()

This works as long as you don't pass an unhashable type (e.g. dict) as argument.
I don't have a solution for that but collections.lru_cache() implementation might have.
See _make_key() function here:
http://code.activestate.com/recipes/578078/

回复收藏 0 原文