如何编写通用的 memoize 函数?
我正在编写一个函数来查找 三角形数字 以及自然的方法递归地写:
function triangle (x)
if x == 0 then return 0 end
return x+triangle(x-1)
end
但是尝试计算前 100,000 个三角形数会失败,并在一段时间后出现堆栈溢出。 这是 memoize 的理想函数,但我想要一个能够记住我传递给它的任何函数的解决方案。
I'm writing a function to find triangle numbers and the natural way to write it is recursively:
function triangle (x)
if x == 0 then return 0 end
return x+triangle(x-1)
end
But attempting to calculate the first 100,000 triangle numbers fails with a stack overflow after a while. This is an ideal function to memoize, but I want a solution that will memoize any function I pass to it.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(15)
Mathematica 有一种特别巧妙的记忆方法,它依赖于散列和函数调用使用相同语法的事实:
就是这样。 它之所以有效,是因为模式匹配函数调用的规则是这样的:它总是在更通用的定义之前使用更具体的定义。
当然,正如已经指出的,这个例子有一个封闭形式的解决方案:triangle[x_] := x*(x+1)/2。 斐波那契数是添加记忆如何带来巨大加速的经典示例:
尽管它也有一个封闭形式的等价物,尽管更混乱: http://mathworld.wolfram.com/FibonacciNumber.html
我不同意有人认为这不适合记忆,因为你可以“只使用循环”。 记忆的要点是任何重复函数调用都是 O(1) 时间。 这比 O(n) 好很多。 事实上,您甚至可以编造一个场景,其中记忆实现比封闭形式实现具有更好的性能!
Mathematica has a particularly slick way to do memoization, relying on the fact that hashes and function calls use the same syntax:
That's it. It works because the rules for pattern-matching function calls are such that it always uses a more specific definition before a more general definition.
Of course, as has been pointed out, this example has a closed-form solution:
triangle[x_] := x*(x+1)/2
. Fibonacci numbers are the classic example of how adding memoization gives a drastic speedup:Although that too has a closed-form equivalent, albeit messier: http://mathworld.wolfram.com/FibonacciNumber.html
I disagree with the person who suggested this was inappropriate for memoization because you could "just use a loop". The point of memoization is that any repeat function calls are O(1) time. That's a lot better than O(n). In fact, you could even concoct a scenario where the memoized implementation has better performance than the closed-form implementation!
您还为原来的问题提出了错误的问题;)
对于这种情况,这是一个更好的方法:
triangle(n) = n * (n - 1) / 2
此外,假设公式没有如此简洁的解决方案,记忆在这里仍然是一个糟糕的方法。 在这种情况下,您最好只编写一个简单的循环。 请参阅此答案进行更全面的讨论。
You're also asking the wrong question for your original problem ;)
This is a better way for that case:
triangle(n) = n * (n - 1) / 2
Furthermore, supposing the formula didn't have such a neat solution, memoisation would still be a poor approach here. You'd be better off just writing a simple loop in this case. See this answer for a fuller discussion.
我打赌这样的东西应该适用于 Lua 中的变量参数列表:
您可能还可以使用 __tostring 对元表进行一些巧妙的处理,以便可以使用 tostring() 来转换参数列表。 哦,可能性。
I bet something like this should work with variable argument lists in Lua:
You could probably also do something clever with a metatables with __tostring so that the argument list could just be converted with a tostring(). Oh the possibilities.
在 C# 3.0 中 - 对于递归函数,您可以执行以下操作:
然后您可以创建一个记忆斐波那契函数,如下所示:
In C# 3.0 - for recursive functions, you can do something like:
Then you can create a memoized Fibonacci function like this:
在 Scala 中(未经测试):
请注意,这仅适用于 arity 1 的函数,但通过柯里化,您可以使其工作。 更微妙的问题是对于任何函数
f
来说memoize(f) != memoize(f)
。 解决这个问题的一种非常偷偷摸摸的方法如下:我认为这不会编译,但它确实说明了这个想法。
In Scala (untested):
Note that this only works for functions of arity 1, but with currying you could make it work. The more subtle problem is that
memoize(f) != memoize(f)
for any functionf
. One very sneaky way to fix this would be something like the following:I don't think that this will compile, but it does illustrate the idea.
更新:评论者指出记忆化是优化递归的好方法。 诚然,我以前没有考虑过这一点,因为我通常使用一种语言 (C#),在这种语言中,构建通用记忆化并不是那么简单。 请对下面的帖子持保留态度。
我认为 Luke 可能有最合适的解决方案这个问题,但记忆通常不能解决任何堆栈溢出问题。
堆栈溢出通常是由于递归深度超过平台可以处理的深度而引起的。 语言有时支持“尾递归”,它重新使用当前调用的上下文,而不是而不是为递归调用创建新的上下文。 但很多主流语言/平台不支持这一点。 例如,C# 本身就没有对尾递归的支持。 .NET JITter 的 64 位版本可以将其应用为 IL 级别的优化,如果您需要支持 32 位平台,这几乎毫无用处。
如果您的语言不支持尾递归,那么避免堆栈溢出的最佳选择是转换为显式循环(不太优雅,但有时是必要的),或者找到一种非迭代算法,例如 Luke 为这个问题提供的算法。
Update: Commenters have pointed out that memoization is a good way to optimize recursion. Admittedly, I hadn't considered this before, since I generally work in a language (C#) where generalized memoization isn't so trivial to build. Take the post below with that grain of salt in mind.
I think Luke likely has the most appropriate solution to this problem, but memoization is not generally the solution to any issue of stack overflow.
Stack overflow usually is caused by recursion going deeper than the platform can handle. Languages sometimes support "tail recursion", which re-uses the context of the current call, rather than creating a new context for the recursive call. But a lot of mainstream languages/platforms don't support this. C# has no inherent support for tail-recursion, for example. The 64-bit version of the .NET JITter can apply it as an optimization at the IL level, which is all but useless if you need to support 32-bit platforms.
If your language doesn't support tail recursion, your best option for avoiding stack overflows is either to convert to an explicit loop (much less elegant, but sometimes necessary), or find a non-iterative algorithm such as Luke provided for this problem.
请注意,为了避免堆栈溢出,三角形仍然需要播种。
Note that to avoid a stack overflow, triangle would still need to be seeded.
这是无需将参数转换为字符串即可工作的方法。
唯一需要注意的是它不能处理 nil 参数。 但接受的解决方案无法区分值
nil
和字符串"nil"
,所以这可能没问题。Here's something that works without converting the arguments to strings.
The only caveat is that it can't handle a nil argument. But the accepted solution can't distinguish the value
nil
from the string"nil"
, so that's probably OK.我受到这个问题的启发,在 Lua 中实现(又一个)灵活的 memoize 函数。
https://github.com/kikito/memoize.lua
主要优点:
将代码粘贴到此处作为参考:
I've been inspired by this question to implement (yet another) flexible memoize function in Lua.
https://github.com/kikito/memoize.lua
Main advantages:
Pasting the code here as reference:
这是一个通用的 C# 3.0 实现,如果它可以帮助的话:(
引用自 法语博客文章)
Here is a generic C# 3.0 implementation, if it could help :
(Quoted from a french blog article)
本着用不同语言发布记忆的方式,我想用一个不改变语言的 C++ 示例来回复 @onebyone.livejournal.com。
首先,用于单个参数函数的记忆器:
只需创建记忆器的实例,为其提供函数和参数。 只需确保不要在两个不同的函数之间共享相同的备忘录(但您可以在同一函数的不同实现之间共享它)。
接下来是驱动程序函数和实现。 只有驱动程序功能需要公开
int fib(int); // 司机
int fib_(int); // 实现
已实现:
以及驱动程序,用于
在 codepad.org 上记住 显示输出的永久链接。 测量调用次数以验证正确性。 (在此处插入单元测试...)
这仅记住一个输入函数。 对多个参数或不同参数的概括留给读者作为练习。
In the vein of posting memoization in different languages, i'd like to respond to @onebyone.livejournal.com with a non-language-changing C++ example.
First, a memoizer for single arg functions:
Just create an instance of the memoizer, feed it your function and argument. Just make sure not to share the same memo between two different functions (but you can share it between different implementations of the same function).
Next, a driver functon, and an implementation. only the driver function need be public
int fib(int); // driver
int fib_(int); // implementation
Implemented:
And the driver, to memoize
Permalink showing output on codepad.org. Number of calls is measured to verify correctness. (insert unit test here...)
This only memoizes one input functions. Generalizing for multiple args or varying arguments left as an exercise for the reader.
在 Perl 中,通用记忆很容易实现。 Memoize 模块是 Perl 核心的一部分,高度可靠、灵活且易于使用。
其手册页中的示例:
您可以在运行时添加、删除和自定义函数的记忆!您可以为自定义记忆计算提供回调。
Memoize.pm 甚至具有使备忘录缓存持久化的功能,因此不需要在每次调用程序时重新填充!
这是文档:http://perldoc.perl.org/5.8.8/Memoize。 html
In Perl generic memoization is easy to get. The Memoize module is part of the perl core and is highly reliable, flexible, and easy-to-use.
The example from it's manpage:
You can add, remove, and customize memoization of functions at run time! You can provide callbacks for custom memento computation.
Memoize.pm even has facilities for making the memento cache persistent, so it does not need to be re-filled on each invocation of your program!
Here's the documentation: http://perldoc.perl.org/5.8.8/Memoize.html
扩展这个想法,还可以使用两个输入参数来记忆函数:
请注意,参数顺序在缓存算法中很重要,因此,如果参数顺序在要记忆的函数中不重要,则获得缓存命中的几率将增加在检查缓存之前对参数进行排序。
但重要的是要注意,某些功能无法通过记忆来获利。 我写了 memoize2 来看看递归欧几里得算法是否可以找到最大公约数可以加快。
事实证明,gcd 对记忆的反应不佳。 它所做的计算比缓存算法要便宜得多。 对于大量的数据,它很快就会终止。 一段时间后,缓存变得非常大。 该算法可能是尽可能快的。
Extending the idea, it's also possible to memoize functions with two input parameters:
Notice that parameter order matters in the caching algorithm, so if parameter order doesn't matter in the functions to be memoized the odds of getting a cache hit would be increased by sorting the parameters before checking the cache.
But it's important to note that some functions can't be profitably memoized. I wrote memoize2 to see if the recursive Euclidean algorithm for finding the greatest common divisor could be sped up.
As it turns out, gcd doesn't respond well to memoization. The calculation it does is far less expensive than the caching algorithm. Ever for large numbers, it terminates fairly quickly. After a while, the cache grows very large. This algorithm is probably as fast as it can be.
递归不是必需的。 第 n 个三角形数是 n(n-1)/2,所以...
Recursion isn't necessary. The nth triangle number is n(n-1)/2, so...
请不要重复这个。 可以使用 x*(x+1)/2 公式,也可以简单地迭代这些值并随时记住。
Please don't recurse this. Either use the x*(x+1)/2 formula or simply iterate the values and memoize as you go.