Haskell 中的短暂记忆?
在面向对象的语言中,当我需要在已知的生命周期内缓存/记忆函数的结果时,我通常会遵循以下模式:
- 创建一个新类
- 向该类添加数据成员和每个函数结果的方法我想在缓存中
- 实现方法,首先检查结果是否已存储在数据成员中。如果是,则返回该值;否则调用该函数(使用适当的参数)并将返回的结果存储在数据成员中。
- 此类的对象将使用各种函数调用所需的值进行初始化。
这种基于对象的方法与此处描述的基于函数的记忆模式非常相似:http://www.bardiak.com/2012/01/javascript-memoization-pattern.html
这种方法的主要好处是结果仅在缓存对象的生命周期内保留。一个常见的用例是处理工作项列表。对于每个工作项,创建该项的缓存对象,使用该缓存对象处理该工作项,然后在继续下一个工作项之前丢弃该工作项和缓存对象。
在 Haskell 中实现短期记忆的好方法是什么?答案是否取决于要缓存的函数是纯函数还是涉及 IO?
重申一下 - 很高兴看到涉及 IO 的功能的解决方案。
In an object-oriented language when I need to cache/memoize the results of a function for a known life-time I'll generally follow this pattern:
- Create a new class
- Add to the class a data member and a method for each function result I want to cache
- Implement the method to first check to see if the result has been stored in the data member. If so, return that value; else call the function (with the appropriate arguments) and store the returned result in the data member.
- Objects of this class will be initialized with values that are needed for the various function calls.
This object-based approach is very similar to the function-based memoization pattern described here: http://www.bardiak.com/2012/01/javascript-memoization-pattern.html
The main benefit of this approach is that the results are kept around only for the life time of the cache object. A common use case is in the processing of a list of work items. For each work item one creates the cache object for that item, processes the work item with that cache object then discards the work item and cache object before proceeding to the next work item.
What are good ways to implement short-lived memoization in Haskell? And does the answer depend on if the functions to be cached are pure or involve IO?
Just to reiterate - it would be nice to see solutions for functions which involve IO.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
让我们使用 Luke Palmer 的记忆库:Data.MemoCombinators
我将定义与他的库的做法略有不同的东西,但它基本上是相同的(而且是兼容的)。一个“可记忆”的事物将其自身作为输入,并产生“真实”的事物。
“memoizer”接受一个函数并生成它的记忆版本。
让我们编写一个小函数将这两件事放在一起。给定一个 Memoizable 函数和一个 Memoizer,我们需要生成的记忆函数。
这是使用定点组合器(
fix
)的一个小魔法。没关系;如果你有兴趣可以google一下。因此,让我们编写一个经典 fib 示例的可记忆化版本:
使用 self 约定使代码变得简单。请记住,
self
是我们期望的该函数的记忆版本,因此递归调用应该在self
上。现在启动 ghci。现在,
runMemo
的一个很酷的事情是,您可以创建同一函数的多个新记忆版本,并且它们不会共享内存库。这意味着我可以编写一个在本地创建和使用fib'
的函数,但是一旦fib'
超出范围(或更早,取决于编译器),它可以被垃圾收集。 它不必在顶层被记住。这可能会或可能不会与依赖于 unsafePerformIO 的记忆技术很好地配合。Data.MemoCombinators
使用纯粹的惰性 Trie,它与runMemo
完美契合。您可以根据需要简单地创建记忆函数,而不是创建本质上成为记忆管理器的对象。问题是,如果您的函数是递归的,则必须将其编写为可记忆化的。好消息是您可以插入任何您想要的Memoizer
。你甚至可以使用:Let's use Luke Palmer's memoization library: Data.MemoCombinators
I'm going to define things slightly different from how his library does, but it's basically the same (and furthermore, compatible). A "memoizable" thing takes itself as input, and produces the "real" thing.
A "memoizer" takes a function and produces the memoized version of it.
Let's write a little function to put these two things together. Given a
Memoizable
function and aMemoizer
, we want the resultant memoized function.This is a little magic using the fixpoint combinator (
fix
). Never mind that; you can google it if you are interested.So let's write a
Memoizable
version of the classic fib example:Using a
self
convention makes the code straightforward. Remember,self
is what we expect to be the memoized version of this very function, so recursive calls should be onself
. Now fire up ghci.Now, the cool thing about
runMemo
is you can create more than one freshly memoized version of the same function, and they will not share memory banks. That means that I can write a function that locally creates and usesfib'
, but then as soon asfib'
falls out of scope (or earlier, depending on the intelligence of the compiler), it can be garbage collected. It doesn't have to be memoized at the top level. This may or may not play nicely with memoization techniques that rely onunsafePerformIO
.Data.MemoCombinators
uses a pure, lazy Trie, which fits perfectly withrunMemo
. Rather than creating an object which essentially becomes a memoization manager, you can simply create memoized functions on demand. The catch is that if your function is recursive, it must be written asMemoizable
. The good news is you can plug in anyMemoizer
that you wish. You could even use:从某种程度上来说,Lazy-Haskell 编程是一种极端的记忆范式。此外,无论你在命令式语言中做什么,在 Haskell 中都是可能的,使用 IO monad、ST monad、monad 转换器、箭头,或者你能想到的任何东西。
唯一的问题是这些抽象设备比您提到的命令式等价物复杂得多,并且它们需要相当深入的思维重新布线。
Lazy-Haskell programming is, in a way, the memoization paradigm taken to a extreme. Also, whatever you do in an imperative language is possible in Haskell, using either IO monad, the ST monad, monad transformers, arrows, or you name what.
The only problem is that these abstraction devices are much more complicated than the imperative equivalent that you mentioned, and they need a pretty deep mind-rewiring.
我相信上述答案都比必要的更复杂,尽管它们可能比我将要描述的更容易移植。
据我了解,ghc 中有一条规则,即每个值在输入其封闭的 lambda 表达式时只计算一次。因此,您可以如下创建您的短期记忆对象。
这是做什么的?它根据第一个参数
idx
计算的整数,对作为第二个参数vec
传递的Data.Vector t
中的所有元素进行分组,保留它们的值分组为Data.Vector [t]
。它返回一个 Int 类型的函数 -> [t] 通过此预先计算的索引值查找此分组。我们的编译器 ghc 已承诺,当我们调用
indexerVector
时,tbl
只会被 thunked 一次。因此,我们可以指定 lambda 表达式\e ->长话短说! e
由indexVector
返回到另一个值,我们可以重复使用该值,而不必担心tbl
被重新计算。您可以通过在tbl
上插入trace
来验证这一点。简而言之,您的缓存对象就是这个 lambda 表达式。
我发现几乎任何可以用短期对象完成的事情都可以通过返回这样的 lambda 表达式来更好地完成。
I believe the above answers are both more complex than necessary, although they might be more portable than what I'm about to describe.
As I understand it, there is a rule in
ghc
that each value is computed exactly once when it's enclosing lambda expression is entered. You may thus create exactly your short lived memoization object as follows.What does this do? It groups all the elements in the
Data.Vector t
passed as it's second argumentvec
according to integer computed by it's first argumentidx
, retaining their grouping as aData.Vector [t]
. It returns a function of typeInt -> [t]
which looks up this grouping by this pre-computed index value.Our compiler
ghc
has promised thattbl
shall only be thunked once when we invokeindexerVector
. We may therefore assign the lambda expression\e -> tbl ! e
returned byindexVector
to another value, which we may use repeatedly without fear thattbl
ever gets recomputed. You may verify this by inserting atrace
ontbl
.In short, your caching object is exactly this lambda expression.
I've found that almost anything you can accomplish with a short term object can be better accomplished by returning a lambda expression like this.
你也可以在 haskell 中使用相同的模式。惰性求值将负责检查值是否已经被求值。它已经被多次提及,但代码示例可能很有用。在下面的示例中,
memoedValue
将仅在需要时计算一次。更好的是,您可以记住依赖于其他已记忆值的值。您应该避免依赖循环。它们可以导致不终止
You can use very same pattern in haskell too. Lazy evaluation will take care of checking whether value is evaluated already. It has been mentioned mupltiple times already but code example could be useful. In example below
memoedValue
will calculated only once when it is demanded.Even better you can memoize values which depend on other memoized values. You shoud avoid dependecy loops. They can lead to nontermination