Short-lived memoization in Haskell?

Posted on 2025-01-08 13:28:47

In an object-oriented language, when I need to cache/memoize the results of a function for a known lifetime, I generally follow this pattern:

  1. Create a new class
  2. Add to the class a data member and a method for each function result I want to cache
  3. Implement the method to first check to see if the result has been stored in the data member. If so, return that value; else call the function (with the appropriate arguments) and store the returned result in the data member.
  4. Objects of this class will be initialized with values that are needed for the various function calls.

This object-based approach is very similar to the function-based memoization pattern described here: http://www.bardiak.com/2012/01/javascript-memoization-pattern.html

The main benefit of this approach is that the results are kept around only for the lifetime of the cache object. A common use case is the processing of a list of work items: for each work item, one creates a cache object for that item, processes the work item with that cache object, and then discards both the work item and the cache object before proceeding to the next one.
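
To make the question concrete, here is roughly what a naive transliteration of this pattern into Haskell might look like (the names Cache, newCache and cached are made up for illustration; an IORef-backed Data.Map stands in for the data members, which also covers functions that involve IO):

import qualified Data.Map as Map
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- A per-item "cache object": the function to cache plus a mutable result store.
data Cache k v = Cache
  { cacheFun   :: k -> IO v
  , cacheStore :: IORef (Map.Map k v)
  }

newCache :: (k -> IO v) -> IO (Cache k v)
newCache f = Cache f <$> newIORef Map.empty

-- Check the store first; otherwise run the function and remember its result.
cached :: Ord k => Cache k v -> k -> IO v
cached (Cache f store) key = do
  m <- readIORef store
  case Map.lookup key m of
    Just v  -> pure v
    Nothing -> do
      v <- f key
      modifyIORef' store (Map.insert key v)
      pure v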

What are good ways to implement short-lived memoization in Haskell? And does the answer depend on whether the functions to be cached are pure or involve IO?

Just to reiterate - it would be nice to see solutions for functions which involve IO.

Comments (4)

旧街凉风 2025-01-15 13:28:47

Let's use Luke Palmer's memoization library: Data.MemoCombinators

import qualified Data.MemoCombinators as Memo
import Data.Function (fix) -- we'll need this too

I'm going to define things slightly differently from how his library does, but it's basically the same (and, furthermore, compatible). A "memoizable" thing takes itself as input and produces the "real" thing.

type Memoizable a = a -> a

A "memoizer" takes a function and produces the memoized version of it.

type Memoizer a b = (a -> b) -> a -> b

Let's write a little function to put these two things together. Given a Memoizable function and a Memoizer, we want the resultant memoized function.

runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)

This is a little magic using the fixpoint combinator (fix). Never mind that; you can google it if you are interested.
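
For the curious: Data.Function defines fix g = let x = g x in x, so the definition above unfolds like this:

-- Unfolding the definition:
--   runMemo memo f
--     = fix (f . memo)
--     = let memoized = f (memo memoized) in memoized
-- The `self` argument that f receives is therefore the memoized version of
-- the whole function, and every recursive call goes through the memo table.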

So let's write a Memoizable version of the classic fib example:

fib :: Memoizable (Integer -> Integer)
fib self = go
  where go 0 = 1
        go 1 = 1
        go n = self (n-1) + self (n-2)

Using a self convention makes the code straightforward. Remember, self is what we expect to be the memoized version of this very function, so recursive calls should be on self. Now fire up ghci.

ghci> let fib' = runMemo Memo.integral fib
ghci> fib' 10000
WALL OF NUMBERS CRANKED OUT RIDICULOUSLY FAST

Now, the cool thing about runMemo is that you can create more than one freshly memoized version of the same function, and they will not share memory banks. That means I can write a function that locally creates and uses fib', and as soon as fib' falls out of scope (or earlier, depending on the intelligence of the compiler), it can be garbage collected. It doesn't have to be memoized at the top level. This may or may not play nicely with memoization techniques that rely on unsafePerformIO; Data.MemoCombinators uses a pure, lazy trie, which fits perfectly with runMemo.

Rather than creating an object which essentially becomes a memoization manager, you can simply create memoized functions on demand. The catch is that if your function is recursive, it must be written as Memoizable. The good news is that you can plug in any Memoizer you wish. You could even use:

noMemo :: Memoizer a b
noMemo f = f

ghci> let fib' = runMemo noMemo fib
ghci> fib' 30 -- wait a while; it's computing stupidly
1346269
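
To tie this back to the work-item use case in the question, here is a small sketch (processItem and processAll are hypothetical names) of creating a fresh memoized fib per item:

-- One fresh memo table per work item; nothing is shared across items, and
-- each table becomes unreachable once its item has been processed (whether
-- the compiler floats fib' out and shares it depends on optimization settings).
processItem :: Integer -> Integer
processItem item = fib' item + fib' (item `div` 2)
  where
    fib' = runMemo Memo.integral fib   -- a locally created memoized version

processAll :: [Integer] -> [Integer]
processAll = map processItem
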
岁吢 2025-01-15 13:28:47

Lazy Haskell programming is, in a way, the memoization paradigm taken to an extreme. Also, whatever you do in an imperative language is possible in Haskell, using the IO monad, the ST monad, monad transformers, arrows, or what have you.

The only problem is that these abstraction devices are much more complicated than the imperative equivalent that you mentioned, and they need a pretty deep mind-rewiring.

不醒的梦 2025-01-15 13:28:47

I believe the above answers are both more complex than necessary, although they might be more portable than what I'm about to describe.

As I understand it, there is a rule in GHC that each value is computed exactly once when its enclosing lambda expression is entered. You can thus create exactly the short-lived memoization object you describe, as follows.

import qualified Data.Vector as V

-- Group the elements of vec into buckets keyed by idx; tbl is built lazily,
-- once, and shared by every call to the returned lookup function.
indexerVector :: (t -> Int) -> V.Vector t -> Int -> [t]
indexerVector idx vec = \e -> tbl V.! e
  where m   = maximum $ map idx $ V.toList vec
        tbl = V.accumulate (flip (:)) (V.replicate (m + 1) [])
                           (V.map (\v -> (idx v, v)) vec)

What does this do? It groups all the elements of the Data.Vector t passed as its second argument vec according to the integer computed by its first argument idx, retaining the grouping as a Data.Vector [t]. It returns a function of type Int -> [t] that looks up a group by this pre-computed index value.

Our compiler, GHC, has promised that tbl will only be thunked once per invocation of indexerVector. We may therefore bind the lambda expression \e -> tbl V.! e returned by indexerVector to another value, which we can use repeatedly without fear that tbl ever gets recomputed. You may verify this by inserting a trace on tbl.

In short, your caching object is exactly this lambda expression.

I've found that almost anything you can accomplish with a short term object can be better accomplished by returning a lambda expression like this.
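
A brief usage sketch (with made-up data) of holding on to the returned lambda and reusing it:

-- tbl is built the first time byLen is applied; the later call reuses it.
groupsByLength :: [[String]]
groupsByLength =
  let byLen = indexerVector length (V.fromList ["a", "bb", "cc", "d"])
  in  [byLen 1, byLen 2]            -- [["d","a"],["cc","bb"]]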

独享拥抱 2025-01-15 13:28:47

You can use the very same pattern in Haskell too. Lazy evaluation will take care of checking whether a value has already been evaluated. This has been mentioned multiple times already, but a code example could be useful. In the example below, memoedValue will be calculated only once, when it is first demanded.

data Memoed = Memoed
  { value       :: Int
  , memoedValue :: Int
  }

memo :: Int -> Memoed
memo i = Memoed
  { value       = i
  , memoedValue = expensiveComputation i
  }
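
For example, a tiny usage sketch (assuming some expensiveComputation :: Int -> Int is in scope): both uses of memoedValue m below read the same lazily evaluated field, so the expensive computation runs at most once.

-- The thunk stored in memoedValue is shared through m, so it is forced once.
demo :: Int
demo =
  let m = memo 1000
  in  memoedValue m + memoedValue m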

Even better, you can memoize values which depend on other memoized values. You should avoid dependency loops, though; they can lead to non-termination:

data Memoed = Memoed
  { value        :: Int
  , memoedValue1 :: Int
  , memoedValue2 :: Int
  }

memo :: Int -> Memoed
memo i = r
  where
  r = Memoed
    { value        = i
    , memoedValue1 = expensiveComputation i
    , memoedValue2 = anotherComputation (memoedValue1 r)
    }