Haskell 缓存函数结果

发布于 2024-08-20 04:55:45 字数 572 浏览 9 评论 0原文

我有一个函数,它接受一个参数并产生一个结果。不幸的是,该函数需要很长时间才能产生结果。使用相同的输入经常调用该函数,这就是为什么如果我可以缓存结果会很方便。就像

let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)

我正在研究 Data.Array 一样,虽然该数组是惰性的,但我需要用一个对列表(使用 listArray)来初始化它 - 这是不切实际的。如果“键”是“Double”类型,我根本无法初始化它,即使理论上我可以为每个可能的输入分配一个整数,我也有数万个可能的输入,但实际上我只使用了少数。我需要使用函数而不是列表来初始化数组(或者最好是哈希表,因为只会使用少数结果)。

更新:我正在阅读记忆文章,据我了解,MemoTrie 可以按照我想要的方式工作。或许。有人可以尝试生成“cachedFunction”吗?更适合需要 2 个 Double 参数的慢速函数?或者,另一种方法是,在 ~ [0.1 亿] 的域中采用一个 Int 参数,而不会耗尽所有内存?

I have a function that takes a parameter and produces a result. Unfortunately, it takes quite long for the function to produce the result. The function is being called quite often with the same input, that's why it would be convenient if I could cache the results. Something like

let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)

I was looking into Data.Array and although the array is lazy, I need to initialize it with a list of pairs (using listArray) - which is impractical . If the 'key' is e.g. the 'Double' type, I cannot initialize it at all, and even if I can theoretically assign an Integer to every possible input, I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably a hash table, as only a handful of resutls will be used) using a function instead of a list.

Update: I am reading the memoization articles and as far as I understand it the MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Prefereably for a slow function that takes 2 Double arguments? Or, alternatively, that takes one Int argument in a domain of ~ [0..1 billion] that wouldn't eat all memory?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

找回味觉 2024-08-27 04:55:45

嗯,有Data.HashTable。不过,哈希表往往不能很好地处理不可变数据和引用透明性,因此我认为它没有太多用途。

对于少量值,将它们存储在搜索树(例如 Data.Map)中可能足够快。如果您可以忍受对 Double 进行一些修改,更可靠的解决方案是使用类似 trie 的结构,例如 Data.IntMap;它们的查找时间主要与密钥长度成正比,并且集合大小大致恒定。如果 Int 限制太多,您可以在 Hackage 上深入研究,找到使用密钥类型更灵活的 trie 库。

至于如何缓存结果,我认为你想要的通常称为“memoization”。如果您想按需计算和记忆结果,该技术的要点是定义一个包含所有可能结果的索引数据结构,这样当您请求特定结果时,它只强制获得您想要的答案所需的计算。常见的示例通常涉及对列表进行索引,但相同的原则应该适用于任何非严格的数据结构。根据经验,非函数值(包括无限递归数据结构)通常会被运行时缓存,但函数结果不会被缓存,因此技巧是将所有计算包装在一个顶级定义中,该定义不会取决于任何参数。

编辑: MemoTrie 示例啊嘿!

这是一个快速而肮脏的概念证明;可能存在更好的方法。

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)

mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode

unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral

instance HasTrie Double where
    data Double :->: a = DoubleTrie ([Int] :->: a)
    trie f = DoubleTrie $ trie $ f . unmangle
    untrie (DoubleTrie t) = untrie t . mangle

slow x 
    | x < 1 = 1
    | otherwise = slow (x / 2) + slow (x / 3)

memoSlow :: Double -> Integer
memoSlow = memo slow

请注意 MemoTrie 包使用的 GHC 扩展;希望这不是问题。将其加载到 GHCi 中,然后尝试使用 (10^6) 或 (10^7) 之类的内容调用 slowmemoSlow 来查看它的实际情况。

将其推广到采用多个参数或其他参数的函数应该相当简单。有关使用 MemoTrie 的更多详细信息,您可以找到 这篇博客文章它的作者很有帮助。

Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.

For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.

As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.

Edit: MemoTrie example ahoy!

This is a quick and dirty proof of concept; better approaches may exist.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)

mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode

unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral

instance HasTrie Double where
    data Double :->: a = DoubleTrie ([Int] :->: a)
    trie f = DoubleTrie $ trie $ f . unmangle
    untrie (DoubleTrie t) = untrie t . mangle

slow x 
    | x < 1 = 1
    | otherwise = slow (x / 2) + slow (x / 3)

memoSlow :: Double -> Integer
memoSlow = memo slow

Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.

Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.

何以畏孤独 2024-08-27 04:55:45

GHC 的运行时系统中有许多工具明确支持记忆化。

不幸的是,记忆化并不是真正的一刀切,因此我们需要支持几种不同的方法来满足不同的用户需求。

您可能会发现 1999 年的原始文章很有用,因为它包含多个实现作为示例:

扩展存储管理器:Haskell 中的弱指针和稳定名称 作者:Simon Peyton Jones、Simon Marlow 和 Conal Elliott

There are a number of tools in GHC's runtime system explicitly to support memoization.

Unfortunately, memoization isn't really a one-size fits all affair, so there are several different approaches that we need to support in order to cope with different user needs.

You may find the original 1999 writeup useful as it includes several implementations as examples:

Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott

四叶草在未来唯美盛开 2024-08-27 04:55:45

我将添加我自己的解决方案,这似乎也很慢。第一个参数是一个返回 Int32 的函数 - 这是参数的唯一标识符。如果您想通过不同的方式(例如通过“id”)唯一地标识它,则必须将 H.new 中的第二个参数更改为不同的哈希函数。我将尝试找出如何使用 Data.Map 并测试是否能获得更快的结果。

import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe

cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO $ createfunc
    where 
        createfunc = do
            storage <- H.new (==) id
            return (doit storage)

        doit storage = unsafePerformIO . comp
            where 
                comp x = do
                    look <- H.lookup storage (ident x)

                    case look of
                        Just res -> return res
                        Nothing -> do
                            result <- return (f x)
                            H.insert storage (ident x) result
                            return result

I will add my own solution, which seems to be quite slow as well. First parameter is a function that returns Int32 - which is unique identifier of the parameter. If you want to uniquely identify it by different means (e.g. by 'id'), you have to change the second parameter in H.new to a different hash function. I will try to find out how to use Data.Map and test if I get faster results.

import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe

cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO $ createfunc
    where 
        createfunc = do
            storage <- H.new (==) id
            return (doit storage)

        doit storage = unsafePerformIO . comp
            where 
                comp x = do
                    look <- H.lookup storage (ident x)

                    case look of
                        Just res -> return res
                        Nothing -> do
                            result <- return (f x)
                            H.insert storage (ident x) result
                            return result
<逆流佳人身旁 2024-08-27 04:55:45

您可以将慢速函数编写为高阶函数,返回函数本身。因此,您可以在慢速函数内进行所有预处理,并在返回的(希望是快速的)函数中进行每次计算中不同的部分。一个例子可能是这样的:
(SML代码,不过思路应该很清晰)

fun computeComplicatedThing (x:float) (y:float) = (* ... some very complicated computation *)
fun computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91

You can write the slow function as a higher order function, returning a function itself. Thus you can do all the preprocessing inside the slow function and the part that is different in each computation in the returned (hopefully fast) function. An example could look like this:
(SML code, but the idea should be clear)

fun computeComplicatedThing (x:float) (y:float) = (* ... some very complicated computation *)
fun computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
妄想挽回 2024-08-27 04:55:45

我有数万个可能的输入,但实际上我只使用了少数。我需要使用函数而不是列表来初始化数组......

我会选择 listArray (start, end) (map func [start..end])

  • func 上面并没有真正被调用。 Haskell 很懒,会创建 thunk,当实际需要该值时才会对其进行求值。
  • 使用普通数组时,您始终需要初始化其值。所以无论如何,创建这些 thunk 所需的工作是必要的。
  • 几万远不是很多。如果你有数万亿,那么我建议使用哈希表 yada yada

I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.

I'd go with listArray (start, end) (map func [start..end])

  • func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated when the value is actually required.
  • When using a normal array you always need to initialize its values. So the work required for creating these thunks is necessary anyhow.
  • Several tens of thousands is far from a lot. If you'd have trillions then I would suggest to use a hash table yada yada
鸵鸟症 2024-08-27 04:55:45

我具体不了解 haskell,但是如何将现有答案保留在某些散列数据结构(可能称为字典或散列图)中?您可以将慢速函数包装在另一个函数中,该函数首先检查映射,并且仅在未找到答案时才调用慢速函数。

您可以通过将地图的大小限制为一定大小来使其变得更奇特,当达到该大小时,丢弃最近最少使用的条目。为此,您还需要保留键到时间戳映射的映射。

I don't know haskell specifically, but how about keeping existing answers in some hashed datastructure (might be called a dictionary, or hashmap)? You can wrap your slow function in another function that first check the map and only calls the slow function if it hasn't found an answer.

You could make it fancy by limiting the size of the map to a certain size and when it reaches that, throwing out the least recently used entry. For this you would additionally need to keep a map of key-to-timestamp mappings.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文