Haskell 可变映射/树

发布于 2024-08-21 06:33:19 字数 161 浏览 12 评论 0原文

我正在 Haskell 中寻找可变（平衡）树/映射/哈希表或如何在函数内模拟它的方法。即，当我多次调用同一函数时，结构会被保留。到目前为止，我已经尝试过 Data.HashTable（还可以，但有点慢）并尝试过 Data.Array.Judy，但我无法使其与 GHC 6.10.4 一起使用。还有其他选择吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

童话 2024-08-28 06:33:19

如果你想要可变状态，你可以拥有它。只需继续传递更新后的地图，或将其保留在状态单子中（结果是相同的事情）。

import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a -> return a
            Nothing -> do a <- f k
                          writeSTRef mc (Map.insert k a c) >> return a

你可以像这样使用它。（实际上，您可能还想添加一种从缓存中清除项目的方法。）

import Control.Monad
main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]

您可以不安全摆脱线程状态的要求，需要您自担风险通过一切需要它的事情。

import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize $ return . f
    return $ unsafePerformIO . stToIO . f'

fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)

main :: IO ()
main = mapM_ (print . fib) [1..1000]

If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).

import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a -> return a
            Nothing -> do a <- f k
                          writeSTRef mc (Map.insert k a c) >> return a

You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)

import Control.Monad
main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]

At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.

import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize $ return . f
    return $ unsafePerformIO . stToIO . f'

fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)

main :: IO ()
main = mapM_ (print . fib) [1..1000]

回复收藏 0 原文

心欲静而疯不止 2024-08-28 06:33:19

基于@Ramsey 的答案，我还建议您重新构思您的函数来获取地图并返回修改后的地图。然后使用良好的数据进行编码.Map，修改起来非常高效。这是一个模式：

import qualified Data.Map as Map

-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here

-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
    where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
    -- this starts with an initial map, uses successive versions of the map
    -- on each iteration, and returns a tuple of the results, and the final map

-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
    -- same as above, but starts with an empty map, and ignores the final map

很容易抽象这个模式并使 mapFuncWithMap 对以这种方式使用映射的函数具有通用性。

Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:

import qualified Data.Map as Map

-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here

-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
    where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
    -- this starts with an initial map, uses successive versions of the map
    -- on each iteration, and returns a tuple of the results, and the final map

-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
    -- same as above, but starts with an empty map, and ignores the final map

It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.

回复收藏 0 原文

那些过往 2024-08-28 06:33:19

尽管您要求可变类型，但我建议您使用不可变数据结构，并将连续版本作为参数传递给函数。

关于使用哪种数据结构，

有一个肯特郡红黑树的实现
如果你有整数键，Data.IntMap 效率极高。
如果您有字符串键，则 bytestring-trie 包来自Hackage 看起来非常好。

问题是我无法使用（或者我不知道如何）使用非可变类型。

如果幸运的话，您可以将表数据结构作为额外参数传递给每个需要它的函数。但是，如果您的表需要广泛分布，您可能希望使用 state monad 其中状态是表的内容。

如果你想记忆，你可以尝试 Conal Elliott 博客中的一些惰性记忆技巧，但是一旦你超越了整数参数，惰性记忆就变得非常模糊——我不建议你作为初学者尝试。也许您可以发布有关您试图解决的更广泛问题的问题？通常，对于 Haskell 和可变性，问题是如何将突变或更新包含在某种范围内。

在没有任何全局可变变量的情况下学习编程并不是那么容易。

回复收藏 0 原文

凤舞天涯 2024-08-28 06:33:19

如果我正确地阅读了您的评论，那么您将拥有一个可能需要计算约 500k 总值的结构。计算成本很高，因此您希望它们只完成一次，并且在后续访问中，您只需要值而不需要重新计算。

在这种情况下，请利用 Haskell 的惰性来发挥你的优势！ ~500k 并不是那么大：只需构建所有答案的地图，然后根据需要获取即可。第一次获取将强制计算，后续获取相同答案将重用相同结果，如果您从未获取特定计算 - 它永远不会发生！

您可以在文件 PointCloud.hs。该文件使用 Debug.Trace 来记录计算实际完成的时间：

> ghc --make PointCloud.hs 
[1 of 1] Compiling Main             ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...

> ./PointCloud 
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0

If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.

In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!

You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:

> ghc --make PointCloud.hs 
[1 of 1] Compiling Main             ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...

> ./PointCloud 
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0

回复收藏 0 原文