如何释放Haskell中特定数据结构的内存？

发布于 2025-01-16 02:44:47 字数 1287 浏览 0 评论 0原文

假设我有几个非常大的向量。它们存储在磁盘上。我需要通过读取每个相应的文件来单独访问它们，这会将它们放入内存中。我将对单个向量执行某些功能，然后转到我需要访问的下一个向量。每次我需要访问不同的向量时，我需要能够指示内存中的每个向量进行垃圾收集。我不确定如果我的程序中声明我必须稍后通过引用读取向量的相同函数名称来再次访问同一向量，则 performMajorGC 是否会确保向量被垃圾收集从磁盘输入。在这种情况下，我会再次将其读入内存，使用它，然后进行垃圾收集。我如何确保它是车库集合，同时对从同一文件读取的向量使用相同的函数名称？

希望有任何建议，谢谢

回复 Daniel Wagner：

    myvec x :: Int -> IO (Vector (Vector ByteString))
    myvec x = do let ioy = do y <- Data.ByteString.Lazy.readFile ("data.csv" ++ (show x))
                              guard (isRight (Data.Csv.decode NoHeader y)) 
                              return y
                 yy <- ioy 
                 return (head $ snd $ partitionEithers [Data.Csv.decode NoHeader yy])

    myvecvec :: Vector (IO (Vector (Vector ByteString)))
    myvecvec = generate 100 (\x -> myvec x)

    somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
    somefunc1 iovv = do vv <- iovv
                        somefunc1x1 vv :: Vector (Vector ByteString) -> IO ()

-- 对于 somefunc2 和 3 也是如此

    oponvec :: IO ()
    oponvec = do somefunc1 (myvecvec ! 0)
                 performGC
                 somefunc2 (myvecvec ! 1)
                 performGC
                 somefunc3 (myvecvec ! 0)

原文

Let’s say I have several very large vectors. They are stored on disk. I need to access them individually by reading from each respective file which would place them into memory. I would perform some function on a single vector and then move to the next one I need access. I need to be able to instruct each vector in memory to be garbage collected every time I need to access a different vector. I’m not sure if performMajorGC would ensure that the vector would be garbage collected if it is stated in my program that I have to access that same vector again later by referencing the same function name that read the vector in from disk. In such a case I would read it into memory again, use it, then garbage collect it. How would I ensure it’s garage collection while using the same function name for the vector that is read from the same file?

Would appreciate any advice thanks

In response to Daniel Wagner:

    myvec x :: Int -> IO (Vector (Vector ByteString))
    myvec x = do let ioy = do y <- Data.ByteString.Lazy.readFile ("data.csv" ++ (show x))
                              guard (isRight (Data.Csv.decode NoHeader y)) 
                              return y
                 yy <- ioy 
                 return (head $ snd $ partitionEithers [Data.Csv.decode NoHeader yy])

    myvecvec :: Vector (IO (Vector (Vector ByteString)))
    myvecvec = generate 100 (\x -> myvec x)

    somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
    somefunc1 iovv = do vv <- iovv
                        somefunc1x1 vv :: Vector (Vector ByteString) -> IO ()

-- same thing for somefunc2 and 3

    oponvec :: IO ()
    oponvec = do somefunc1 (myvecvec ! 0)
                 performGC
                 somefunc2 (myvecvec ! 1)
                 performGC
                 somefunc3 (myvecvec ! 0)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

晨与橙与城 2025-01-23 02:44:47

您可以使用弱指针来测试这一点，如下所示：

import qualified Data.Vector.Unboxed as V
import System.Mem.Weak
import System.Mem

main :: IO ()
main = do
  let xs = V.fromList [1..1000000:: Int]
  wkp <- mkWeakPtr xs Nothing
  performGC
  xs' <- deRefWeak wkp
  print xs'

在我的系统上，这会打印 Nothing ，这意味着向量已被释放。不过，我不知道GHC是否保证会发生这种情况。

这是一个检查@amalloy建议的程序：

import qualified Data.Vector.Unboxed as V
import Control.Monad
import Data.Word

{-# NOINLINE newLarge #-}
newLarge :: Word8 -> V.Vector Word8
newLarge n = V.replicate 5000000000 n -- 5GB

main :: IO ()
main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))

这在我的机器上恰好使用了5GB，这表明从来没有同时分配两个大向量。

You can test this by using a weak pointer as follows:

import qualified Data.Vector.Unboxed as V
import System.Mem.Weak
import System.Mem

main :: IO ()
main = do
  let xs = V.fromList [1..1000000:: Int]
  wkp <- mkWeakPtr xs Nothing
  performGC
  xs' <- deRefWeak wkp
  print xs'

On my system this prints Nothing which means that the vector has been deallocated. However, I don't know if GHC guarantees that this happens.

Here's a program which checks @amalloy's suggestion:

import qualified Data.Vector.Unboxed as V
import Control.Monad
import Data.Word

{-# NOINLINE newLarge #-}
newLarge :: Word8 -> V.Vector Word8
newLarge n = V.replicate 5000000000 n -- 5GB

main :: IO ()
main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))

This uses exactly 5GB on my machine, which shows that there are never two large vectors allocated at the same time.

回复收藏 0 原文